
Bi-view semi-supervised active learning for cross-lingual sentiment classification



Mohammad Sadegh Hajmohammadi, Roliana Ibrahim*, Ali Selamat
Software Engineering Research Group, Faculty of Computing, Universiti Teknologi Malaysia, 81300 UTM Skudai, Johor, Malaysia

http://dx.doi.org/10.1016/j.ipm.2014.03.005

* Corresponding author. Tel.: +60 7 5538727. E-mail address: [email protected] (R. Ibrahim).

Article history: Received 7 June 2013; Received in revised form 14 March 2014; Accepted 17 March 2014; Available online xxxx.

Keywords: Cross-lingual; Sentiment classification; Co-training; Active learning; Density measure

Abstract

Recently, sentiment classification has received considerable attention within the natural language processing research community. However, since most recent work on sentiment classification has been carried out in the English language, there are not enough sentiment resources in other languages. Manual construction of reliable sentiment resources is a very difficult and time-consuming task. Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing research works rely on automatic machine translation services to directly project information from one language to another. However, differing term distributions between original and translated text documents, together with translation errors, are the two main problems faced when using only machine translation. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training that incorporates unlabelled data from the target language into the learning process in a bi-view framework. This model attempts to enrich the training data by adding the most confident automatically-labelled examples, as well as a few of the most informative manually-labelled examples from the unlabelled data, in an iterative process. Further, in this model we consider the density of the unlabelled data so as to select more representative unlabelled examples and thereby avoid outlier selection in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling efforts in comparison with some baseline methods.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Text sentiment classification refers to the task of determining the sentiment polarity (e.g. positive or negative) of a given text document (Liu, 2012). Recently, sentiment classification has received considerable attention in the natural language processing research community due to its many useful applications, such as online product review classification (Kang, Yoo, & Han, 2012) and opinion summarization (Ku, Liang, & Chen, 2006).

Up until now, different methods have been used for sentiment classification. These methods can be categorised into two groups: unsupervised and supervised. The unsupervised methods classify text documents based on the polarity of the words and phrases contained in the text.


If a text document contains more positive than negative terms, for example, it is classified as positive, and vice versa (Taboada, Brooke, Tofiloski, Voll, & Stede, 2011; Turney, 2002). A sentiment lexicon is typically used to determine the sentiment polarity of each term. In contrast, supervised methods train a sentiment classifier on labelled data using machine learning classification algorithms (Pang, Lee, & Vaithyanathan, 2002; Ye, Zhang, & Law, 2009). The performance of these methods depends specifically on the quality of the labelled data used as the training set for the sentiment classifier.

Based on these two groups of methods, sentiment lexicons and annotated sentiment data can be seen as the most important resources for sentiment classification. However, since most recent research studies in sentiment classification have been presented in the English language, there are not enough labelled corpora and sentiment lexicons in other languages (Montoyo, Martínez-Barco, & Balahur, 2012). Further, manual construction of reliable sentiment resources is a very difficult and time-consuming task. The challenge, therefore, is how to utilize labelled sentiment resources in one language (a resource-rich language such as English, called the source language) for sentiment classification in another language (a resource-scarce language, called the target language). This leads to an interesting area of research called cross-lingual sentiment classification (CLSC). Most existing research works employ automatic machine translation engines to directly project information from labelled data in the source language into the target language (Balahur & Turchi, 2014; Banea, Mihalcea, & Wiebe, 2010). In this case, a sentiment classifier is trained on the translated labelled data and then applied to the original test data for the classification task in the target language. Machine translation can also be employed in the opposite direction, by translating test documents from the target language into the source language (Martín-Valdivia, Martínez-Cámara, Perea-Ortega, & Ureña-López, 2013; Prettenhofer & Stein, 2011). In this situation, the sentiment classifier is trained on the original labelled data in the source language and then applied to the translated test data. However, the use of only translated data in the sentiment classification task results in two main problems. The first is the difference in term distribution between the original and the translated text documents, due to the dissimilarity in cultures, writing styles and linguistic expressions across languages; this leads to different feature distributions in the training and test data. The second problem relates to machine translation errors in the resource translation process: since machine translation quality is still far from satisfactory, translation errors inevitably occur during resource projection. To overcome the first problem, making use of unlabelled data from the target language can be helpful, because this type of data is easy to obtain and has the same term distribution as the test data. Employing unlabelled data from the target language in the learning process is therefore expected to result in better classification in CLSC.

Active learning (AL) (Wang, Kwong, & Chen, 2012) and semi-supervised learning (SSL) (Ortigosa-Hernández et al., 2012) are two well-known techniques that make use of unlabelled data to improve classification performance. Both techniques are iterative processes. AL aims to reduce manual labelling efforts by finding the most informative examples for human labelling, while SSL tries to automatically label examples from unlabelled data in each cycle.

To reduce the effect of machine translation errors in the classification process, both directions of machine translation can be employed simultaneously. We then have training and test documents in both languages: the original version of the training documents and the translated version of the test documents in the source language, and the translated version of the training documents and the original version of the test documents in the target language. If the translated version of a document contains translation errors on one side, the original version of that document is used on the other side.

Given the two possible directions for data translation, we can consider sentiment data from two different views: the source language view and the target language view. In this paper, we consider source language features and target language features as two sufficient feature representations of the labelled and unlabelled data. Accordingly, we propose a new model based on a combination of bi-view active learning, co-testing (Muslea, Minton, & Knoblock, 2006), and semi-supervised co-training (Park & Zhang, 2004) in order to incorporate unlabelled data from the target language into the learning process. Co-testing is a bi-view active learning process that aims to find the most informative unlabelled examples by considering the disagreement between two classifiers, one trained in each view. The intuition behind co-testing is that if the classifiers trained in each view classify an unlabelled example differently, at least one classifier is making a mistake in its prediction. This unlabelled example can therefore provide useful information for the classifier with the incorrect prediction. On the other hand, co-training tries to select the most confidently-classified examples from the unlabelled data in each view so as to add them to the training data of the other view. These two techniques complement each other in order to reduce human labelling efforts.

Experimental results with co-testing show that some of the selected contention examples cannot provide much help to the learner. The main reason is that some of these selected examples are outliers and are therefore not representative. To avoid outlier selection in co-testing, our proposed method considers the density of the selected examples and chooses those contention examples that have maximum average density in the pool of unlabelled data.

The contributions of our work are as follows: (1) the parallel combination of two bi-view approaches, co-training and co-testing, in order to incorporate unlabelled examples from the target language into the learning process of cross-lingual sentiment classification. This is achieved by selecting the most confident automatically-labelled examples, as well as a few of the most informative manually-labelled examples, from the unlabelled data. Specifically, view disagreement is used as the criterion for selecting informative instances: the contention examples, i.e. those that receive different predicted labels in the source and target views of the unlabelled documents. (2) When selecting the most informative unlabelled examples by co-testing, we propose to use the density information of the unlabelled examples in order to select not only the most informative examples, but also the most representative examples, for manual labelling.


Specifically, we select the contention examples which have different predicted labels in the two views and maximum average similarity with the other unlabelled examples in the unlabelled pool.

The proposed method was applied to book review datasets in three different languages, and experiments showed that our method can effectively reduce the manual labelling effort while, at the same time, increasing the performance of cross-lingual sentiment classification in comparison with some baseline methods.

The remainder of this paper is organized as follows: the next section presents related works on sentiment classification and CLSC. Section 3 describes bi-view data creation. The proposed model is described in Section 4, while an evaluation is given in Section 5. Finally, Section 6 concludes this paper and outlines ideas for future research.

2. Related works

2.1. Sentiment classification

Sentiment classification can be performed on words, sentences or entire documents. Sentiment classification methods are categorised into two groups: unsupervised and supervised. In unsupervised methods, the semantic orientations of phrases are calculated first, and the sentiment orientation of a document is then predicted based on the average semantic orientation of all phrases contained in it (Harb et al., 2008; Turney, 2002). A research study by Turney (2002) presented a simple unsupervised learning algorithm for classifying a review as either recommended or not recommended. He determined whether words are positive or negative, as well as the strength of the evaluation, by computing each word's pointwise mutual information (PMI) with a positive seed word ("excellent") and a negative seed word ("poor"). He called this value the word's semantic orientation. The method scans through a review looking for phrases that match certain part-of-speech patterns (adjectives and adverbs), computes the semantic orientation of those phrases, and finally sums the semantic orientations of all of them to compute the orientation of the review document. In contrast, supervised methods (Pang et al., 2002; Ye et al., 2009) treat sentiment classification as a conventional classification task and try to build a classifier based on a set of labelled documents (or sentences). In their study, Pang et al. (2002) compared the performance of three traditional supervised classification approaches (SVM, Naïve Bayes and Maximum Entropy) and showed that these techniques outperformed human-generated baselines. A study by Ye et al. (2009) incorporated sentiment classification techniques into the domain of destination reviews. They used three supervised learning algorithms, consisting of the support vector machine, NB and the character-based N-gram model, to classify destination reviews. In addition, they used the frequency of words, rather than word presence, to represent a document. They found that SVM outperforms the other two classifiers.
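For concreteness, Turney's semantic orientation can be written compactly in standard PMI notation; this is a restatement from the usual PMI definition, not a formula quoted from the paper under review:

\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}, \qquad \mathrm{SO}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{"excellent"}) - \mathrm{PMI}(\mathit{phrase}, \text{"poor"})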

2.2. Cross-lingual sentiment classification

Cross-lingual sentiment classification has been extensively studied in recent years. These research studies are based on the use of annotated data in the source language (usually English) to compensate for the lack of labelled data in the target language. Most approaches focus on resource adaptation from one language to another language with few sentiment resources. For example, Mihalcea, Banea, and Wiebe (2007) generated subjectivity analysis resources in a new language from English sentiment resources by using a bilingual dictionary. In other works (Banea, Mihalcea, Wiebe, & Hassan, 2008; Banea et al., 2010), automatic machine translation engines were used to translate the English resources for subjectivity analysis. In a further study (Banea et al., 2008), the authors showed that automatic machine translation is a viable alternative for the construction of resources for subjectivity analysis in a new language. In two different experiments, they first translated the training data for subjectivity classification from the source language into the target language, utilized this translated data to train a classifier in the target language, and subsequently applied this trained classifier to classify the test data. In another experiment, machine translation was used to translate the test data from the target language into the source language, and a classifier was then trained on training data in the source language; after the training phase, the translated test data was presented to the classifier for sentiment polarity prediction. Wan (2008) used unsupervised sentiment polarity classification for Chinese product reviews. He translated Chinese reviews into English using a variety of machine translation engines and then performed sentiment analysis for both the Chinese and English reviews using a lexicon-based technique. Finally, he used ensemble methods to combine the analysis results. This method requires a sentiment lexicon in the target language and cannot be applied to languages with no lexicon resource. Pan, Xue, Yu, and Wang (2011) designed a bi-view non-negative matrix tri-factorization (BNMTF) model to solve the problem of cross-lingual sentiment classification. They used machine translation to obtain two representations of the training and test data, one in the source language and the other in the target language; the model was then used to combine the information from the two views. Another approach is feature translation, which involves translating the features extracted from labelled documents (Moh & Zhang, 2012; Shi, Mihalcea, & Tian, 2010). The features, selected by a feature selection algorithm, are translated into different languages and, based on those translated features, a new model is trained for each language. This approach only needs a bilingual dictionary to translate the selected features. It can, however, suffer from the inaccuracies of dictionary translation, in that words may have different meanings in different contexts; selecting the features to be translated can therefore be an intricate process.


In another work, Wan (2009, 2011) used the co-training algorithm to overcome the problem of CLSC. He first investigated basic methods for CLSC using machine translation services. He then exploited a bilingual co-training approach to leverage annotated English resources for sentiment classification of Chinese reviews. In this work, machine translation services were first used to translate English labelled documents (training documents) into Chinese and, similarly, Chinese unlabelled documents into English. The author used the two different views (English and Chinese) to apply the co-training approach to the classification problem. The co-training process usually selects high-confidence examples to add to the training data. However, if the initial classifiers in each view are not good enough, there is an increased probability of adding examples with incorrect labels to the training set. The addition of such noisy examples not only fails to increase the accuracy of the learning model, but will also gradually decrease the performance of each classifier.

In contrast to this study (Wan, 2009, 2011), which only utilized co-training to enrich the training set, we propose to combine active learning and co-training in order to enrich the initial training set. This is achieved by incrementally selecting and adding as few manually-labelled examples as possible, along with some automatically-labelled examples from the unlabelled pool. To select unlabelled examples for manual labelling, we exploit the density information of the unlabelled examples so as to select not only the most informative examples, but also the most representative ones.

3. Bi-view data creation

Document translation can be performed to project textual information from one language into another. In CLSC, two different directions are possible for data projection: one from the source language into the target language, and another from the target language into the source language. Given these two possible directions for document translation, we can look at sentiment data from two different views, namely the source view and the target view. The source view consists of the original labelled documents and the translated versions of the unlabelled documents. Conversely, the translated labelled documents and the original unlabelled documents constitute the target view. Considering these two views, we have labelled and unlabelled data represented simultaneously in a bi-view framework, which we refer to as "bi-view data". Fig. 1 diagrammatically shows this bi-view data creation process.
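To make the bi-view construction concrete, the following is a minimal Python sketch. The translate() function is a hypothetical placeholder for a machine translation service (the experiments in Section 5 use Google Translate), and BiViewDoc and make_bi_view are illustrative names, not part of the paper:

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class BiViewDoc:
    source_view: str       # document text represented in the source language
    target_view: str       # document text represented in the target language
    label: Optional[int]   # +1 / -1 for labelled documents, None for unlabelled

def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder: plug in a machine translation service here.
    raise NotImplementedError

def make_bi_view(labelled_src: List[Tuple[str, int]],
                 unlabelled_tgt: List[str],
                 src: str = "en", tgt: str = "fr"):
    # Labelled documents are original in the source language, translated into the target.
    labelled = [BiViewDoc(text, translate(text, src, tgt), y) for text, y in labelled_src]
    # Unlabelled documents are original in the target language, translated into the source.
    unlabelled = [BiViewDoc(translate(text, tgt, src), text, None) for text in unlabelled_tgt]
    return labelled, unlabelled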

4. The proposed model

As mentioned in the first section, because the term distributions of the original and translated versions of text documents differ, the performance of a cross-lingual sentiment classifier is limited. To improve this performance, making use of unlabelled data from the target language can be helpful, since these data have the same term distribution as the test documents in the target language. However, manually labelling unlabelled data is a very difficult and time-consuming task.

In an attempt to reduce the labelling effort, we propose a new model based on bi-view data. This model attempts to enrich the initial training data in cross-lingual sentiment classification through manual (AL) and automatic (co-training) labelling of some unlabelled data from the target language. The framework of the proposed model is illustrated in Fig. 2.

In the learning phase, after creation of the bi-view data, two classifiers are trained on the initial training data in each of the views and then applied to the unlabelled data in the corresponding view. From the newly classified unlabelled data, AL then selects a few of the most informative and representative examples, based on view disagreement and density analysis, for human labelling. Simultaneously, co-training selects some of the most confidently classified examples with their predicted labels. These two groups of selected examples are added to the training data for the next learning cycle.

[Fig. 1. Two different views of data by using machine translation. Labelled documents in the source language are machine-translated into the target language, and unlabelled documents in the target language are machine-translated into the source language, yielding bi-view labelled data and bi-view unlabelled data.]


[Fig. 2. Framework of the proposed approach. (a) Learning phase: classifier h1 is trained on the training data in the source view (original labelled documents) and classifier h2 on the training data in the target view (translated labelled documents); each classifier is applied to the unlabelled data in its view (translated in the source view, original in the target view), the most confident examples from each view are added to the other view's training data, and the most informative examples, filtered by the density measure, are sent to a human expert for labelling. (b) Test phase: machine translation produces translated test documents in the source language from the original test documents in the target language; h1 and h2 classify them and their predictions are combined.]


At the next cycle, the model is retrained on the augmented training data, and this process is repeated until a termination condition is satisfied. In the test phase, the two trained classifiers are applied to the original and translated versions of the test data in the corresponding views for the classification task. The final prediction for a given test example is computed from the average of the prediction values of the two classifiers. AL and co-training are described in detail in the following sections.
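As a sketch of this combination step (assuming, as in the paper's SVM setting, that each classifier exposes a signed decision value; the function name is illustrative):

import numpy as np

def predict_bi_view(h1, h2, X_test_source_view, X_test_target_view):
    # Average the two views' decision values; the sign gives the final polarity.
    scores = (h1.decision_function(X_test_source_view) +
              h2.decision_function(X_test_target_view)) / 2.0
    return np.where(scores >= 0, 1, -1)   # +1 = positive, -1 = negative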

4.1. Active learning process

Active learning is a subcategory of machine learning (Cohn, Atlas, & Ladner, 1994). The main goal of active learning is to learn a classifier by labelling as few examples as possible, actively selecting the examples to be labelled during the learning process.


As reported in previous research works (Cheng & Wang, 2007; Jingbo, Huizhen, Tsou, & Ma, 2010; Li, Ju, Zhou, & Li, 2012; Zhu, Wang, Yao, & Tsou, 2008), active learning is a promising method by which to speed up data labelling while minimizing human labelling efforts. An active learner can be modelled as a quintuple (C, Q, O, L, U) (M. Li & Sethi, 2006). Initially, the classifier C is trained on the labelled example set L and applied to the unlabelled pool U. A set of the most informative examples is then selected from the unlabelled pool U using a query function Q, and a true class label is assigned to each of them by an oracle O. These newly labelled examples are added to L, and the classifier C is retrained on this upgraded training set. This sequential label-requesting process continues for a predefined number of iterations or until a termination condition is satisfied. The algorithm is shown as follows:

Algorithm 1: Active learning process

Given:
L = L0: the initial labelled training set
U: the pool of unlabelled examples
Q: the query function

– Train a classifier C using the initial training set L.
– Loop until the termination condition is satisfied:
  – Use classifier C to predict the class label of each example in U.
  – Query a set of examples from the unlabelled pool U based on the query function Q.
  – Assign a true class label to each of the selected examples by an oracle (O).
  – Add the new labelled examples to the training set L and remove them from the unlabelled pool.
  – Retrain the classifier C based on the new training set.

The query function is essential to the active learning process; the major difference between previously-proposed active learning methods lies in their query functions. The simplest query function is uncertainty sampling (Lewis & Gale, 1994), in which the unlabelled examples with maximum uncertainty are selected for manual labelling in each learning cycle. Maximum uncertainty means that the learner has the least confidence in its classification of these unlabelled examples. Another sample selection strategy is committee-based sampling, where the active learner selects those unlabelled examples on which several committee classifiers disagree most. Besides query by committee (QBC) (Freund, Seung, Shamir, & Tishby, 1997), the first method of this type, co-testing examines a committee of member classifiers trained on different views and selects the contention examples (i.e., unlabelled examples on which the views predict different labels) for manual labelling (Muslea et al., 2006). The reasoning behind co-testing is that if the classifiers trained on each view classify an unlabelled example differently, at least one classifier is making a mistake in its classification. This unlabelled example can therefore provide useful information for the classifier with the incorrect prediction. In our proposed model, we use the co-testing approach because it fits the bi-view framework of our problem.

4.1.1. Density analysis

As reported in (Jingbo et al., 2010; Roy & McCallum, 2001; Tang, Luo, & Roukos, 2002), many unlabelled examples selected by the query function cannot help the learner because they are outliers. This means that a good example for manual labelling in active learning should be not only the most informative, but also the most representative; adding outlier examples to the training data cannot provide much help to the learner. To solve this problem, Tang et al. (2002) proposed a sampling method in which the most uncertain example is selected by a learner from each cluster and then weighted using a density measure. In another work, Jingbo et al. (2010) proposed a density-based re-ranking technique to select the most informative and representative examples from unlabelled data, solving the outlier problem in selective sampling. They introduced a density concept by which to determine whether an unlabelled example is an outlier or not. To determine the density degree of an unlabelled example, they used a method called k-nearest neighbour based density (kNN density). In this measure, the density degree of an unlabelled example is computed as the average similarity between this example and the k most similar unlabelled examples in the unlabelled pool. Suppose S(x) = {s1, s2, ..., sk} is the set of the k unlabelled examples most similar to x. The average similarity of x, AS(x), can then be computed by the following formula:

AS(x) = \frac{\sum_{s_i \in S(x)} \mathrm{similarity}(x, s_i)}{k} \qquad (1)

This density measure was then used in combination with an uncertainty measure to select the most representative and informative examples from the unlabelled pool for human labelling. Although the outlier problem has been addressed for uncertainty sampling in previous studies, it can also occur in query by committee (QBC) sampling, especially in co-testing and in sentiment analysis. We therefore employ the density degree introduced by Jingbo et al. (2010) to avoid selecting outlier examples in each cycle of co-testing. As mentioned in Section 3, each unlabelled example has two different representations, based on the features of the source and the target languages. The density of each unlabelled example can therefore be computed in two different views, the source view and the target view. The average similarity of an unlabelled example in the source (or target) view gives the density value of that example based on the source (or target) language features. The final density value of an unlabelled example is computed by averaging these two density values.

We refer to this new active learning model as density-based active learning. In this model, after determining the contention examples, the algorithm chooses those contention examples that have the maximum density value as the most informative and representative unlabelled examples. These are then labelled by an oracle and added to the training data. We use the cosine measure as the similarity function with which to compute the pair-wise similarity value between two examples.
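A sketch of this bi-view density computation (Eq. (1)), using cosine similarity as stated above; scikit-learn's cosine_similarity is assumed for the pair-wise computation, and the function names are illustrative:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def knn_density(X, k):
    sim = cosine_similarity(X)             # pair-wise similarities within the pool
    np.fill_diagonal(sim, -np.inf)         # exclude self-similarity
    top_k = np.sort(sim, axis=1)[:, -k:]   # the k largest similarities per example
    return top_k.mean(axis=1)              # AS(x) of Eq. (1)

def bi_view_density(X_source_view, X_target_view, k=10):
    # Final density value = average of the two per-view densities (Section 4.1.1).
    return (knn_density(X_source_view, k) + knn_density(X_target_view, k)) / 2.0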

4.2. The co-training algorithm

The co-training algorithm (Blum & Mitchell, 1998) is one of the most successful semi-supervised methods for handling unlabelled data. The standard co-training algorithm assumes that two different and sufficient sets of features, or views, exist that adequately describe the data. Two separate classifiers are trained, one over each view, from an initial set of labelled data. The most confidently classified examples are then recovered from the unlabelled data as new labelled data for each other's view. These newly-recovered examples are combined with the previously labelled data to create a new training set for the next round. This iterative process continues for a predefined number of iterations. The final prediction for a given example is the combination of the predictions of the two classifiers. The pseudo-code of the co-training algorithm for binary classification is shown in Algorithm 2:

Algorithm 2: Co-training algorithm

Given:
L = L0: the initial labelled training set
U: the pool of unlabelled examples
V1 and V2: two different views of the data

Initial parameters:
p: number of most confident positive examples selected by each view in each cycle
n: number of most confident negative examples selected by each view in each cycle

– Train classifier h1 on view V1 of L.
– Train classifier h2 on view V2 of L.
– Loop for a predefined number of iterations:
  – Use h1 to determine the class labels of the examples in U based on view V1.
  – Use h2 to determine the class labels of the examples in U based on view V2.
  – Select the p positive and n negative examples labelled with maximum confidence by h1.
  – Select the p positive and n negative examples labelled with maximum confidence by h2.
  – Add the selected examples with their predicted labels to L and remove them from U.
  – Retrain classifier h1 on view V1 of the new L.
  – Retrain classifier h2 on view V2 of the new L.

Two separate views, V1 and V2, are used during the learning process, and the classifiers h1 and h2 are iteratively retrained over each view for a predefined number of iterations. In each cycle, each classifier selects the p positive and n negative most confident newly-labelled examples from the unlabelled pool and adds them to the labelled training set L. After that, the two classifiers are retrained on this augmented set of labelled examples for the next cycle.
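In code, the per-view selection step of Algorithm 2 could be sketched as below, assuming SVM-style classifiers whose decision_function() returns a signed confidence (the helper name is illustrative):

import numpy as np

def select_confident(clf, U_X, p, n):
    scores = clf.decision_function(U_X)
    pos = np.argsort(scores)[-p:]    # p highest scores: most confident positives
    neg = np.argsort(scores)[:n]     # n lowest scores: most confident negatives
    return [(int(i), 1) for i in pos] + [(int(i), -1) for i in neg]

The (index, predicted label) pairs returned by both views are then added to L and removed from U before the two classifiers are retrained.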

4.3. Density based Active Co-Training (DACT) model

After the creation of the bi-view data, we have training, test and unlabelled documents in two separate views. These two views are used in the co-training and active learning components of the proposed model in parallel. The co-training algorithm can be employed to label the most confident documents in each view and to generate new training examples for the other view. In each view, a classifier is trained on the training data and applied directly to the unlabelled examples. Since text documents with the same sentiment polarity in two languages may share some common terms and characteristics, these classifiers may be able to correctly predict the sentiment polarity of some unlabelled documents by using the classification information transferred from the other language. The most confidently classified examples are selected and added to the training set with their predicted labels for the next step (automatic labelling). For some other documents, however, the classifiers may be unable to determine the correct sentiment polarity, because the information needed to predict the polarity of these documents cannot be detected. These misclassified documents are generally very informative for the learning process. We therefore exploit a bi-view active learning approach, co-testing, in conjunction with co-training in order to find and manually label the most informative examples from the unlabelled data. The pseudo-code of the proposed method is shown in Algorithm 3:


Algorithm 3: Proposed algorithm

Given:
L = L0: the initial labelled training set from the source language
U: the pool of unlabelled examples from the target language
V1 and V2: two different views of the data (source view and target view)

Initial parameters:
p: number of most confident positive examples selected by each view in each cycle
n: number of most confident negative examples selected by each view in each cycle
t: number of most informative examples selected by co-testing in each cycle
k: number of nearest neighbour examples used in calculating the average similarity

– Calculate the average similarity value of each example in U based on Eq. (1) in the source and target views, and average them as the final average similarity.
– Train classifier h1 on view V1 of L.
– Train classifier h2 on view V2 of L.
– Loop for a predefined number of iterations:
  – Use h1 to determine the class labels of the examples in U based on V1.
  – Use h2 to determine the class labels of the examples in U based on V2.
  – Select the p positive and n negative examples labelled with most confidence by h1.
  – Select the p positive and n negative examples labelled with most confidence by h2.
  – Add all selected examples with their predicted labels to L and remove them from U.
  – Create the contention examples set (Cp) by selecting those unlabelled examples on which h1 and h2 predict different labels (based on Eq. (2)).
  – Select the t examples from Cp that have maximum average similarity.
  – Assign a true class label to each of the t selected examples by an oracle.
  – Add these selected examples to L and remove them from U.
  – Retrain classifier h1 on view V1 of the new L.
  – Retrain classifier h2 on view V2 of the new L.

This algorithm starts with a set of labelled examples L from the source language as the initial training set and a set of unlabelled examples U from the target language. Each example (labelled and unlabelled) has two different views: V1 (source view) and V2 (target view). Two separate classifiers, h1 and h2, are trained on the two views of the training set; that is, classifier h1 is trained on the training data represented by the source language feature set (view V1), while h2 is trained on the training data represented by the target language feature set (view V2). These two classifiers are then applied to the unlabelled pool to predict the class labels of the examples in U. A confidence rating for each newly classified example is computed separately by each classifier, based on the distance of the example from the current decision boundary: the further an example lies from the decision boundary, the more confident the classifier is in predicting its label. In each view, the p positive and n negative most confident examples are selected as auto-labelled examples and added to the training data, along with their predicted labels, for the next iteration. Furthermore, a set of contention examples is created by selecting those unlabelled examples that have different predicted labels in the source and target views. The contention examples set (Cp) is defined as follows:

C_p = \{\, d \in U : h_1(d) \neq h_2(d) \,\} \qquad (2)

These examples are considered informative for active learning because they can provide useful information for at least one of the two classifiers, which has made an incorrect prediction. The t examples that have maximum average similarity with the other examples in the unlabelled pool are selected from the contention examples set for manual labelling. These two groups of newly labelled examples (automatically labelled and manually labelled) are then added to the training data and removed from the unlabelled pool. The two classifiers are retrained on the updated training set, and this process is repeated for a predefined number of iterations.
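The co-testing step of Algorithm 3 can be sketched as follows, reusing the bi-view density values of Section 4.1.1 (the function name is illustrative; the classifiers are assumed to expose decision_function()):

import numpy as np

def select_contention(h1, h2, U_source_view, U_target_view, density, t):
    y1 = np.sign(h1.decision_function(U_source_view))  # predictions in the source view
    y2 = np.sign(h2.decision_function(U_target_view))  # predictions in the target view
    cp = np.where(y1 != y2)[0]                         # Cp of Eq. (2): view disagreement
    densest_first = cp[np.argsort(density[cp])[::-1]]  # rank contention examples by density
    return densest_first[:t]                           # t examples sent to the human oracle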

5. Evaluation

In this section, we evaluate our proposed model via cross-lingual sentiment classification on three different languages in the book review domain and compare it with some baseline methods. The sensitivity of our proposed method to the nearest neighbour parameter (k) of the density measure is also considered.

5.1. Datasets

We selected book review documents from two different cross-lingual sentiment datasets. The first dataset was used by Prettenhofer and Stein (2011) and contains Amazon product reviews for three different domains (books, DVDs and music) in four different languages: English, French, German and Japanese.


Table 1
Details of the datasets used in our experiments.

Dataset                              Domain       Role             Language  Total documents  Positive  Negative
En–Fr (Prettenhofer & Stein, 2011)   Book review  Source language  English    2000             1000      1000
                                                  Target language  French     4000             2000      2000
En–Ch (Pan et al., 2011)             Book review  Source language  English    2000             1000      1000
                                                  Target language  Chinese    4000             2000      2000
En–Jp (Prettenhofer & Stein, 2011)   Book review  Source language  English    2000             1000      1000
                                                  Target language  Japanese   4000             2000      2000


Each review document is labelled as either positive or negative based on its sentiment polarity. We selected only book reviews from this dataset, and in only three languages: English, French and Japanese. The book review dataset in the English language contains 2000 documents (1000 positive and 1000 negative), which are treated as labelled examples. A total of 4000 review documents (2000 positive and 2000 negative) were selected from each of the French and Japanese datasets and treated as unlabelled examples. The other dataset used in this paper is the Pan reviews dataset (Pan et al., 2011). This collection consists of three review datasets in different domains (Movie, Book and Music) in both English and Chinese. We selected only the book reviews in Chinese from this collection. This dataset also contains 4000 book review documents (2000 positive and 2000 negative), which are treated as unlabelled data in the Chinese language. By combining the English review documents with the documents in the other languages, three evaluation datasets for cross-lingual sentiment classification were formed: En–Fr (English–French), En–Ch (English–Chinese) and En–Jp (English–Japanese). English is used as the source language and the other languages are the target languages. In all datasets, all reviews in the source language were translated into the target languages and, similarly, all reviews in the target languages were translated into the source language, using the Google Translate engine (http://translate.google.com/). Table 1 shows the properties of these three evaluation datasets.

In the pre-processing step, all English reviews are converted into lowercase, and special symbols and other unnecessary characters are eliminated from each review document. For the Japanese text documents, we applied the MeCab segmenter (http://mecab.googlecode.com/svn/trunk/mecab/) to segment the reviews, while the Chinese documents were segmented by the Stanford Chinese word segmenter (http://nlp.stanford.edu/software/segmenter). In the feature extraction step, unigram and bi-gram patterns were extracted as sentimental patterns. To reduce computational complexity, especially in density estimation, we performed feature selection using the information gain (IG) technique and selected the 5000 highest-scoring unigrams and bi-grams as the final features. Each document is represented by a feature vector, each entry of which contains a feature weight. We used term presence as the feature weight, since this method has been confirmed as the most effective feature-weighting method in sentiment classification (Pang et al., 2002; Xia, Zong, & Li, 2011).
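A sketch of this feature pipeline, assuming scikit-learn; mutual_info_classif is used here as a stand-in for the information gain score, which the paper does not tie to a specific implementation:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def build_features(train_texts, train_labels, n_features=5000):
    # Unigram and bi-gram term-presence (binary) vectors.
    vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True, lowercase=True)
    X = vectorizer.fit_transform(train_texts)
    # Keep the n_features highest-scoring n-grams (IG-style scoring).
    selector = SelectKBest(mutual_info_classif, k=n_features).fit(X, train_labels)
    return vectorizer, selector, selector.transform(X)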

5.2. Baseline methods

The following baseline methods were implemented in order to compare the effectiveness of the proposed model using the same number of manually-labelled examples.

– Active Co-Training model (ACT): This model is similar to DACT but without consideration of the density measure of the contention examples. In this model, after obtaining a contention examples set based on the classification results of the two views, t unlabelled examples are randomly selected from the contention examples set. This baseline is used to evaluate the effectiveness of the density analysis in the DACT model.

– Uncertainty sampling active learning model (US-AL): In this model, we used the well-known uncertainty sampling as the query function in active learning. Under this query function, examples that are closest to the decision boundary are considered uncertain and chosen for manual labelling. The average of the uncertainty values in the source and target views is used as the final uncertainty degree of each unlabelled example. The t unlabelled examples with maximum uncertainty are then selected for manual labelling in each learning cycle.

– Co-testing: This model is based on the simple co-testing approach. In this model, t unlabelled examples from the contention examples set are selected for manual labelling in each learning cycle. This baseline is used to determine the effect of the automatic labelling of unlabelled examples on the performance of the proposed model.

– Co-training: This model is based on the method used by Wan (2011). In order to compare this model with the other active learning-based models, we randomly selected some manually-labelled examples from the unlabelled pool and added them to the initial training set before starting the learning process. The number of manually-labelled examples added to the initial training set is equal to the total number of examples selected by the query functions of the other active learning-based models.



– Random sampling model (RS): To evaluate the performance of the query function in the proposed models, we implemented a passive learning process, whereby t examples are randomly selected from the unlabelled pool for manual labelling in each learning cycle. Since the random sampling model generates high-variance results due to its random selection strategy, we averaged the performance of random sampling over 10 runs using the same experimental setting.

5.3. Experimental setup

To estimate the average similarity of each unlabelled example in our proposed model, we used the method of Jingbo et al. (2010). At the beginning of the algorithm, we calculate the pair-wise similarity value between any two unlabelled examples in the unlabelled pool in each view and store these values in a matrix, sorting each example's neighbours by their similarity scores. During the learning process, the average similarity of each example in the unlabelled pool can then be calculated efficiently by averaging the similarity scores of the top-k examples in its sorted list.
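As a sketch of this precomputation trick (illustrative names; cosine similarity as in Section 4.1.1): the similarities are computed once before the learning loop, and as the pool shrinks only the neighbours still in the pool count towards the top-k average.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def precompute_similarities(X):
    sim = cosine_similarity(X)                 # computed once, before the learning loop
    np.fill_diagonal(sim, -np.inf)             # ignore self-similarity
    order = np.argsort(sim, axis=1)[:, ::-1]   # each row's neighbours, most similar first
    return sim, order

def average_similarity(i, sim, order, in_pool, k):
    # Mean similarity to the k most similar examples still in the unlabelled pool.
    neighbours = [j for j in order[i] if in_pool[j]][:k]
    return float(np.mean(sim[i, neighbours]))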

In all experiments, we used the support vector machine (SVM) (Joachims, 1999) as the basic classifier in each view. SVMlight (http://svmlight.joachims.org/) was used as the SVM implementation, with all parameters set to their default values.

5.3.1. Cross-validation on active learning and semi-supervised learning

To generate reliable results, we performed a 3-fold cross-validation over the active learning and semi-supervised learning processes. For this task, the unlabelled documents in the target language were randomly divided into three groups of equal size. During each step of the cross-validation, two groups of documents were treated as the unlabelled pool, and performance was evaluated on the remaining group as an independent test set. The final results are averaged over the three iterations. Fig. 3 shows the data splitting configuration in each fold of the cross-validation. When calculating the average similarity of each unlabelled example, only the pair-wise similarities of those examples included in the unlabelled pool were considered.

5.4. Performance measures

Generally, the performance of sentiment classification is evaluated using four indexes: Accuracy, Precision, Recall and F1-score. Accuracy is the proportion of correctly predicted instances among all predicted instances; an accuracy of 100% means that the predicted labels are exactly the same as the actual labels. Precision denotes the proportion of correctly predicted instances among all instances predicted for a class. Recall represents the proportion of correctly predicted instances among all actual instances of a class. F1 is the harmonic mean of precision and recall.
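In standard notation, with TP, FP, TN and FN denoting the true positive, false positive, true negative and false negative counts for a class, these indexes can be written as:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}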

5.5. Results and discussion

In this section, our proposed method is compared with the five baseline methods. We set the parameter k in the kNN density measure to 10 for all experiments; this parameter is studied further in the next subsection. We used p = n = 5 for the co-training algorithm, as in (Wan, 2011), and selected 5 unlabelled examples (t = 5) in each cycle of co-testing for manual labelling. The total number of iterations was set to 20 for all algorithms. With this setting, a total of 100 unlabelled examples were selected for manual labelling, while 400 unlabelled examples were labelled automatically during the learning process. After each learning cycle, the test data is presented to the two learned classifiers, and the final prediction for a given test example is computed from the average of the prediction values of these classifiers.

Table 2 shows the comparison results after the full learning process. As we can see, the DACT and ACT models show better performance than co-training and co-testing after the full learning process. This supports the idea that combining the co-testing and co-training processes can be better than either individual approach, most likely due to the augmentation of the most confident automatically classified examples, along with the manually-labelled examples, into the training data during the learning process.

[Fig. 3. Data splitting configuration for the experiment. The labelled documents in the source language form the training set; the unlabelled documents in the target language are split into an unlabelled pool (66.6%) and a test set (33.3%).]


Table 2
Performance comparison on the three datasets after the first 100 manually-labelled examples (best results were reported in bold face in the original).

Dataset  Method       Accuracy  Precision(+)  Recall(+)  F1(+)   Precision(−)  Recall(−)  F1(−)
En–Fr    DACT         82.17     82.00         81.94      81.97   81.81         81.86      81.83
         ACT          81.45     81.10         82.19      81.64   81.83         80.70      81.25
         US-AL        80.44     80.45         80.64      80.53   80.47         80.24      80.35
         Co-testing   80.79     78.78         82.84      80.73   81.81         77.47      79.55
         Co-training  80.17     79.83         81.79      80.76   81.21         79.08      80.09
         RS           78.99     76.38         84.19      80.09   82.27         73.76      77.77
En–Ch    DACT         76.32     79.09         70.39      74.49   73.58         81.59      77.38
         ACT          75.34     78.59         69.27      73.63   72.81         81.34      76.84
         US-AL        73.48     76.96         66.64      71.32   71.00         80.24      75.28
         Co-testing   74.21     74.82         70.79      72.73   72.60         76.43      74.45
         Co-training  74.38     78.10         68.11      72.69   72.06         81.04      76.24
         RS           73.09     74.76         69.57      72.01   71.80         76.57      74.05
En–Jp    DACT         74.24     74.50         72.99      73.71   73.57         74.99      74.25
         ACT          73.59     75.81         69.19      72.34   71.74         77.99      74.73
         US-AL        72.44     74.91         67.43      70.94   70.48         77.44      73.77
         Co-testing   72.50     74.66         66.88      70.48   70.17         77.34      73.53
         Co-training  72.40     74.06         70.84      72.40   72.08         75.19      73.60
         RS           71.06     72.89         67.28      69.80   69.79         74.82      72.10

(+) and (−) denote the positive and negative classes, respectively.

Table 3
The p-values of paired t-tests comparing the DACT model with the baseline methods for each dataset.

Dataset  ACT       US-AL     Co-testing  Co-training  RS
En–Fr    1.73E−02  5.65E−05  3.88E−04    1.06E−06     3.01E−11
En–Ch    1.54E−01  1.16E−02  5.34E−03    1.87E−02     3.52E−04
En–Jp    2.65E−01  3.72E−03  1.04E−02    1.75E−02     7.69E−05

[Fig. 4. Average learning curves by 3-fold cross-validation for the different methods (DACT, ACT, US-AL, Co-Testing, Co-Training, RS) on the three different languages: (a) En–Fr dataset, (b) En–Ch dataset, (c) En–Jp dataset. Each panel plots accuracy (%) against the number of manually labelled examples (0–100).]


Fig. 5. Average learning curves by 3-fold cross-validation for the combined views (Bi-Views) and the individual source and target views on the three datasets: (a) En–Fr, (b) En–Ch, (c) En–Jp. Each panel plots accuracy (%) against the number of manually labelled examples.


Moreover, the DACT model outperforms the ACT model on all datasets. This shows that using the density measure of unlabelled examples has a beneficial effect on selecting the most representative examples for manual labelling. Also, when compared with uncertainty sampling (US-AL), co-testing shows slightly better accuracy on all datasets, indicating that co-testing is more effective than uncertainty sampling in bi-view CLSC. In uncertainty sampling (US-AL), the uncertain examples selected for manual labelling may already be classified correctly by both classifiers, in which case they cannot help improve the classification accuracy. In the co-testing strategy, however, at least one classifier benefits from the selected examples in each learning step, and the overall performance subsequently improves step by step.
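The difference between the two query strategies can be made concrete with a short sketch. The following is an illustrative implementation of the contention-set idea, assuming classifiers with a scikit-learn-style predict method and the two views of the unlabelled pool as pool_src and pool_tgt; the names are ours, not the authors'.

import numpy as np

def contention_points(clf_src, clf_tgt, pool_src, pool_tgt):
    # Label the unlabelled pool with each view's classifier.
    y_src = clf_src.predict(pool_src)
    y_tgt = clf_tgt.predict(pool_tgt)
    # Contention points are the examples on which the two views disagree;
    # manually labelling one of them corrects at least one classifier.
    return np.where(y_src != y_tgt)[0]

In the DACT model, the t examples queried from this set are then chosen by their density rather than at random.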

In comparison with the co-training method proposed by Wan (2011): the co-training process usually selects high-confidence examples to add to the training data. However, if the initial classifiers in each view are not good enough, there is an increased probability that the added examples carry incorrect labels into the training set, which ultimately degrades classification performance. In our proposed method, the use of active learning increases the accuracy of the classifiers in each view in the earlier steps of the learning process, and thereby decreases the probability of adding noisy examples to the training set. In addition, the examples that co-training classifies most confidently are not necessarily the most valuable ones for improving classification accuracy. We therefore combine co-training with active learning in order to select the most valuable examples for improving classification accuracy alongside the high-confidence ones. The main limitation of our method in comparison with co-training is the need for human effort to label the examples selected in the active learning part.

As we can see in Table 2, the classification accuracies vary across languages. These differences originate from two facts: first, sentiment classification performs differently in different languages because languages differ structurally in how they express sentiment; second, automatic machine translation systems produce translations of varying quality for different languages. The performance of CLSC based on automatic machine translation therefore varies across languages.

In order to assess whether there are significant differences in accuracy between the proposed model and the baseline methods, we conducted a statistical test based on the accuracy results obtained from 3-fold cross-validation, using a paired t-test to evaluate whether the differences between two methods are statistically significant.


Fig. 6. Effect of different values of k in the density estimation formula on the accuracy of the DACT model on the three datasets: (a) En–Fr, (b) En–Ch, (c) En–Jp. Each panel plots accuracy (%) against the number of manually labelled examples for k = 5, 10, 20, 40, 60, 80, 100.


Table 3 shows the numerical results of the statistical test. With the exception of the comparisons between DACT and ACT on the En–Ch and En–Jp datasets, all differences were statistically significant at a significance level of α = 0.05.
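As an illustration of this test, a paired t-test can be run directly on the per-fold accuracies with scipy; the accuracy values below are made up purely to show the call and are not the paper's results.

from scipy import stats

# Accuracy of two methods on the same cross-validation folds
# (illustrative values only).
acc_dact = [82.1, 82.4, 82.0]
acc_baseline = [80.3, 80.9, 80.4]

t_stat, p_value = stats.ttest_rel(acc_dact, acc_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # significant if p < 0.05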

Fig. 4 shows the learning curves of the various methods on the three evaluation datasets. In each iteration of cross-validation, all methods used the same initial training set (i.e. the labelled documents from the source language) and hence start from the same initial accuracy on the learning curve. Each fold of cross-validation generated one learning curve per model; the final learning curve was obtained by averaging the accuracies of the generated curves point by point. As shown in this figure, compared with the ACT model, the classification accuracy of the proposed DACT model improves very quickly in the first few learning cycles. This is because the examples selected for human labelling based on the density measure are more representative than examples selected randomly from the contention set. Accuracy in the early steps matters far more than in the later steps, since the aim of active learning is to improve performance by selecting only a small number of unlabelled examples for manual labelling. For example, on the En–Fr dataset, DACT needs fewer than 20 labelled examples from the unlabelled pool to reach 80% accuracy, while ACT, US-AL, Co-Testing, Co-Training and RS need about 50, 90, 80, 100 and more than 100 labelled examples, respectively. The proposed DACT model can therefore reduce the labelling effort while increasing CLSC performance.
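The point-wise averaging of the fold curves is straightforward; a minimal sketch with illustrative numbers:

import numpy as np

# One accuracy curve per cross-validation fold, all sampled at the same
# numbers of manually labelled examples (values are illustrative).
fold_curves = np.array([
    [78.0, 79.2, 80.4, 81.0],
    [77.5, 78.8, 79.9, 80.6],
    [78.4, 79.5, 80.6, 81.2],
])
mean_curve = fold_curves.mean(axis=0)  # final, averaged learning curve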

Fig. 5 compares the accuracy of the combined views (bi-view) with each individual view of the DACT model during the learning process on the three datasets. The figure shows that the combination of the source view and the target view outperforms both individual views on all datasets. This supports the idea that the two views complement each other and so compensate for the negative effect of translation errors in resource translation.

5.5.1. Effect of different values of k on the performance of classification

We conducted a further experiment to estimate an appropriate value for the parameter k used in the average density formula. We varied k from 5 to 100 and examined the effect of each value on classification accuracy throughout the learning process.
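The density measure itself is defined earlier in the paper; as a rough sketch of how such a kNN-based average density can be computed, the snippet below assumes the density of an example is its mean cosine similarity to its k nearest neighbours in the unlabelled pool (a common formulation, used here only as an assumption).

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def average_density(X_pool, k=10):
    # Pairwise cosine similarities within the unlabelled pool.
    sim = cosine_similarity(X_pool)
    np.fill_diagonal(sim, -np.inf)        # exclude self-similarity
    # Mean similarity to the k most similar neighbours of each example.
    top_k = np.sort(sim, axis=1)[:, -k:]
    return top_k.mean(axis=1)

Selecting, from the contention set, the t examples with the highest average density then avoids querying outliers.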


Table 4
Accuracy of the DACT model with different values of k after 100 learned training examples (the best performance for each dataset is obtained at k = 10).

Dataset   k = 5    k = 10   k = 20   k = 40   k = 60   k = 80   k = 100
En–Fr     81.97    82.17    81.90    81.65    81.75    81.70    81.32
En–Ch     75.67    76.32    76.02    76.02    76.25    75.95    76.20
En–Jp     74.14    74.24    73.99    74.04    74.19    73.89    74.17


Fig. 6 shows the results of this experiment on each dataset, and the final results of DACT with different values of k are summarized in Table 4. As shown in this table, k = 10 is an appropriate value for estimating the density of an unlabelled example in this method, and we have used this value in all experiments.

The proposed model clearly depends on the availability of machine translation services to project data between the source and the target language. Although most machine translation services can translate text documents from and into a large number of languages, they cannot be used to translate large amounts of data for free. This issue may limit the use of machine translation in CLSC.

6. Conclusion and future work

In this paper, we have proposed a new model based on bi-view classification, combining active learning and semi-supervised co-training in order to reduce the human labelling effort in cross-lingual sentiment classification. In this model, both labelled and unlabelled data are represented in the source and target languages using a machine translation service, creating two different views of the data. Co-training and co-testing algorithms were then used to incorporate unlabelled data from the target language into the learning process. We also employed a density measure to avoid selecting outlier examples from the unlabelled data and so increase the representativeness of the examples selected for manual labelling in the co-testing algorithm. We applied this method to cross-lingual sentiment classification datasets in three different languages and compared the performance of the proposed model with several baseline methods. The experimental results show that the proposed model outperforms the baseline methods on almost all datasets. They further show that employing automatic labelling along with active learning can speed up the learning process and therefore reduce the manual labelling workload. The experiments also show that considering the density of unlabelled data in the query function of active learning is very effective for selecting the most representative and informative examples for manual labelling.

However, translated data cannot cover all the vocabulary idiosyncrasies of the original data, and hence many sentiment-bearing words cannot be learnt from the projected data. Moreover, using projected data increases the number of features and makes the data points in the classification task sparser.

In future work, we plan to use training data from more than one source language in order to cover more of the vocabulary of the original target-language data, and to employ multi-view learning in that setting. Moreover, we will try to include syntactic information so as to reduce the effect of translation errors.

Acknowledgement

The authors would like to thank the reviewers for their valuable comments and suggestions, which significantly improved the quality of this paper. This work is supported by the Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at Universiti Teknologi Malaysia (UTM) under the Exploratory Research Grant Scheme (Vote No. R.J130000.7828.4L051).

References

Balahur, A., & Turchi, M. (2014). Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis. Computer Speech & Language, 28, 56–75.

Banea, C., Mihalcea, R., Wiebe, J., & Hassan, S. (2008). Multilingual subjectivity analysis using machine translation. In Proceedings of the conference on empirical methods in natural language processing (pp. 127–135). Honolulu, Hawaii: Association for Computational Linguistics.

Banea, C., Mihalcea, R., & Wiebe, J. (2010). Multilingual subjectivity: Are more languages better? In Proceedings of the 23rd international conference on computational linguistics (pp. 28–36). Beijing, China: Association for Computational Linguistics.

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on computational learning theory (pp. 92–100). Madison, Wisconsin, United States: ACM.

Cheng, J., & Wang, K. (2007). Active learning for image retrieval with Co-SVM. Pattern Recognition, 40, 330–334.

Cohn, D., Atlas, L., & Ladner, R. (1994). Improving generalization with active learning. Machine Learning, 15, 201–221.

Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28, 133–168.

Harb, A., Plantié, M., Dray, G., Roche, M., Trousset, F., & Poncelet, P. (2008). Web opinion mining: How to extract opinions from blogs? In Proceedings of the 5th international conference on soft computing as transdisciplinary science and technology (pp. 211–217). Cergy-Pontoise, France: ACM.

Jingbo, Z., Huizhen, W., Tsou, B. K., & Ma, M. (2010). Active learning with sampling by uncertainty and density for data annotations. IEEE Transactions on Audio, Speech, and Language Processing, 18, 1323–1331.

Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods (pp. 169–184). MIT Press.


Kang, H., Yoo, S. J., & Han, D. (2012). Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Systems with Applications, 39, 6000–6010.

Ku, L. W., Liang, Y. T., & Chen, H. H. (2006). Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of the AAAI-2006 spring symposium on computational approaches to analyzing weblogs.

Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval (pp. 3–12). Dublin, Ireland: Springer-Verlag New York Inc.

Li, S., Ju, S., Zhou, G., & Li, X. (2012). Active learning for imbalanced sentiment classification. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 139–148). Jeju Island, Korea: Association for Computational Linguistics.

Li, M., & Sethi, I. K. (2006). Confidence-based active learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1251–1261.

Liu, B. (2012). Sentiment analysis and opinion mining (Vol. 5). Morgan & Claypool Publishers.

Martín-Valdivia, M.-T., Martínez-Cámara, E., Perea-Ortega, J.-M., & Ureña-López, L. A. (2013). Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Systems with Applications, 40, 3934–3942.

Mihalcea, R., Banea, C., & Wiebe, J. (2007). Learning multilingual subjective language via cross-lingual projections. In Proceedings of the 45th annual meeting of the association of computational linguistics (Vol. 45, pp. 976–983).

Moh, T.-S., & Zhang, Z. (2012). Cross-lingual text classification with model translation and document translation. In Proceedings of the 50th annual southeast regional conference (pp. 71–76). Tuscaloosa, Alabama: ACM.

Montoyo, A., Martínez-Barco, P., & Balahur, A. (2012). Subjectivity and sentiment analysis: An overview of the current state of the area and envisaged developments. Decision Support Systems, 53, 675–679.

Muslea, I., Minton, S., & Knoblock, C. A. (2006). Active learning with multiple views. Journal of Artificial Intelligence Research, 27, 203–233.

Ortigosa-Hernández, J., Rodríguez, J. D., Alzate, L., Lucania, M., Inza, I., & Lozano, J. A. (2012). Approaching sentiment analysis by using semi-supervised learning of multi-dimensional classifiers. Neurocomputing, 92, 98–115.

Pan, J., Xue, G.-R., Yu, Y., & Wang, Y. (2011). Cross-lingual sentiment classification via bi-view non-negative matrix tri-factorization. In J. Huang, L. Cao, & J. Srivastava (Eds.), Advances in knowledge discovery and data mining (Vol. 6634, pp. 289–300). Berlin/Heidelberg: Springer.

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on empirical methods in natural language processing (Vol. 10, pp. 79–86). Association for Computational Linguistics.

Park, S.-B., & Zhang, B.-T. (2004). Co-trained support vector machines for large scale unstructured document classification using unlabeled data and syntactic information. Information Processing and Management, 40, 421–439.

Prettenhofer, P., & Stein, B. (2011). Cross-lingual adaptation using structural correspondence learning. ACM Transactions on Intelligent Systems and Technology, 3, 1–22.

Roy, N., & McCallum, A. (2001). Toward optimal active learning through sampling estimation of error reduction. In Proceedings of the eighteenth international conference on machine learning (pp. 441–448). Morgan Kaufmann Publishers Inc.

Shi, L., Mihalcea, R., & Tian, M. (2010). Cross language text classification by model translation and semi-supervised learning. In Proceedings of the 2010 conference on empirical methods in natural language processing (pp. 1057–1067). Cambridge, Massachusetts: Association for Computational Linguistics.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37, 267–307.

Tang, M., Luo, X., & Roukos, S. (2002). Active learning for statistical natural language parsing. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 120–127). Philadelphia, Pennsylvania: Association for Computational Linguistics.

Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417–424). Philadelphia, Pennsylvania: Association for Computational Linguistics.

Wan, X. (2008). Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In Proceedings of the conference on empirical methods in natural language processing (pp. 553–561). Honolulu, Hawaii: Association for Computational Linguistics.

Wan, X. (2009). Co-training for cross-lingual sentiment classification. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP (Vol. 1, pp. 235–243). Suntec, Singapore: Association for Computational Linguistics.

Wan, X. (2011). Bilingual co-training for sentiment classification of Chinese product reviews. Computational Linguistics, 37, 587–616.

Wang, R., Kwong, S., & Chen, D. (2012). Inconsistency-based active learning for support vector machines. Pattern Recognition, 45, 3751–3767.

Xia, R., Zong, C., & Li, S. (2011). Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences, 181, 1138–1152.

Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications, 36, 6527–6535.

Zhu, J., Wang, H., Yao, T., & Tsou, B. K. (2008). Active learning with sampling by uncertainty and density for word sense disambiguation and text classification. In Proceedings of the 22nd international conference on computational linguistics (Vol. 1, pp. 1137–1144). Manchester, United Kingdom: Association for Computational Linguistics.
