13
HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE STUDY OF APPLYING OPINION MINING ON VIETNAMESE AND AMERICAN ONLINE REVIEWS Hoanh-Su Le a , Jong-Hwa Lee b and Hyun-Kyu Lee c Candidate Ph. D., Graduate School of Business, Pukyong National University, S. Korea Tel: +82-51-629-5177, E-mail: [email protected] Candidate Ph. D., Graduate School of Business, Pukyong National University, S. Korea Tel: +82-51-629-5177, E-mail: [email protected] Candidate Ph. D., Graduate School of Business, Pukyong National University, S. Korea Tel: +82-51-629-5177, E-mail: [email protected] Abstract Tourism is considered as one of the most important industries in Vietnam. The Government continuously keeps managing and asking for improving all sectors related to tourism. As an important infrastructure for tourism industries, hotels for accommodation are highly considered for improving customer services. On hotel booking and reviews channels, customers express their opinions and feedback about their experienced hotels by writing online reviews, this is valuable source of information that hotel managers should utilize. In this study, we collected 37,712 online reviews about Vietnamese hotels written by domestic and foreign customers in Vietnamese and English languages. Then we developed a hybrid model to perform opinion mining on multilingual social media text and explore for customer opinion differences across cultures. The results show that there are differences in hotel service preferences between Vietnamese and American customers. Based on this research result we recommend for improving customer satisfactions via diversifying across cultures services. Keywords: Sentiment analysis; hotel services; multilingual language; machine learning; Vietnam 1. Introduction Tourism is considered as one of the most important industries in Vietnam. The Government continuously keeps managing and asking for improving all sectors related to tourism. During the last decades, many hotels, restaurants, visiting places projects have been invested in Vietnam. Since early 2015, the Ministry of Foreign Affairs also issued non-visa tourism policies for visitors from many countries to enter Vietnam. According to Vietnam National Administration of Tourism, there were approximately 57 million domestic travelers and 8 million foreign travelers visiting Vietnam in 2015. As an important infrastructure for tourism industry, hotels for accommodation are highly considered for improving in customer services. With the development of the Internet, social media provides travelers a platform for interactivity and gives them ability to make his or her voice heard. Online user reviews, as a particular format of social media and a major form of electronic word-of-mouth, have been mined and researched in recent literature (Duan et al., 2013). Consumer generated reviews of travel and hotel services have been found to be a particularly critical information resource for travelers (Pan et al., 2007). The rapid growth of hotel room bookings via e-distribution portals has naturally concurred with the development of online reviews text mining researches. Many researchers argued that hotels need to actively embrace the user-generated content and concept of social networks to monitor online reviews and manage online reputation, because reviewers are gradually becoming the travel opinion leaders of the Internet age (O’Connor, 2010; Duan et al., 2013). Consequently, online customer reviews sentiment analysis is one of the effective ways to investigate customer opinions about hotel services that was applied in various researches and practices. A reddit’s score method was proposed by Patel et al. (2015) for text retrieval application development in customer opinion mining in tourism industry. Abdullatif and Yasser used online hotel customer reviews to improve the booking process by ranking the hotels from opinion mining reviews (Abdullatif and Yasser, 2014). Marrese and coauthors developed a model with a novel deterministic approach for aspect-opinion mining in tourist product reviews (Marrese et al., 2014). Their research extracted twenty one of hotel aspects in hotel reviews of Lake District in America. Other researches also relate to opinion mining and traditional service quality models such as SERVQUAL and SERVPERF (Parasuraman et al., 1985; Duan et al., 2013). Duan et al. (2013) 2016 한국경영정보학회 춘계학술대회 -137-

HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

HOTEL SERVICES PREFERENCES ACROSS CULTURES:

A CASE STUDY OF APPLYING OPINION MINING ON VIETNAMESE AND

AMERICAN ONLINE REVIEWS

Hoanh-Su Lea

, Jong-Hwa Leeb and Hyun-Kyu Lee

c

Candidate Ph. D., Graduate School of Business, Pukyong National University, S. Korea

Tel: +82-51-629-5177, E-mail: [email protected]

Candidate Ph. D., Graduate School of Business, Pukyong National University, S. Korea

Tel: +82-51-629-5177, E-mail: [email protected]

Candidate Ph. D., Graduate School of Business, Pukyong National University, S. Korea

Tel: +82-51-629-5177, E-mail: [email protected]

Abstract

Tourism is considered as one of the most important

industries in Vietnam. The Government continuously keeps

managing and asking for improving all sectors related to

tourism. As an important infrastructure for tourism

industries, hotels for accommodation are highly considered

for improving customer services. On hotel booking and

reviews channels, customers express their opinions and

feedback about their experienced hotels by writing online

reviews, this is valuable source of information that hotel

managers should utilize. In this study, we collected 37,712

online reviews about Vietnamese hotels written by domestic

and foreign customers in Vietnamese and English

languages. Then we developed a hybrid model to perform

opinion mining on multilingual social media text and

explore for customer opinion differences across cultures.

The results show that there are differences in hotel service

preferences between Vietnamese and American customers.

Based on this research result we recommend for improving

customer satisfactions via diversifying across cultures

services.

Keywords:

Sentiment analysis; hotel services; multilingual language; machine learning; Vietnam 1. Introduction Tourism is considered as one of the most important industries in Vietnam. The Government continuously keeps managing and asking for improving all sectors related to tourism. During the last decades, many hotels, restaurants, visiting places projects have been invested in Vietnam. Since early 2015, the Ministry of Foreign Affairs also issued non-visa tourism policies for visitors from many countries to enter Vietnam. According to Vietnam National

Administration of Tourism, there were approximately 57 million domestic travelers and 8 million foreign travelers visiting Vietnam in 2015. As an important infrastructure for tourism industry, hotels for accommodation are highly considered for improving in customer services. With the development of the Internet, social media provides travelers a platform for interactivity and gives them ability to make his or her voice heard. Online user reviews, as a particular format of social media and a major form of electronic word-of-mouth, have been mined and researched in recent literature (Duan et al., 2013). Consumer generated reviews of travel and hotel services have been found to be a particularly critical information resource for travelers (Pan et al., 2007). The rapid growth of hotel room bookings via e-distribution portals has naturally concurred with the development of online reviews text mining researches. Many researchers argued that hotels need to actively embrace the user-generated content and concept of social networks to monitor online reviews and manage online reputation, because reviewers are gradually becoming the travel opinion leaders of the Internet age (O’Connor, 2010; Duan et al., 2013). Consequently, online customer reviews sentiment analysis is one of the effective ways to investigate customer opinions about hotel services that was applied in various researches and practices. A reddit’s score method was proposed by Patel et al. (2015) for text retrieval application development in customer opinion mining in tourism industry. Abdullatif and Yasser used online hotel customer reviews to improve the booking process by ranking the hotels from opinion mining reviews (Abdullatif and Yasser, 2014). Marrese and coauthors developed a model with a novel deterministic approach for aspect-opinion mining in tourist product reviews (Marrese et al., 2014). Their research extracted twenty one of hotel aspects in hotel reviews of Lake District in America. Other researches also relate to opinion mining and traditional service quality models such as SERVQUAL and SERVPERF (Parasuraman et al., 1985; Duan et al., 2013). Duan et al. (2013)

2016 한국경영정보학회 춘계학술대회

-137-

Page 2: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

connected opinion mining on hotel reviews with five dimensions of SERVPERF model to examine their effects in shaping customers’ overall evaluation and content creating behaviors. In addition, tourism industry involves with global customers across cultures. Many researchers found significant cross-cultural differences when measuring customer satisfaction (Pizam & Ellis, 1999). Services and products that are important to Europeans may be different from those sought by Asians. Culture was found to have impact on perception, expectation and problem solving then leads to differences in satisfaction levels for a product between different global customers. Global customers may also have different expectations, different ways of evaluating performance of products and services. Seo (2012) performed a research on hotels in Las Vegas, USA, and found that service quality attributes are different among customers from USA, China and Japan. According to his research results, in tangible attribute of services quality; USA customers pay more considers on physical factors of hotels and cleanliness of rooms; Chinese customers pay more attentions on physical location of hotels and up to date hotel equipment and employee appearance while Japanese customers put their minds on physical factors of hotels, employee appearance and accessibility. The research also showed the differences between the three nations’ customers in other services quality attributes including assurance, reliability and empathy. On hotel booking and reviews portals such as Agoda.com or Booking.com, online reviews about Vietnamese hotels are written by not only domestic visitors but also foreigners. Domestic customers often write hotel reviews in Vietnamese language while foreigner customers write the comments in English or their own languages. Hence, performing opinion mining on multilingual social media text from these portals, researchers can not only explore customer opinions about hotel services in general but also explore how domestic and foreign visitors feedback to accommodation services, and compare the opinions between domestic and foreign customers or across cultures. Therefore, hotel managers may have guidelines to improve their hotel services as well as differentiate themselves by diversifying across cultures. In this case study, we aim to apply opinion mining techniques on multilingual text for opinion mining online reviews in hotel services industry. We collected online reviews about Vietnamese hotels on Agoda.com and developed a hybrid opinion mining model on multilingual social media text. Details of data collection, experiment design and results will be presented in the next sections. 2. Literature Review

2.1. Text mining

Text mining is also known as text data mining, which refers the process of deriving high-quality information from text. Text mining is an important part of data mining and knowledge discovery process. It processes a collection of

documents and extracts meaningful numeric indices from the text, and makes the information contained in the text accessible to the various data mining algorithms such as statistical and machine learning. Among the subtasks of text mining, opinion mining (also called sentiment analysis) is one of the difficult task that attracted many interests recently. It is one of the most active and widely studied in data mining, web mining and text mining (Bing & Liu, 2012).

2.2. Opinion mining

Sentiment analysis or opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language (Bing & Liu, 2012). In other words, opinion mining is an advanced technology to extract the opinion of person who created a specific document that is recently grown into the largest research interests in social network (Pang and Lee, 2008 ; Rao et al., 2014). In the past, there was a need to limit researchers to create or analyze a questionnaire through a lot of time and effort to listen to the voice of the customer. But using customer reviews online were able to minimize the time and effort by analysis. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. Especially, in the age of digital development era we now have a huge volume of opinionated data recorded in digital form for analysis.

2.3. Opinion mining techniques

Opinion mining can be considered a classification process. There are three main classification levels in opinion mining: document-level, sentence-level and aspect-level.

Document-level opinion mining aims to classify an opinion document as expressing positive or negative opinion or sentiment. Sentence-level opinion mining aims to classify sentiment expressed in each sentence. And aspect-level opinion mining aims to classify the sentiment with respect to the specific aspects of entities.

Opinion mining techniques can be roughly divided into machine learning approach; lexicon based approach and integrated approach (Medhat, 2014). The machine learning approach applies the famous machine learning algorithms and uses linguistic features. The lexicon-based approach relies on a sentiment lexicon, a collection of known and precompiled sentiment terms. It is divided into dictionary-based approach and corpus-based approach which use statistical or semantic methods to find sentiment polarity. The integrated approach combines several approaches and models. Lexicon-based approach

Lexicon is an important indicator of sentiments called opinion words. A list of words/phrases is called sentiment lexicon. Words in a sentence express positive or negative opinion.

2016 한국경영정보학회 춘계학술대회

-138-

Page 3: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

Lexicon-based sentiment prediction is very popular in the context of opinion mining (Hu and Liu, 2004b; 2004a; Zhuang et al., 2006; Ku et al., 2006). This technique generally relies on a sentiment word dictionary. The lexicon typically contains a list of positive and negative words that are used to match words in the opinion text.

Machine learning approach

Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. In a machine learning based classification, documents require two sets: one is the training and the other is the test set. For training a set automatic classifiers are used that learns various characteristics of documents, and a test set is used to validate the automatic classifier’s performance.

A number of machine learning techniques have been adopted to classify the reviews. Machine learning techniques like Naïve Bayes (NB) (Zhang et al., 2011; Wu et al., 2013), maximum entropy (ME) (Xia et al., 2011), Decision Tree (Wu et al., 2008), and support vector machines (SVM) (Xu et al., 2011; Wu et al., 2013) have achieved great success in text categorization.

2.4. Vietnamese text mining

Vietnamese has its own characteristics that make big challenges for mining. Vietnamese is a monosyllabic language which uses twenty two letters in English alphabet (except for letters j, f, w, z), and adds seven new letters playing roles of tonalities such as “ă”, “â”, “ê”, “ô”, “ơ”, ”ư” and ”đ”. Vietnamese also have six different tones which modify the meaning of the words by adding diacritical marks such as level “-” (ngang), sharp “/” (sắc), hanging “\” (huyền), asking “?” (hỏi), tumbling “~” (ngã) and heavy “.” (nặng). In informal writing format, Vietnamese people sometimes drop the diacritical marks that make readers as well as machine analyzers difficult for process. Several studies have been conducted to solve some sub-tasks in text mining. The most highlighted researches have focused on building Vietnamese Machine Tractable Dictionary and machine translation (Bao et al., 2003; Dien, 2002), Vietnamese Part-of-speed tagger, spelling detection and correction (Dien & Kiem, 2003; Nguyen et al., 2008; Le-Hong et al., 2010), construction Vietnamese ontology from online text (Nguyen & Yang, 2012), Vietnamese Word Segmentation (Dien et al., 2001; Nguyen et al., 2006), Vietnamese text classification, Sentence Reduction (Hoang et al., 2007, Thu et al., 2012, Thu & Ngoc, 2014), extracting phrases and text summarization (Le et al., 2010; Thu et al., 2013). In this research, we developed once more step by proposing and evaluating several methods for opinion mining Vietnamese as well as English social media text. 3. Research methods

3.1. Data Collection

Data for analysis in this study were collected from Agoda.com about hotels services reviews. Although

Vietnam has thousands of hotels across the country, Hanoi and Ho Chi Minh City are capital and economic central places, so we chose to collect online comments about 1149 hotels in Hanoi and Ho Chi Minh cities. Firstly, we accessed XML page of Agoda.com and collected text data those are belong to Hanoi and Ho Chi Minh cities’ hotels. The data crawling and preprocess work flow are illustrated in Figure 1. The text data contains several languages but Vietnamese (for domestic visitors) and English (for foreigners) are the most dominant. In this study, we aim to compare customer opinions between domestic (Vietnamese) and foreign travelers (American), so we filtered the reviewers who come from Vietnam and USA. Totally, 34,712 online reviews are collected and average of 3.31 sentences per a review. Unrated hotels’ data was removed, so hotel levels include from 5 stars to 1 star ranks. Data characteristic statistics show that the number of online reviews decreases from 5 stars to 1 star hotel, this is reasonable because higher ranking hotels usually have more rooms and capacity for serving more customers than lower rank hotels. There is also variety of traveler type from couples, solo travelers to business, groups and families. Among the collection data, 2000 positive and 2000 negative sentences of online reviews were randomly sampled and labeled for supervised machine learning.

Figure 1. Data collection and preprocess 1

3.2. Data preprocessing

2016 한국경영정보학회 춘계학술대회

-139-

Page 4: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

The data collected in previous steps were preprocessed before performing opinion analysis. Initially crawled data were filtered with reviewer nationality attribute in the dataset to select only reviews written by Vietnamese and American customers.

For reviews written by Vietnamese customers, most of them are in Vietnamese language but few of them are also written in English and even mixing usage of English and Vietnamese. By using a standard Vietnamese dictionary and frequency analysis, we composed a non-standard Vietnamese words dictionary which includes typos, misspellings, abbreviations, slangs and mixed usage of English in Vietnamese text in the online reviews. After that we applied that dictionary for standardizing the Vietnamese text corpus.

On the other hand, most of the online reviews written by American customers are in English but language detection R package was used to detect correct English in the dataset. Some reviews written in non-English were discarded because they are out of scope of this study. Similar to the method used in composing Vietnamese non-standard dictionary, we performed frequency analysis of extract terms from the dataset and looked for English words which are misspelling, abbreviations, creative words and slangs. A list of 346 non-standard English words was found and we built an English non-standard words dictionary based on the list, then applied it to standardize the non-standard words into normal form in English corpus.

After removing unnecessary repeated letters and standardizing non-standard words, all Vietnamese and English corpora are processed by detecting emoticons, transforming into lower case, cleaning up spare spaces, blanks, punctuations, numbers and tokenizing into sentences for next processing steps.

In case of Vietnamese corpus, after preprocessing we applied word-segmentation package to detect standard Vietnamese word boundaries and tied mono-syllable words into a full meaning word. For example, a word “hiện đại” (modern) will be tied into “hiện_đại” then it will be treated as a single word like English words in the text analyzers.

3.3. Lexicon-based model design

Lexicon-based model was designed to classify opinionated text based on sentiment lexicon resources. Because each language has its own characteristics and sentiment words list, we take advantages of opinion dictionary enhanced from previous research (Le et al., 2016) for Vietnamese text corpus as well as prior developed English sentiment resources based on Baccianella et al. (2010). In this experiment of lexicon-based model, we based on the lexicon resources in each Vietnamese and English language to calculate the polarity of each sentence in the corpora. Data sources after preprocessing are sent to analyzers of English and Vietnamese for processing particular steps of each language. As for Vietnamese text,

before tokenization review sentences into word-levels, word-segmentation was performed to detect boundaries and tie word phrase into a single word-group, then POS tagging was applied to tag for type of words including “noun”, “verb”, “adjective” and “adverb”. Polarity is calculated based on these extracted words and opinion word dictionary for each language.

3.4. Machine learning model design

Machine learning was performed on labeled text corpora for each language after preprocessing. Feature selection and extraction methods were used including Term frequency – inverse document frequency (TFIDF), Chi-squared (CHI) and Information gain (IG). Training algorithms include Naïve Bayes, Decision Trees, Support Vector Machines and Artificial Neural Network. The combination of features extraction methods and algorithms produces 12 kinds of models. These models will be tested and tuned parameters for searching for the best performance model in each kind of language with training data. Validation data was used to calculate models performance matrix and compare together. The design of single machine learning model is presented in Figure 2.

3.5. Hybrid model design

Hybrid model is designed by integrating lexicon-based and machine learning approaches. Machine learning models classify opinionated text based on extracted features (word) as input variables regardless the meaning or sentiment scoring of the words. Lexicon-based model classified opinionated text based on the defined sentiment resources in opinion dictionaries. The hybrid model takes the score

Figure 1. Machine learning model

2016 한국경영정보학회 춘계학술대회

-140-

Page 5: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

output of lexicon-based model and treats them as additional input for machine learning model at the next step in classifiers. In this experiment, we will design twelve combinations among feature extraction methods and training algorithms and compare together to find the best performance model.

4. Experiment results

4.1. Lexicon-based opinion mining results

In lexicon-based approach, sentiment dictionaries were enhanced as described in section 3.3 and then they were used for sentiment analysis. Online reviews retrieved from the hotel portal were processed and then sent to the sentiment analysis programs. Sentiment analyzer refers to the sentiment dictionary of each language and calculates negative and positive score for each review sentence and determines the polarity of it. Sentiment analysis result of each review sentence was compared with their actual class to check its classification performance. Results of sentiment analyzing with lexicon-based approach are presented as Table 1 below.

Table 1. Lexicon-based opinion mining results (unit: %)

Language Vietnamese text English text

Accuracy 76.20 72.27

F_score 73.32 70.24

In case of Vietnamese text, accuracy of lexicon-based

sentiment analysis was 76.20% and F_score was 73.32%. In case of English text, accuracy of opinion mining classifier was 72.27% and F_score was 73.32%. These classifier performances are not too high but they are all more than 70%, so they can be used for implications in classifying text problem.

4.2. Machine learning based and hybrid model results

Vietnamese text results

Table 2 shows the comparison of predictive performance among 12 machine learning models that are presented in the column Single in the left side of the table under Vietnamese text section. In NB algorithms, using TFIDF feature extraction method has better performance than other two (CHI and IG); the best model in NB algorithm has F_score at 76.82% and the best model in SVM has 83.04%. In other algorithms, such as TREE and NNET, CHI and IG feature extraction methods have the same performance and better than TFIDF. The best model of TREE has F_score at 83.00% (CHI and IG method), NNET has highest F_score at 84.55%. The results show that single machine learning models are all better performance than lexicon-based model in 4.1 section.

The classification performances of hybrid models in the

case of Vietnamese text are presented in Table 2 under column Hybrid. There are 12 hybrid models that are arranged in comparison with Single models. Among the hybrid models, Naive Bayes with CHI/IG has the lowest performance with F_score at 70.42%, all other models have F_score larger than 80%. In order to test if hybrid models are better than single models, we will compare the F_score and McNemar test. Except for Naive Bayes with CHI/IG has the lowest F_score, other models have F_score much larger than F_score of lexicon-based approach (73.32%), so we will only test the difference between performance of hybrid and single machine learning models.

As results showed in Table 2 and Figure 3, all twelve pairs of comparison show the significant improvement between single machine learning and hybrid models. SVM with TFIDF, NNET with CHI/IG have the lowest McNemar test statistics 3.327 (p = 0.067), so it is significant at level 10%. NB and TREE with TFIDF have the lowest McNemar test statistics 5.02 (p=0.025), so it is significant at level 5%. Other hybrid model has significantly improved compare with single machine learning models at level 1%.

Figure 3 shows the comparison of performance (F_score) among the hybrid models. With NB algorithm, TFIDF feature extraction method has better performance than CHI and IG methods while in TREE, SVM and NNET, CHI or IG are the best methods. The McNemar test results show that the performance of hybrid model of lexicon and NNET with CHI feature extraction method is the best, hybrid model of lexicon and TREE with CHI feature extraction method is the next and followed by hybrid model of lexicon and SVM with CHI feature extraction method. Hybrid of NB and CHI has the lowest performance among the four algorithms in the case of Vietnamese text.

English Text results

English text data results are presented in Table 2 under the English text section where we arrange 12 pairs of models under Single and Hybrid columns. Regarding to machine learning models, CHI/IG feature selection methods lead to the best performance models in all 4 kinds of algorithms NB, TREE, SVM and NNET. Among these single machine learning model, NNET with CHI/IG has the best performance with F_score 78.65% and Accuracy 76.38% and the lowest performance. Regarding to hybrid models, the best performance models is NNET with CHI/IG feature selection method with F_score 80.42% and Accuracy 78.98%.

Table 2 and Figure 3 present the comparisons between Single machine learning and hybrid models. NB with CHI/IG feature extraction methods have McNemar test statistics at 1.82 (p = 0.178), it means there is no significantly different between single and hybrid models. All other ten combinations of learning algorithms and feature extraction methods have the significantly different performance between single and hybrid methods at level 1%. Hybrid model of lexicon and NB with TFIDF has the

2016 한국경영정보학회 춘계학술대회

-141-

Page 6: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

lowest performance with F_score at 72.92%; all other hybrid models of TREE, SVM and NNET algorithms have F_score larger than 75%. The highest improvement performance (12.15%) is the combination of lexicon and NB with TFIDF. This means that integrating lexicon model into machine learning model significantly improve the performance of the classifiers.

Figure 3 also shows the comparison of performance (F_score) among the hybrid models. In NB algorithms, CHI and IG feature extraction methods have the same performance and they are better than TFIDF method. And in cases of TREE, SVM and NNET algorithms, CHI and IG feature extraction methods also have better performance than TFIDF. McNemar test results show that hybrid model of lexicon and NB-CHI has significantly lower performance than other ones, hybrid model of lexicon and NNET-TFIDF has significantly better than hybrid of lexicon and each of

SVM-CHI and NNET-CHI. And TREE with CHI feature extraction is also significantly different from SVM-CHI in case of English text. In this section, we found that among the single machine learning algorithms; NNET, TREE, SVM and NB are in the order of top classification performance. By integrating lexicon-based and machine learning approach, hybrid models have significantly improved classification performance better than single machine learning model for both Vietnamese and English text. In addition, among hybrid models; NNET, TREE, SVM and NB are the order of best performance comparison in the both languages. In the next section we will choose hybrid model of lexicon-based approach and a combination of Chi-square feature extraction method with artificial neural network algorithms as the classifier for classifying un-label multilingual text.

Table 2. Single models and hybrid models opinion mining results

Language Vietnamese Text English Text

Non-Std

Dictionary Single Hybrid Single Hybrid Single Hybrid Single Hybrid Single Hybrid Single Hybrid

Models NB+TFIDF NB+CHI NB+IG NB+TFIDF NB+CHI NB+IG

Accuracy 75.78 77.3 71.89 73.05 71.89 73.05 53.33 65.76 69.94 68.25 69.94 68.25

F_score 76.82 80.6 66.46 70.42 66.46 70.42 60.77 72.92 74.17 73.81 74.17 73.81

McNemar test 6.03**

(p = 0.015) 7.29*** (p=0.007) 7.29*** (p=0.007)

634.02*** (p = 0.000)

1.82 (p=0.178)

1.82 (p=0.178)

Models TREE+TFIDF TREE+CHI TREE+IG TREE+TFIDF TREE+CHI TREE+IG

Accuracy 80.95 83.51 84.11 86.89 84.11 86.89 47.01 76.38 72.88 77.51 72.88 77.51

F_score 81.24 83.69 83.00 87.48 83.00 87.48 63.95 77.01 75.90 79.38 75.90 79.38

McNemar test 5.02*** (p=0.025) 38.91*** (p=0.000) 38.91*** (p=0.000) 830.00*** (p=0.000)

13.29*** (p=0.000)

13.29*** (p=0.000)

Models SVM+TFIDF SVM+CHI SVM+IG SVM+TFIDF SVM+CHI SVM+IG

Accuracy 81.62 84.46 83.76 86.89 83.76 86.89 47.8 76.95 75.25 77.97 75.25 77.97

F_score 83.04 85.89 82.24 86.58 82.24 86.58 63.85 78.16 78.08 78.41 78.08 78.41

McNemar test 3.327* (p=0.067) 12.25*** (p=0.000) 12.25*** (p=0.000)

689.06*** (p=0.000)

61.55*** (p=0.000)

61.55*** (p=0.000)

Models NNET+TFIDF NNET+CHI NNET+IG NNET+TFIDF NNET+CHI NNET+IG

Accuracy 82.68 85.95 85.14 87.7 85.14 87.7 47.8 76.16 76.38 78.98 76.38 78.98

F_score 83.63 86.32 84.55 87.79 84.55 87.79 63.85 77.91 78.65 80.42 78.65 80.42

McNemar test 7.04***

(p=0.008) 3.59** (p=0.058)

3.59** (p=0.058)

647.24*** (p=0.000)

12.17*** (p=0.000)

12.17*** (p=0.000)

2016 한국경영정보학회 춘계학술대회

-142-

Page 7: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

Vietnamese text

English Text

Figure 2. Single models and hybrid models opinion mining results

4.3. Application of opinion mining in hotel services industry

Opinion classification for online review

As result presented in experiments section, in both cases of Vietnamese and English text, the hybrid model of lexicon-based approach and Artificial Neural Network machine learning with Chi-square feature extraction method was selected to forecast the classification of unlabeled multilingual text data.

In this section we will compare customer opinions toward Vietnamese hotels between domestic and foreign (American) customers. Because the number of online reviews in English is larger than in Vietnamese, in order to avoid bias due to the difference of the number of reviews, we randomly pick out each language corpora 10,000 sentences and treat them as balance data for comparison. Then the hybrid model lexicon-based and Artificial Neural Network with CHI feature extraction method is used to classify the sentiment polarity of each sentence. Because customer opinion expressed in each sentence may mention about a hotel or hotel aspects, next section will present the way to extract hotel aspects.

Hotel aspects extraction

After performing opinion mining on the set of sentences split from online reviews. We have a set of opinionated

sentences about hotel services. Because travelers do not only express their opinions hotel itself in general but also on any aspects of a hotel or related services. Hotel managers need to know which aspects are commented with positive opinion and which are commented with negative for their appropriate improvement action. In this section we will describe how to extract aspects related to hotel services in opinionated text. In a simple illustration, consider an online review Di talking about an entity Ei, this review is split into k sentences Di = {Si1, Si2,…, Sik}. Although the whole review Di is talking about an entity Ei but each sentence Sij (j = 1… k) may mention to the entity Ei or any aspects of entity Ei. As defined by Hu & Liu (2004), an aspect is usually indicated or denoted by user with an aspect expression. An aspect expression is the actual word or phrase that comprises the general concepts of an aspect. They also proposed a technique to extract aspects based on natural language processing. A part-of-speed (POS) tagging and syntax tree chunking (or parsing) are used to find nouns and noun phrases in the sentences. After that frequency analysis on the nouns and noun phrases are performed to extract the most frequent nouns and noun phrases as aspect expression candidates. The extracted sets of aspect expression candidates are then filtered using special linguistic rules to eliminate redundant aspects.

2016 한국경영정보학회 춘계학술대회

-143-

Page 8: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

Table 3. Top 50 aspect expressions on Vietnam hotels reviewed by Vietnamese customers

Table 4. Top 50 aspect expressions on Vietnam hotels reviewed by American customers

staff bathroom view look internet

location market charge desk noise

hotel street shower management balcony

breakfast airport toilet customer manager

service buffet elevator windows coffee

restaurant water place room cost

price reception towel door floor

smell food receptionist bed booking

wall distance wifi air people

restaurants pool town bar arrival

Following the methodology introduced by Hu & Liu,

we used Stanford POS tagger for tagging English text and designed RVN POS tagger for tagging Vietnamese text. After nouns and noun phrases are extracted, we performed frequency analysis and listed nouns and nouns phrase from largest to smallest order. Because our online reviews are talking about hotel and related services, we scanned the top list of nouns and noun phrases to eliminate the words those are not related to hotel. The final lists of the top 50 most frequent aspect expressions are presented in Table 3 and Table 4. As seen on the two tables, the most frequent aspect order is different between USA and Vietnamese customers. We will discuss in details these across culture differences in next section.

Summarization for opinion on hotel aspects

Because an entity may have many aspects it is necessary to rank the aspects and summarize the opinion analysis on a limited list of aspects toward an entity. Hu & Liu (2004) proposed a way to rank aspects based on the frequency of their occurrences in the reviews. However, this ranking method is neglecting sentiment orientation of the sentences or documents that contain the aspects. Therefore, other kinds of ranking are also introduced; it is ranking aspects according to the number of reviews that express positive or negative opinions. Marrese et al. (2014) argued that an aspect that has a lot of positive and negative opinions will be more important, because the high number of opinions of both orientations may indicate that customers are very interested in that aspect. Hence, a kind of ranking the importance of aspects based on both amount of positive and negative opinions of it simultaneously is also be discussed.

vị_trí (location)

nước (water)

ăn_uống (food)

khu_phố (town)

du_lịch (travel)

nhân_viên (staff)

tiếp_tân (reception)

không_gian (space)

nhìn (look)

đồ_ăn (guess)

khách_sạn (hotel)

thái_độ (attitude)

book (book)

nội_thất (furniture)

thức_ăn (food)

giá (price)

tiền (money)

dịch_vụ (service)

máy_lạnh (air-conditioner)

trang_thiết_bị (equipment)

tiện_nghi (facilities)

sân_bay (airport)

cửa (door)

thanh_toán (payment)

wifi (wifi)

phục_vụ (server)

thành_phố (city)

giường (bed)

ăn_sáng (breakfast)

bồn_tắm (bathtub)

địa_điểm (place)

chợ (market)

thang_máy (elevator)

tiêu_chuẩn (standard)

bữa_sáng (breakfast-food)

phòng (room)

chất_lượng (quality)

vệ_sinh (cleaness)

trang_trí (decoration)

phòng_ngủ (bedroom)

đi_lại (transport)

cửa_sổ (windows)

khu_vực (area)

toilet (toilet)

tường (wall)

khách (guest)

thiết_kế (design)

phòng_tắm (bathroom)

nhà_hàng (restaurant)

âm_thanh (sound)

2016 한국경영정보학회 춘계학술대회

-144-

Page 9: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

Table 5. Top hotel aspects’ scores and relative importance reviewed by Vietnamese and American customers

Vietnamese customers USA customers

No

Asp

ects

P_s

core

N_s

core

Rel

ativ

e Im

port

ance

Asp

ects

P_s

core

N_s

core

Rel

ativ

e Im

port

ance

1 vị_trí (location) 90% -5% 100% staff 94% -28% 100%

2 nhân_viên (staff) 100% -39% 71% location 64% -7% 88%

3 khách_sạn (hotel) 44% -100% 66% hotel 100% -72% 42%

4 giá (price) 57% -28% 34% breakfast 41% -20% 31%

5 tiện_nghi (facilities) 29% -5% 29% service 23% -10% 20%

6 phục_vụ (server) 46% -22% 28% restaurant 14% -3% 17%

7 địa_điểm (place) 20% 0% 22% price 20% -10% 14%

8 phòng (room) 70% -88% 22% smell 1% -10% 14%

9 đi_lại (transport) 15% -1% 17% wall 0% -7% 10%

11 khách (guest) 4% -17% 16% bathroom 7% -13% 9%

12 nước (water) 2% -15% 15% market 8% -2% 9%

13 tiếp_tân (reception) 4% -14% 12% street 10% -5% 8%

14 thái_độ (attitude) 23% -14% 11% airport 9% -4% 8%

15 tiền (money) 5% -14% 10% buffet 8% -3% 7%

16 sân_bay (airport) 10% -2% 9% water 5% -10% 7%

17 thành_phố (city) 8% -1% 8% reception 4% -8% 7%

18 chợ (market) 8% -1% 7% food 13% -9% 6%

19 chất_lượng (quality) 3% -9% 7% distance 4% 0% 6%

20 cửa_sổ (windows) 1% -7% 6% pool 5% -2% 5%

21 thiết_kế (design) 5% 0% 6% view 8% -5% 5%

22 ăn_uống (food) 6% -1% 5% charge 2% -5% 5%

23 không_gian (space) 5% 0% 5% shower 6% -9% 5%

24 book (book) 1% -6% 5% toilet 0% -4% 5%

25 dịch_vụ (service) 11% -7% 5% elevator 1% -4% 5%

26 cửa (door) 1% -6% 5% place 14% -11% 4%

27 giường (bed) 2% -7% 5% towel 0% -3% 4%

28 thang_máy (elevator) 1% -5% 5% receptionist 1% -4% 4%

29 vệ_sinh (cleaness) 6% -10% 4% wifi 5% -3% 4%

30 khu_vực (area) 5% -1% 4% town 4% -1% 4%

31 khu_phố (town) 4% -1% 4% look 6% -9% 4%

32 nhìn (look) 2% -5% 3% desk 5% -7% 3%

33 nội_thất (furniture) 5% -2% 3% management 0% -2% 3%

34 máy_lạnh(airconditioner) 0% -3% 3% customer 3% -5% 3%

35 thanh_toán (payment) 0% -2% 2% windows 1% -3% 3%

36 ăn_sáng (breakfast) 10% -12% 2% room 98% -100% 3%

37 tiêu_chuẩn (standard) 2% -4% 2% door 5% -7% 3%

38 trang_trí (decoration) 2% 0% 2% bed 14% -12% 3%

39 toilet (toilet) 0% -3% 2% air 18% -19% 2%

40 nhà_hàng (restaurant) 3% -2% 2% bar 6% -4% 2%

* Minus sign of N_score indicates that it is negative opinion Consider Pi and Ni be the number of positive and negative opinions on aspect ai (i = 1…n). Then standardizing positive opinion score (P_scorei) and positive opinion score (N_scorei) for each aspect expression is calculated as the min-max normalized values of Pi and Ni.

i 1 2 ni

1 2 n i 1 2 n

P -Min{P ,P ,...,P }P_score =

Max{P ,P ,...,P } -Min{P ,P ,...,P } (1)

i 1 2 ni

1 2 n i 1 2 n

N -Min{N ,N ,...,N }N_score =

Max{N ,N ,...,N } -Min{N ,N ,...,N }(2)

From above formulas, P_scorei and N_scorei are in

2016 한국경영정보학회 춘계학술대회

-145-

Page 10: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

range from 0 to 1. Then the standard deviation of these scores is calculated as below:

i ii

P_score +N_scoreAV_score =

2 (3)

2 2i i i i

i

(P_score -AV_score ) +(N_score -AV_score )STD_score =

2(4)

According to Marrese et al. (2014)’s proposal, the min-max normalized value of STD_scorei is defined as a measure for each aspect expression ai, it is named Relative importance and calculated as below:

i 1 1 n

1 1 n 1 1 n

STD_score -Min{STD_score ,STD_score ,... STD_score }Relative Importance

Max{STD_score ,STD_score ,... STD_score } -Min{STD_score ,STD_score ,... STD_score }= (5)

In this research, after aspects extraction we counted the number of positive opinions Pi and negative opinions Ni. Then we calculated the P_scorei and N_scorei for each aspect ai in Table 3, Table 4 with the equations (1) and (2). Relative Importance for each aspect ai is calculated with the equation (5).

Table 5 presents the results of P_score, N_score and Relative importance among the aspects regarding to the online reviews of Vietnamese and American customers.

Values of P_score, N_score and Relative importance were from 0 to 1 but we converted to percentage for better visualization. The aspect with value 100% means that it is the highest score aspect in the corresponding ranking method. Figure 4 presents the visual summary comparison of hotel aspect reviewed by Vietnamese customers and Figure 5 presents the information for American customers. Because of limitation of graph size, only top 40 aspect expressions are drawn and they are ordered according to relative importance in a descending left to right direction.

Figure 3. Vietnam hotel aspect summary reviewed by Vietnamese customers

Aspects are ordered according to relative importance in a descending manner.

N_score P_score

2016 한국경영정보학회 춘계학술대회

-146-

Page 11: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

Figure 4. Vietnam hotel aspect summary reviewed by American customers

4.4. Discussions

Above results show that there are differences in customer opinions about hotel services in Vietnam between domestic and foreign customers (USA). In positive opinions ranking criteria, Vietnamese customers pay the most interests in staff, location, room, price, serving, hotel, facilities while Americans pay the most interests on hotel, room, staff, location, breakfast, service, price, air and restaurant and so on. In negative opinions ranking criteria, Vietnamese customers pay interests on hotel, room, staff, price, serving and guest while Americans pay more interests on room, hotel, staff, breakfast, air, bathroom, bed and so on. By referring to the list of aspects ranking in positive opinion order, hotel managers can have a guide line for sustaining customer satisfactions by keeping improving those aspects. And based on the list of aspects ranking in negative opinion order, hotel managers can choose aspects to improve for reducing dissatisfactions not only for domestic but also for foreign customers.

However, when considering positive and negative opinions are simultaneously important factors, the list of top aspects ranking in Relative importance order is valuable for choosing aspects to improve in an overall. In this perspective, Vietnamese customers pay the most interests on facilities and services (location, hotel, price, facilities, serving, place, room, transport and so on) while Americans are interested in people and food (staffs, location, hotel, breakfast, service, restaurant, price, smell and so on). This information is valuable for hotel managers to know which aspects of their hotel should be improved and offer differentiating services based on customers’ cultures.

5. Conclusions This research has several contributions in academic

research field as well as business applications in a couple of ways.

Firstly, due to the fact that there are limited resources for opinion mining Vietnamese text data, especially social media text that are rapidly increasing in recent years, this research has proposed a method for opinion mining on multilingual social media text such as Vietnamese and English. This work also conducted several combinations of opinion mining models for multilingual Vietnamese - English social media text. Evaluating the predictive performance of lexicon based approach model, hybrid models of several feature extraction methods and algorithms of machine learning such as Naïve Bayes, Decision Tree, Support Vector Machine and Artificial Neural Network models has been conducted. The result of performance comparisons among the models can be useful reference sources for other researches in opinion mining tasks.

Secondly, this research proposed a hybrid model of lexicon based and machine learning. This hybrid model takes advantages of lexicon resources in each language and supplement to enhance the machine learning model. Therefore, it has better performance than that of standalone model. The proposed models can be applied and tested in other opinion mining researches or business domains. In addition, the proposed hybrid model can be scalable for other languages by taking both advantages of existing sentiment resources and machine learning techniques. For a new non-English language, researcher needs to consider its specific characteristics, develop its sentiment resources then it can be added to the model.

Thirdly, this work applied opinion mining on analyzing hotel services preferences. Hotel aspects extraction and opinion summarization scales were proposed for comparing

Aspects are ordered according to relative importance in a descending manner.

N_score P_score

2016 한국경영정보학회 춘계학술대회

-147-

Page 12: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

preferences across cultures. The results show that there are differences in preferences between Vietnamese and American customers. These outcomes can be looked deeper and become implementation guide for hotel managers to improve their services.

Although this study utilizes social media data to build a hybrid opinion mining model in multilingual social media text, it has limitations and future research directions. This study was to develop a binary classification model that distinguishes only two positive and negative polarities of customer comments. The next study should extend the binary class into more classes to express the strength of the sentiment comments. Future research should apply several techniques for opinion mining with more levels of sentiment; an example is to design multi-category scale such as strong positive – positive – neutral – negative – strong negative. This can improve the accuracy of customer opinions in the real world. This study developed hybrid opinion mining model on multilingual social media text but focused on experiments with Vietnamese and English languages. Future research should extend experiments on other languages such as Korean, Chinese and so on.

References

[1] Ahmad K., Cheng D., Almas Y., “Multi-lingual Sentiment Analysis of Financial News Streams,” Proceeding of the Second Workshop on Computational

Approaches to Arabic Script-based Languages, 2007, pp. 1-12.

[2] Baccianella, S., Esuli, A., & Sebastiani, F., "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining,” LREC, Vol. 10, 2010, pp. 2200-2204.

[3] Balahur, A., & Turchi, M., "Multilingual Sentiment Analysis Using Machine Translation?" Proceedings of

the 3rd Workshop in Computational Approaches to

Subjectivity and Sentiment Analysis. Association for Computational Linguistics, 2012, pp. 52-60.

[4] Banea, C., Mihalcea, R., Wiebe, J., and Hassan, S., “Multilingual Subjectivity Analysis using Machine Translation,” Proceedings of the Conference on

Empirical Methods in Natural Language Processing

(EMNLP 2008), 2008, pp. 127-135. [5] Bao, H.T., Tuan, N.A., Son, N.C., "Issues in

Construction of a Vietnamese Machine Tractable Dictionary," First National Symposium on Research,

Development, and Applications of Information and

Communication Technology, 2003, pp. 253-263. [6] Bostanci, B., & Bostanci, E., "An Evaluation of

Classification Algorithms Using Mc Nemar’s Test," Proceedings of Seventh International Conference on

Bio-Inspired Computing: Theories and Applications,

2013, pp. 15-26. [7] Cambria, E., Mazzocco, T., & Hussain, A.,

"Application of Multi-dimensional Scaling and Artificial Neural Networks for Biologically inspired

Opinion Mining," Biologically Inspired Cognitive

Architectures, Vol. 4, 2013, pp. 41-53. [8] Chang, J. Y., Lee, S. Y. and Han, J. B.,

"Machine-Learned Classification Technique for Opinion Documents Retrieval in Social Network Services," Proceedings of the 2013 Korea Computer

Conference, 2013, pp. 245-247. [9] Denecke, K., “Using Sentiwordnet for multilingual

sentiment analysis.” 2008 IEEE 24th International

Conference on Data Engineering Workshop, 2008, pp. 507-512.

[10] Go, A., Bhayani, R., and Huang, L., "Twitter Sentiment Classification Using Distant Supervision," CS224N Project Report, Stanford, Vol. 1, 2009, pp. 1-12.

[11] Ha, Q. T., Vu, T. T., Pham, H. T., & Luu, C. T., "An Upgrading Feature-Based Opinion Mining Model on Vietnamese Product Reviews," Active Media

Technology, Springer Berlin Heidelberg, Vol. 6890, 2011, pp. 173-185.

[12] Hoang, V. C. D., Dinh, D., Le Nguyen, N., & Ngo, H. Q., "A Comparative Study on Vietnamese Text Classification Methods," 2007 IEEE International

Conference on Research, Innovation and Vision for the

Future, 2007, pp. 267-273. [13] Hu, M. and Liu, B., "Mining and Summarizing

Customer Reviews," SIGKDD International

Conference on Knowledge Discovery and Data

Mining, ACM, 2004, pp. 168-177. [14] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C.,

Federico, M., Bertoldi, N. & Dyer, C., "Moses: Open source toolkit for statistical machine translation," Proceedings of the 45th annual meeting of the ACL on

interactive poster and demonstration sessions, Association for Computational Linguistics, 2007, pp. 177-180.

[15] Lanz, L., Fischhof, B., and Lee, R., "How are Hotels Embracing Social Media in 2010? Examples of How to Start Engaging," HVS Sales and Marketing Services, Vol. June, 2010, pp. 1-29.

[16] Le, H. S. and Lee, H. K., "Exploring Relationship Between Social ICT Issues And Academic Research Interests Through Text Mining Analysis," The Journal

of Internet Electronic Commerce Research, Vol. 14, No. 5, 2014, pp. 161-180.

[17] Le, H. S., Lee, J. H. and Lee, H. K., "Purchase Process Aspect-Based Opinion Mining: An Application For Online Shopping Mall," The Journal of Internet

Electronic Commerce Research, Vol. 15, No. 2, 2015, pp. 15-28.

[18] Le, H. S., Lee, J. H. and Lee, H. K., “Design Hybrid Models for Opinion Mining on Vietnamese Social Media Text Data,” The Journal of Internet Electronic

Commerce Research, Vol. 16, No. 2, 2016, pp. 231–255.

[19] Le, H. T., Sam, R. C., & Nguyen, P. T., "Extracting phrases in Vietnamese Document for Summary Generation," 2010 International Conference on Asian

Language Processing (IALP), 2010, pp. 207-210.

2016 한국경영정보학회 춘계학술대회

-148-

Page 13: HOTEL SERVICES PREFERENCES ACROSS CULTURES: A CASE … · data mining, web mining and text mining (Bing & Liu, 2012). 2.2. Opinion mining Sentiment analysis or opinion mining is the

[20] Lee, C. H., & Yang, H. C., “A Multilingual Text Mining Approach Based on Self-Organizing Maps,” Applied Intelligence, Vol. 18, No. 3, 2003, pp. 295-310.

[21] Lee, J. H. and Lee, H. K., "Research on Unstructured Text Mining Algorithm through Data Dictionary based R Programming," Journal of The Korean Industrial

Information Systems Research, Vol. 20, No. 2, 2015, pp. 113-124.

[22] Lee, J. Y., In, K. H. and Kim, U. M., "Customer Analysis in SNS by using the Opinion Mining," 2012

Fall Conference of Korean Institute of Information

Scientists and Engineers, Vol. 39, No. 2, 2012, pp. 101-103.

[23] Liu, B., "Sentiment analysis and subjectivity," Handbook of Natural Language Processing, second edition, 2010.

[24] Liu, B., "Web Data Mining Exploring Hyperlinks, Contents, and Usage Data," second edition, Springer. 2011.

[25] Medhat, W., Hassan, A., and Korashy, H., “Sentiment analysis algorithms and application: A survey,” Ain Shams Engineering Journal, Vol. 5, No. 4, 2014, pp. 1093-1113.

[26] Montejo, R. A., Martínez, C. E., Martín, V. M. T., and Ureña, L. L. A., "Ranked Wordnet Graph for Sentiment Polarity Classification in Twitter," Computer Speech & Language, Vol. 28, No. 1, pp. 93-107.

[27] Nguyen, B. A., & Yang, D. L.,"A Semi-Automatic Approach to Construct Vietnamese Ontology from Online Text, "The International Review of Research in

Open and Distributed Learning, Vol. 13, No. 5, 2012, pp. 148-172.

[28] Nguyen, C. T. and Phan, X. H., "JVnSegmenter: A Java-based Vietnamese Word Segmentation Tool", http://jvnsegmenter.sourceforge.net/, 2007.

[29] Nguyen, C. T., Nguyen, T. K., Phan, X. H., Nguyen, L. M., & Ha, Q. T., "Vietnamese Word Segmentation with CRFs and SVMs: An Investigation," Proceedings

of the 20th Pacific Asia Conference on Language,

Information and Computation, PACLIC, 2006, pp. 215-222.

[30] Nguyen, H. N., Van Le, T., Le, H. S., and Pham, T. V., "Domain Specific Sentiment Dictionary for Opinion Mining of Vietnamese Text," Multi-disciplinary

Trends in Artificial Intelligence, Springer International Publishing, Vol. 8875, 2013, pp. 136-148.

[31] Nguyen, P. H., Ngo, T. D., Phan, D. A., Dinh, T., & Huynh, T. Q., "Vietnamese Spelling Detection and Correction using Bi-gram, Minimum Edit Distance, SoundEx Algorithms with Some Additional Heuristics," IEEE International Conference on Research, Innovation and Vision for the Future, 2008,

pp. 96-102. [32] Ortigosa, A., Martín, J. M., & Carro, R. M.,

"Sentiment Analysis in Facebook and Its Application to E-learning," Computers in Human Behavior, Vol. 31, 2014, pp. 527-541.

[33] Pang, B., & Lee, L., "Opinion mining and sentiment analysis," Foundations and trends in information

retrieval, Vol. 2, No. 1-2, 2008, pp. 1-135. [34] Popescu, A. M. and Etzioni, O., "Extracting Pproduct

Features and Opinions from Reviews," Natural

Language Processing and Text Mining, Springer, 2007, pp. 9-28.

[35] Sharma, R., Nigam, S. and Jain, R., "Polarity detection at sentence level," International Journal of Computer

Applications, Vol. 86, No. 11, 2014, pp. 29-33. [36] Sharma, R., Nigam, S., Jain, R., "Determination of

Polarity of sentences using Sentiment Orientation System," International Journal of Advances in

Computer Science and Technology (IJACST) WARSE, Vol. 3, No. 3, 2014, pp. 182-187.

[37] Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A., “Sentiment strength detection in short informal text,” Journal of the American Society for

Information Science and Technology, Vol. 61, No. 12, 2010, pp. 2544-2558.

[38] Titov, I. and McDonald R., "A Joint Model of Text and Aspect Ratings for Sentiment Summarization," Association for Computational Linguistics, Vol. 8, 2008, pp. 308-316.

[39] Vu, X. S., Song, H. J., & Park, S. B., "Building a Vietnamese SentiWordNet Using Vietnamese Electronic Dictionary and String Kernel," Knowledge

Management and Acquisition for Smart Systems and

Services. Vol. 8863, 2014. pp. 223-235. [40] Wang, T., Cai, Y., Leung, H. F., Raymond, Y. K. L., Li,

Q. and Min, H., "Product Aspect Extraction Supervised with Online Domain Knowledge," Knowledge-Based Systems, Vol. 71, 2014, pp. 86-100.

[41] Wilson, T., Wiebe, J. and Hoffman, P., "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis", HLT '05 Proceedings of the conference on

Human Language Technology and Empirical Methods

in Natural Language Processing, 2005, pp. 347-354. [42] Wu, H., Shenghua, Z. and Ling, L., "Social Media

Competitive Analysis and Text Mining: A Case Study in the Pizza Industry," International Journal of

Information Management, Vol. 33, 2013, pp. 464– 472.

[43] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., & Zhou, Z. H., "Top 10 algorithms in data mining," Knowledge and information systems, Vol. 14, No. 1, 2008, pp. 1-37.

2016 한국경영정보학회 춘계학술대회

-149-