See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/263090750 Identifying Emerging Hotel Preference Using Emerging Pattern Mining Technique ARTICLE in TOURISM MANAGEMENT · JUNE 2014 Impact Factor: 2.57 · DOI: 10.1016/j.tourman.2014.06.015 CITATIONS 2 READS 443 5 AUTHORS, INCLUDING: Gang Li Hebei University of Technology 278 PUBLICATIONS 3,547 CITATIONS SEE PROFILE Quan Vu Victoria University Melbourne 22 PUBLICATIONS 111 CITATIONS SEE PROFILE Jia Rong Victoria University Melbourne 16 PUBLICATIONS 185 CITATIONS SEE PROFILE Xinyuan (Roy) Zhao Sun Yat-Sen University 28 PUBLICATIONS 306 CITATIONS SEE PROFILE Available from: Quan Vu Retrieved on: 03 February 2016

Identifying emerging hotel preferences using Emerging Pattern Mining technique


Tourism Management 46 (2015) 311–321

Contents lists available at ScienceDirect

Tourism Management

journal homepage: www.elsevier.com/locate/tourman

Identifying emerging hotel preferences using Emerging Pattern Mining technique

Gang Li a, 1, Rob Law b, 2, Huy Quan Vu a, 3, Jia Rong a, 4, Xinyuan (Roy) Zhao c, *

a School of Information Technology, Deakin University, Vic 3125, Australia
b School of Hotel & Tourism Management, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region
c School of Business, Sun Yat-Sen University, Guangzhou, Guangdong, 510275, China

Highlights

• A novel means for online review analysis identifies features of interest in hotel selection.
• Emerging Pattern Mining is utilized to identify those features.
• A dataset of 118,000 hotel reviews in Asia Pacific destinations was collected from TripAdvisor.

Article info

Article history:
Received 11 November 2013
Accepted 25 June 2014
Available online

Keywords:
Hotel preference
Data mining
Travel behavior
Emerging pattern mining
Natural language processing

* Corresponding author. Tel.: +86 20 84112721; fax: +86 20 84036924.
1 Tel.: +61 3 9251 7434; fax: +61 3 9251 7604.
2 Tel.: +852 3400 2181; fax: +852 2362 9362.
3 Tel.: +61 432 411 359.
4 Tel.: +61 3 925 17711.

http://dx.doi.org/10.1016/j.tourman.2014.06.015
0261-5177/© 2014 Elsevier Ltd. All rights reserved.

Abstract

Hotel managers continue to find ways to understand traveler preferences, with the aim of improving their strategic planning, marketing, and product development. Traveler preference is unpredictable; for example, hotel guests used to prefer having a telephone in the room, but now favor fast Internet connection. Changes in preference influence the performance of hotel businesses, thus creating the need to identify and address the demands of their guests. Most existing studies focus on current demand attributes and not on emerging ones. Thus, hotel managers may find it difficult to make appropriate decisions in response to changes in travelers' concerns. To address these challenges, this paper adopts Emerging Pattern Mining technique to identify emergent hotel features of interest to international travelers. Data are derived from 118,000 records of online reviews. The methods and findings can help hotel managers gain insights into travelers' interests, enabling the former to gain a better understanding of the rapid changes in tourist preferences.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Most people prioritize accommodation when planning a trip, spending most of their planning time and effort on selecting the right option. Travelers have different expectations and/or preferences, depending on their destination, purpose and mode of travel, as well as previous accommodation experience (Liu, Law, Rong, Li, & Hall, 2013; Liu, Shi, & Hu, 2013). A comprehensive understanding of customer requirements can help hotel managers gain a lead in the market in terms of strategic planning, marketing, and product development (Wilkins, 2010). However, it is difficult to identify such crucial knowledge due to the complex decision-making process and the wide range of selection criteria (Li et al., 2013).

Of these criteria, the most important has to do with hotel features (i.e., attributes or factors) that most travelers seriously consider. The most valuable hotel features that significantly affect a traveler's selections include location, price, facilities, and cleanliness (Lockyer, 2005). Other features, such as the size and type of building, quality of service and a quiet environment, are important to some people (Albaladejo-Pina & Diaz-Delfa, 2009; Merlo & de Souza Joao, 2011). Merlo and de Souza Joao (2011) examine specific hotel features, such as air conditioning in bedrooms. Sohrabi, Vanami, Tahmasebipur, and Fazil (2012) present another list of important hotel features, including promenade, comfort, security, network, pleasure, news, recreational information, expenditure, room facilities, and car parking.


Advances in Internet technology enable travelers to share their travel-related experiences, opinions, and concerns on many online platforms (Mack, Blose, & Pan, 2008). Thus, researchers are now shifting their attention to this data source as a way of mining traveler preference in a cheap, efficient, and nonintrusive manner. For instance, Stringam and Gerdes (2010) use a corpus-based approach to analyze guest comments on online hotel distribution sites as well as to identify frequently used words, patterns of word usage, and their relationship to hotel feature ratings. Furthermore, descriptive statistical data are used to assess the importance and effect on ratings of several features, including location, size of guest rooms, staff, facilities, and breakfast offerings (Stringam, Gerdes, & Vanleeuwen, 2010). Chaves, Gomes, and Pedron (2012) show that room, staff, location, cleanliness, friendliness, and helpfulness are the most frequently used words in online reviews of small and medium hotels in Portugal. Liu, Law, et al. (2013) and Liu, Shi, et al. (2013) analyze comments collated from TripAdvisor.com and changes in hotel customers' expectations according to travel mode, using the association rule mining technique. Li et al. (2013) utilize the Choquet integral, a method of fuzzy decision support, to analyze the selection preferences of different groups of travelers in terms of several hotel features.

Researchers have yet to meet the increasing demand of hotel managers for more accurate knowledge on the hotel preferences of travelers. Several limitations that prevent researchers from identifying such knowledge are listed below.

Identifying Emerging Features

Traveler preference is unpredictable and dynamic. For example, travelers once preferred having a telephone in their room. During that time, charging for telephone usage used to be a significant source of revenue, but usage has declined to a point wherein investing in this facility resulted in losses for many hotels that offer it (Huettel, 2010). Today, hotels gain significant customer satisfaction by offering free wireless Internet (Bulchand-Gidumal, Melian-Gonzalez, & Lopez-Valcarcel, 2011). These changes in travelers' concerns can affect the performance of hotel businesses. As such, managers must effectively identify features that are becoming important to travelers. However, efforts to address this issue have been limited.

User Identification for Feature Improvement

Different types of travelers have different expectations of hotel features (Liu, Law, et al., 2013; Liu, Shi, et al., 2013). Some aspects may be important to all travelers, whereas others may be significant only to a specific subgroup. A clear picture of such differences could benefit hotel managers. For instance, if travelers from Western countries prefer clubbing facilities, managers can design appropriate business solutions to improve those features and meet the specific expectations of this group. However, this aspect has received little research attention.

The identification of emerging features is different from traditional approaches to hotel feature analysis, because analysts have no prior knowledge on what features should be included in the study. Large data samples are also required to identify emerging changes in customer preference patterns. Traditional research methods, such as surveys, opinion polls or focus groups, are inadequate. Therefore, resorting to available online data, such as online reviews generally expressed as textual comments, is necessary. These reviews contain abundant information on user opinions, experiences, or concerns, and are considered potential goldmines from which tourism researchers can gain insights into the behavior of travelers (Pan, MacLaurin, & Crotts, 2007).

The analysis of hotel features treats each feature as an item, and a set of hotel features associated with a traveler is an item set. Identifying emerging changes in traveler response to such features is typically formulated as a problem of Emerging Pattern Mining (EPM). Originally proposed by Dong and Li (1999), EPM can capture emerging trends in time-stamped databases or sharp contrasts between data sets or groups. This technique is mainly applied in bioinformatics (Li, Liu, Downing, Yeoh, & Wong, 2003; Li & Wong, 2002) and remains an active topic in computer science (Li & Yang, 2007; Yu, Chen, & Tseng, 2011). By using EPM, researchers can identify emerging hotel features.

The current study aims to fill this research gap by introducing the EPM technique to establish emerging hotel features. The term “hotel features” includes any entity or concept that concerns travelers when reviewing a hotel. In our case study, we first construct a comprehensive list of candidate hotel features from a large collection of text-based online reviews (N ≈ 118,000). We use the EPM technique to identify emerging features that currently receive more attention from international travelers. We also construct a set of user profiles to assist hotel managers in improving the features available in their properties. The method and the findings of this study are potentially valuable to hotel managers who want to gain insights into travelers' concerns and find ways to adapt to rapid changes in the tourism market.

The rest of the paper is organized as follows. Section 2 summarizes the methods used and attempts made to analyze hotel features. Section 3 presents the review framework used for creating a hotel features list from textual comments, and a detailed description of EPM concepts used to identify emerging features. Section 4 demonstrates the effectiveness of the proposed method in a case study. Finally, Section 5 concludes our study and offers suggestions for future research directions.

2. Related work

This section reviews existing studies that utilize hotel features to explore traveler preferences. We also present a critical analysis of the limitations of these studies and our research objectives.

2.1. Hotel features analysis

Several studies have analyzed hotel features to acquire knowledge on traveler preference. A popular method for data collection is using survey questionnaires, wherein hotel features are represented by short-answer questions or a set of keywords. Table 1 summarizes the hotel features included in traditional studies.

Due to the increasing interaction among travelers, studies are increasingly utilizing observation data collected from online resources (e.g., blogs, travel websites, and social media) through online reviews. Traditional, statistics-based data analysis models are ineffective in extracting information from text-based reviews and comments. Thus, new techniques and approaches have been proposed. For instance, manual content analysis is used to study traveler characteristics and communications about Australia as a tourism destination (Carson, 2008; Wenger, 2008). Another study employs the narrative structure analysis to identify key marketing elements, including characterization, space categorization, and evaluation of the product experience (Tussyadiah & Fesenmaier, 2008). Manual methods are time consuming and incapable of obtaining the overall differences in traveler preferences. Although automated approaches, such as corpus-based semantic analysis (Rayson & Garside, 2000) and stance-shift analysis (Davidson & Skinner, 2010), are also employed, such methods require users to have a background in linguistics and access to expensive software (Capriello, Mason, Davis, & Crotts, 2013). Thus, tourism researchers are also unable to take advantage of such methods for efficient analysis of online reviews.

Table 1
Predefined hotel features used in existing studies.

Hotel features | Related work
Cleanliness, Location, Room, Service, Sleep Quality, Value | (Liu, Law, et al., 2013; Liu, Shi, et al., 2013)
Shabby Bed, Clean Rats, Friendly Staff, Limited Parking, Good Room | (Bjorkelund, Burnett, & Norvag, 2012)
Personalization, Warm Welcome, Special Relationship, Straight from the Heart, Comfort | (Ariffin & Maghzi, 2012)
Promenade and Comfort, Security and Protection, Network Services, Pleasure, Hotel Staff and Their Services, News and Recreational Information, Cleanliness and Room Comfort, Expenditure, Room Facilities, Parking | (Sohrabi et al., 2012)
Location, Size and Diversity, Characteristics of the Lobby, Characteristics of the Rooms, Parking | (Merlo & de Souza Joao, 2011)
Type of Building, Location, Number of Bedrooms, Price per Room, Horses for Hire, Play Area, Meal Service, Swimming Pool, Sports Facilities, Mini-Farm, Bathroom, Type of Rent, ‘Q’ Quality Award, Booking | (Albaladejo-Pina & Diaz-Delfa, 2009)
Problem-Solving Abilities by Service Personnel, Price Level, Sanitary Hot Spring Environment, Convenience of Traffic Route/Shuttle, Special Promotions, Convenience of Reservation Procedure, Food and Beverages Service | (Hsieh, Lin, & Lin, 2008)
Location, Price, Facilities, Cleanliness | (Lockyer, 2005)
Staff Service Quality, Room Quality, General Amenities, Business Services, Value, Security, IDD Facilities | (Choi & Chu, 2001)
Cleanliness, Location, Room Rate, Security, Service Quality, Reputation of Hotel | (Ananth, DeMicco, Moreo, & Howey, 1992)

In studying traveler preference, Lento, Park, Park, and Lehto (2007) investigate online product and service reviews to identify critical and prominent (dis)satisfaction factors for virtual travel agencies. Tsou (2010) identifies the geographic information of travelers embedded on tourism web pages. Bulchand-Gidumal et al. (2011) use online reviews to verify the hypothesis that free wireless Internet connection can improve traveler satisfaction. Ye, Law, Li, and Li (2011) analyze online reviews to extract features about travel destinations of Chinese customers. Bosangit, Dulnuan, and Mena (2012) use travel blogs to examine the post-condition behavior of tourists, while Banyai (2012) analyzes the blog content of travelers to Stratford, Canada to identify popular topics and tourist perceptions of the destination.

2.2. EPM applications

The original purpose of EPM was to capture emerging trends in time-stamped databases or to explore differentiating characteristics between groups of data (Dong & Li, 1999). The research on emerging patterns focuses on the use of the discovered patterns for classification purposes, such as in works on emerging patterns (Li, Dong, & Ramamohanarao, 2000) and jumping emerging patterns (Li, Dong, & Ramamohanarao, 2001). Advanced classification techniques used for emerging patterns are proposed based on Bayesian approaches (Fan & Ramamohanarao, 2003), and bagging methods (Fan, Fan, Ramamohanarao, & Liu, 2006). The development of EPM techniques remains an ongoing effort (Gan & Dai, 2009; Liu, Law, et al., 2013; Liu, Shi, et al., 2013; Liu et al., 2014; Yu et al., 2011).

Emerging patterns are mainly applied in bioinformatics. For instance, Li and Wong (2002) attempt to find groups of genes using EPM and apply these on a colon tumor data set. Li et al. (2003) develop an interpretable classifier on an acute lymphoblastic leukemia microarray data set. Wang, Zhao, Zhao, Wang, and Qiao (2010) adopt the EPM technique to mine local conserved clusters from gene expression data. Park, Lee, and Park (2010) apply an incremental EPM module on ECG signal data for automatic diagnosis of cardiovascular diseases. Sherhod, Gillet, Judson, and Vessey (2012) utilize the jumping EPM to develop a method for automatic toxicity alert, while Huang, Gan, Lu, and Huan (2013) use this technique in mining the changes of medical behavior for clinical pathways of bronchial lung cancer.

Studies have also shown the importance of emerging patterns in applications in other areas. Kim, Song, and Kim (2005), for example, adopt the emerging pattern concepts in developing a methodology to detect changes in customer segments between time-stamped data sets. Tsai and Shieh (2009) explore emerging sequential patterns to identify the trends in consumer behavior. Shie, Yu, and Tseng (2013) focus on discovering user behavior patterns from mobile commerce environments using mobile sequential pattern mining. However, although these studies have explored the capabilities of EPM as a technique, its strong potential has yet to be fully utilized. Moreover, the concept of emerging patterns has yet to be used in the tourism and hospitality industries.

2.3. Problem definition and research objective

The majority of existing studies have focused on identifying and analyzing the most valuable hotel features that significantly affect the hotel selection process for travelers. These studies examine several commonly mentioned hotel features, such as price, location, room quality, staff, and service (Chaves et al., 2012; Liu, Law, et al., 2013; Liu, Shi, et al., 2013; Stringam et al., 2010). However, the natural assumption that popular or frequently mentioned hotel features are worth studying is one limitation of these studies. Infrequently used hotel features may also be interesting despite being mentioned less because these features are new and have already gained increasing interest from travelers. The wireless Internet facility is one such feature (Bulchand-Gidumal et al., 2011). Wireless technology has only become widely used in recent years, with many wireless Internet-enabled devices available to users. The demand for wireless Internet is growing significantly, even though the traditional hotel features remain more popular. In relation to this, most studies have not focused on discovering hotel features that are emerging as important to travelers.

Identifying target users for hotel features is important for tourism managers, as it helps them improve strategies that will meet traveler expectations. Existing studies have focused on identifying features that are frequently mentioned in online reviews (Chaves et al., 2012; Stringam & Gerdes, 2010). This approach is ineffective due to the challenges involved in enhancing popular features that can impress travelers. A feasible solution that ensures user satisfaction is difficult to develop. These features are generally standardized and managers are well aware of them, thereby limiting the scope for market competition. In contrast, some features concern certain groups of travelers. Specific improvement plans must be developed to address these concerns. Identifying relevant features and their target users can provide hotel managers the knowledge they need to create more competitive hotel products. This aspect has not been studied, and in the hotel features analysis of the current study, such features are referred to as “features of specific interest”.

This study uses the EPM concept to detect changes in travelers' concerns and to demonstrate its effectiveness through an analysis of emerging hotel features. We also identify a number of features of specific interest and their target users to develop appropriate business solutions for improving hotel services.

Fig. 1. Hotel feature identification process.


3. Methodology

Several challenges must be addressed to analyze the changes in travelers' concern on hotel features using online reviews. Table 1 shows how researchers label hotel features differently despite shared similarities. Selecting keywords that appropriately and accurately describe hotel characteristics is a challenging task. A possible solution would be to incorporate text-mining techniques into the analysis of online reviews. This approach can extract useful knowledge from unstructured text and then transform the information into structured data for analysis, thereby revealing relationships, patterns, or trends from textual data (Singh, Hu, & Roehi, 2007). Statistical tests are commonly used in studies that examine changes and trends in traveler preference (see, e.g., Bulchand-Gidumal et al., 2011). However, when a large number of variables are considered, such analysis on every available feature becomes inefficient and costly. Therefore, we present a text-processing framework that can automatically identify hotel features from review comments. Next, we introduce the EPM concept and describe how it can be used to discover emerging hotel features.

3.1. Review processing

We employ General Architecture for Text Engineering (GATE), one of the most powerful software packages capable of solving most text-processing problems (http://gate.ac.uk/). This software has been used in a wide range of applications, such as mining sequence information from protein structure databases in bioinformatics (Witte & Baker, 2005) as well as extracting and mining patient information from free-text clinical records in healthcare (Tseng, Lin, & Lin, 2007; Zhou & Han, 2006). In tourism applications, GATE has been used in building Tourist Face, a contents system based on the concept of a freebase that provides access to cultural-tourist information (Munoz Gil et al., 2011), in developing a tourism recommender system (Varga & Groza, 2011), and other tourism applications (Ruiz-Martinez, Minarro-Gimenez, Castellanos-Nieves, Garcia-Sanchez, & Valencia-Garcia, 2011).

An advantage of this tool is the availability of several language databases, especially the English lexicon. This software contains a comprehensive list of English vocabulary terms that can be used to identify hotel features from online reviews. Suppose that we collect a data set, $R$, with $m$ review comments, $R = \{r_1, r_2, \ldots, r_m\}$. The process of identifying the hotel features mentioned in the reviews, $r_i \in R$, is performed in two major stages, as shown in Fig. 1.
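As a rough companion to the two stages in Fig. 1, the sketch below tokenizes, filters, and stems each review, and then keeps the nouns whose support (the share of reviews mentioning them) reaches a minimum threshold. It is an illustration under stated assumptions only: the three reviews, the tiny `NOUN_LEXICON`, the suffix-stripping `stem` function, and the threshold value are hypothetical stand-ins, and the code does not use GATE or its lexicon.

```python
import re

# Hypothetical stand-in for the noun entries of an English lexicon.
NOUN_LEXICON = {"room", "staff", "location", "wifi"}


def stem(token):
    """Toy suffix stripper; a real pipeline would use a proper stemmer."""
    for suffix in ("liness", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process_review(text):
    """Stage 1: tokenize, lower-case, drop non-alphabetic tokens, stem."""
    tokens = re.findall(r"\w+|[^\w\s]", text)    # words, numbers, symbols
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t.isalpha()]  # drops numbers and symbols
    return [stem(t) for t in tokens]


def noun_supports(reviews):
    """Stage 2: support of each lexicon noun = share of reviews mentioning it."""
    stemmed = [set(process_review(r)) for r in reviews]
    nouns = sorted({s for tokens in stemmed for s in tokens if s in NOUN_LEXICON})
    # One binary vector per review: 1 if the noun occurs in it, else 0.
    vectors = [[1 if n in tokens else 0 for n in nouns] for tokens in stemmed]
    return {n: sum(v[j] for v in vectors) / len(reviews)
            for j, n in enumerate(nouns)}


def candidate_features(reviews, min_support):
    """Keep the nouns whose support is at least the threshold."""
    return {n for n, s in noun_supports(reviews).items() if s >= min_support}


# Hypothetical reviews, for illustration only.
REVIEWS = [
    "The rooms were clean and the staff was friendly!",
    "Great location, free wifi in room 12.",
    "Staff cleaned our room daily; wifi worked well.",
]

if __name__ == "__main__":
    print(noun_supports(REVIEWS))
    print(candidate_features(REVIEWS, min_support=0.5))
```

With a threshold of 0.5, the noun mentioned in only one of the three reviews is dropped, while the nouns mentioned in at least two reviews remain as candidates.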

3.1.1. Text processing

This step transforms the unstructured textual material into a more useful data format. In more detail, each review, $r_i$, is first loaded into a text tokenizing algorithm, wherein the stream of text is broken into words, phrases, symbols, or other meaningful elements called “tokens”. The tokens for each review are passed through a text filter, wherein capital letters are normalized to lower case. Tokens containing symbols or numbers are removed, because they are considered irrelevant to feature analysis. The remaining tokens are passed through a stemming process to reduce inflected words to their stem, base, or root form. For instance, a stemming algorithm can reduce the words “cleans”, “cleaning”, “cleanliness”, and “cleaned” to the root word “clean”. This process allows the user to extract the features that have been mentioned using different word forms. A stemmed token list, $S^{(i)} = \{s_1^{(i)}, s_2^{(i)}, \ldots\}$, is constructed for each review, $r_i$, $i = 1, \ldots, m$, and saved to a processed-document database. In hotel feature identification, a natural assumption is that the English vocabulary of noun types is commonly used to refer to entities, such as hotel features (e.g., room, location, view, service, and staff). Therefore, we identify and construct a list of the stemmed nouns, $N = \{n_1, n_2, \ldots, n_o\}$, which appear in the review corpus using the English lexicon of GATE. The lexicon resource is used as a lookup database, which contains approximately 63,000 commonly used English words. Each word is accompanied by a set of tags that helps determine the word type, such as noun, verb, or adjective. The list of nouns is used to identify candidate hotel features in the next stage.

3.1.2. Hotel feature candidate identification

In this step, we select interesting nouns, $n_j \in N$, for further analysis as potential hotel features. Specifically, a binary vector, $v^{(i)} = \{v_1^{(i)}, v_2^{(i)}, \ldots, v_o^{(i)}\}$, is constructed for each stemmed token list, $S^{(i)}$, where $v_j^{(i)}$ takes the value of 1 if $n_j \in S^{(i)}$, or 0 otherwise. The degree of interest of each noun, $n_j \in N$, is evaluated by a support value given by

\[
\mathrm{supp}(n_j) = \frac{\mathrm{count}(n_j)}{|R|}, \tag{3.1}
\]

where $\mathrm{count}(n_j)$ is the number of vectors, $v^{(i)}$, $i = 1, \ldots, m$, for which $v_j^{(i)} = 1$, and $|R|$ is the total number of records in the data set $R$. We use a user-specified minimum support, or support threshold, $\delta_s$, to measure the significance of the nouns in the review corpus. If a noun, $n_j$, satisfies $\mathrm{supp}(n_j) \geq \delta_s$, that noun is selected into the hotel feature candidate list; otherwise, it is removed.

The advantage of this method is that users do not need to provide a set of predefined keywords to identify and extract hotel features for analysis. Instead, a list of candidates is automatically constructed from the review comments. All potential features mentioned in the reviews are considered, and interesting candidates are returned. The support threshold, $\delta_s$, is set to eliminate insignificant features while retaining potentially interesting ones for subsequent analysis.

3.2. EPM

Efficiently discovering changes or trends in travelers' concerns is a challenging task for hotel managers, given the large number of features mentioned in online reviews. This section introduces the EPM concept to address this issue. Emerging patterns are defined as item sets, whose support increases significantly from one data set to another (Dong & Li, 1999).

Let $F = \{f_1, f_2, \ldots, f_o\}$ be a set of items. A subset $X \subseteq F$ is called a $k$-item set, where $k = |X|$. Given a number of groups, $\{G_1, G_2, \ldots\}$, the support for an item set, $X \subset G_i$, is denoted as $\mathrm{supp}(X, G_i)$, which reflects how frequently $X$ appears in this group. The change in the support for $X$ from a group $G_i$ to a group $G_j$ is measured by a growth rate metric defined as

\[
\mathrm{GrowthRate}(X, G_i, G_j) =
\begin{cases}
0, & \text{if } \mathrm{supp}(X, G_i) = 0 \text{ and } \mathrm{supp}(X, G_j) = 0, \\
\infty, & \text{if } \mathrm{supp}(X, G_i) = 0 \text{ and } \mathrm{supp}(X, G_j) \neq 0, \\
\dfrac{\mathrm{supp}(X, G_j)}{\mathrm{supp}(X, G_i)}, & \text{otherwise.}
\end{cases} \tag{3.2}
\]

Given $\delta_e > 1$ as a growth rate threshold, an item set, $X$, is called an emerging pattern if it satisfies the condition given by:

\[
\max_{(i,j)} \mathrm{GrowthRate}(X, G_i, G_j) \geq \delta_e, \tag{3.3}
\]

where $\max_{(i,j)}$ means that groups $G_i$ and $G_j$ can be taken in either order, but only the order with the largest growth rate is compared with the growth rate threshold. When $\mathrm{GrowthRate}(X, G_i, G_j) = \infty$, $X$ is called a jumping emerging pattern, because it appears in one group, but not in the other.
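The growth rate and the emerging-pattern condition can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the growth rate from $G_i$ to $G_j$ is assumed to be $\mathrm{supp}(X, G_j)/\mathrm{supp}(X, G_i)$, the orientation that agrees with the $\infty$ case of Equation (3.2).

```python
import math


def growth_rate(supp_i, supp_j):
    """Growth rate of an item set from group G_i to group G_j (Eq. 3.2)."""
    if supp_i == 0 and supp_j == 0:
        return 0.0
    if supp_i == 0:
        return math.inf
    return supp_j / supp_i


def largest_growth_rate(supp_i, supp_j):
    """Largest growth rate over the two possible group orders."""
    return max(growth_rate(supp_i, supp_j), growth_rate(supp_j, supp_i))


def is_emerging(supp_i, supp_j, delta_e):
    """Eq. 3.3: the largest growth rate must reach the threshold delta_e > 1."""
    if delta_e <= 1:
        raise ValueError("the growth rate threshold must be greater than 1")
    return largest_growth_rate(supp_i, supp_j) >= delta_e


def is_jumping(supp_i, supp_j):
    """A jumping emerging pattern appears in one group but not in the other."""
    return math.isinf(largest_growth_rate(supp_i, supp_j))
```

Because the maximum is taken over both orders, `is_emerging` alone cannot tell whether support rose or fell between the groups.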

Several issues must be considered when using EPM in hotel features analysis. Interpreting a hotel feature on its own, rather than as part of an item set containing many others, is easier and more meaningful. When the number of features is large, the use of item set $X$ with one hotel feature ($k = 1$) is suggested. Equation (3.3) does not take into account the order of groups $G_i$ and $G_j$, although it is important in detecting changes in travelers' concerns over time. For example, we group the hotel reviews according to year: $G_1$ contains reviews created in 2012, and $G_2$ contains reviews created in 2013. An increase in the rate of change for a hotel feature from 2012 to 2013, if $G_1$ is considered before $G_2$, would mean something different from a decrease if $G_2$ is considered before $G_1$. Therefore, we define the concepts of positive and negative emerging patterns to distinguish between increasing and decreasing amounts of change in the concerns expressed by travelers.

Let $G_i$ be an initial group, and $G_j$ a target group ($i \neq j$). An item set $X$ is a positive emerging pattern if it satisfies Equation (3.3) and $\mathrm{supp}(X, G_i) < \mathrm{supp}(X, G_j)$. The growth rate, $\mathrm{GrowthRate}(X, G_i, G_j) = \mathrm{supp}(X, G_j)/\mathrm{supp}(X, G_i)$, is termed a positive growth rate. On the contrary, $X$ is a negative emerging pattern if it satisfies Equation (3.3) and $\mathrm{supp}(X, G_i) > \mathrm{supp}(X, G_j)$. The growth rate, $-\mathrm{GrowthRate}(X, G_i, G_j) = \mathrm{supp}(X, G_i)/\mathrm{supp}(X, G_j)$, is termed a negative growth rate and indicated by a negative sign ($-$). Below is an example that illustrates the use of EPM in the hotel context.

Example 1. A set of hotel features is defined to include price, room, service, telephone, and wireless Internet connection. Next, a sample data set is constructed (Table 2). Each record is represented as a vector, $v^{(i)}$, where each element, $v_j^{(i)}$, takes a value of 1 if its corresponding feature is mentioned in review $r_i$ and 0 otherwise. The year attribute represents the dates of the reviews, and indicates the group to which a record belongs, namely, 2012 or 2013. The reviews are grouped according to year, and the support for each hotel feature is computed in each group. Growth rates are also computed to reflect changes in traveler concerns (Table 3).

Table 2
A sample data set.

ID  | Price | Room | Service | Telephone | Wireless | Year
r1  | 1     | 1    | 1       | 1         | 0        | 2012
r2  | 1     | 0    | 0       | 1         | 0        | 2012
r3  | 1     | 1    | 0       | 0         | 0        | 2012
r4  | 1     | 1    | 1       | 1         | 0        | 2012
r5  | 1     | 1    | 0       | 0         | 0        | 2012
r6  | 1     | 0    | 1       | 0         | 1        | 2013
r7  | 1     | 1    | 1       | 0         | 0        | 2013
r8  | 1     | 1    | 0       | 1         | 1        | 2013
r9  | 1     | 1    | 1       | 0         | 0        | 2013
r10 | 1     | 1    | 1       | 0         | 1        | 2013

In this example, we set the emerging threshold, $\delta_e = 2$. Table 3 shows the results. Service is a positive emerging pattern: interest in hotel service increased from 2012 to 2013, as indicated by the growth rate of 2.0. Room telephone has received significantly less attention from travelers over time, as indicated by a negative growth rate of −3.0. A jumping emerging pattern is shown for wireless facility in 2013, with a growth rate of ∞. Price and room are not emerging features, because their growth rates are less than the emerging threshold.
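The computation in Example 1 can be retraced directly from the rows of Table 2. The sketch below is illustrative only: the helper names are hypothetical, supports are kept as exact fractions to avoid rounding, and falling support is reported with a negative sign, following the positive and negative growth rates defined above.

```python
import math
from fractions import Fraction

FEATURES = ("price", "room", "service", "telephone", "wireless")

# Rows of Table 2: (indicator vector in FEATURES order, review year).
TABLE_2 = [
    ((1, 1, 1, 1, 0), 2012),
    ((1, 0, 0, 1, 0), 2012),
    ((1, 1, 0, 0, 0), 2012),
    ((1, 1, 1, 1, 0), 2012),
    ((1, 1, 0, 0, 0), 2012),
    ((1, 0, 1, 0, 1), 2013),
    ((1, 1, 1, 0, 0), 2013),
    ((1, 1, 0, 1, 1), 2013),
    ((1, 1, 1, 0, 0), 2013),
    ((1, 1, 1, 0, 1), 2013),
]


def support(feature, year):
    """Share of the reviews from `year` that mention `feature`."""
    j = FEATURES.index(feature)
    group = [vector for vector, y in TABLE_2 if y == year]
    return Fraction(sum(v[j] for v in group), len(group))


def signed_growth_rate(supp_initial, supp_target):
    """Growth rate from initial to target group; negative if support falls."""
    if supp_initial == 0 and supp_target == 0:
        return Fraction(0)
    if supp_target >= supp_initial:
        return math.inf if supp_initial == 0 else supp_target / supp_initial
    return -math.inf if supp_target == 0 else -(supp_initial / supp_target)


def classify(rate, delta_e=2):
    """Label a signed growth rate against the emerging threshold."""
    if abs(rate) < delta_e:
        return "not emerging"
    kind = "jumping emerging" if abs(rate) == math.inf else "emerging"
    return ("positive " if rate > 0 else "negative ") + kind


RATES = {f: signed_growth_rate(support(f, 2012), support(f, 2013))
         for f in FEATURES}

if __name__ == "__main__":
    for feature, rate in RATES.items():
        print(feature, support(feature, 2012), support(feature, 2013),
              rate, classify(rate))
```

Swapping the two years in the call to `signed_growth_rate` flips the sign of every non-trivial rate, which is the order dependence that the positive and negative patterns are meant to capture.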

Hotel features such as telephone and wireless connection are mentioned less often in the reviews, so they receive little attention under traditional approaches, which focus on the popular features. EPM targets any major change in the support values between groups rather than the values themselves. Thus, EPM can effectively identify the interesting changes in travelers' attention. Standard chi-square ($\chi^2$) tests can also be applied to verify the significance of group differences in real-life applications.
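A chi-square check of a group difference can be sketched as follows. This is an assumption-laden illustration rather than the procedure used in the study: it computes the Pearson statistic for a 2×2 table (group by mentioned or not mentioned) without continuity correction and compares it with 3.841, the 5% critical value for one degree of freedom. The function name is hypothetical, and the ten-row sample of Table 2 is far too small for a reliable test, so the numbers only show the mechanics.

```python
def chi_square_2x2(mentions_a, total_a, mentions_b, total_b):
    """Pearson chi-square statistic for two groups, 1 degree of freedom."""
    a, b = mentions_a, total_a - mentions_a   # group A: mentioned / not
    c, d = mentions_b, total_b - mentions_b   # group B: mentioned / not
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the contingency table is empty")
    return n * (a * d - b * c) ** 2 / denominator


CRITICAL_VALUE_5_PERCENT = 3.841  # chi-square distribution, 1 degree of freedom

# Service in Table 2: mentioned in 2 of 5 reviews (2012) and 4 of 5 (2013).
statistic = chi_square_2x2(2, 5, 4, 5)
significant = statistic > CRITICAL_VALUE_5_PERCENT

if __name__ == "__main__":
    print(round(statistic, 3), significant)
```

On the toy sample the doubling of support for service does not reach significance, which illustrates why the growth rate is paired with a significance test on real review volumes.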

Table 3
Supports and growth rates of hotel features.

Features  | Support (Year 2012) | Support (Year 2013) | Growth rate
Price     | 1.0                 | 1.0                 | 1.0
Room      | 0.8                 | 0.8                 | 1.0
Service   | 0.4                 | 0.8                 | 2.0
Telephone | 0.6                 | 0.2                 | −3.0
Wireless  | 0.0                 | 0.6                 | ∞

3.3. Summary

The overall analysis flow for hotel features is presented in this section. The review processing method is applied to an online review corpus to construct a set of features. EPM is then applied to identify the emerging hotel features in the list. As discussed in Section 2.3, hotel managers are also interested in specifically targeted features, to enable them to create products that are more competitive. We group the reviews that mention a given feature into a subset and then perform segmentation analysis. We examine the proportion of reviews that identify and report the features of specific interest in terms of the demographic characteristics of travelers. For instance, the users are grouped according to their travel mode, namely, business, couple, family, friend, and solo. In relation to this, a hotel manager must be able to identify facilities, such as the Internet and telephones, as most interesting to business travelers. Given a feature, $f_i$, whose users are segmented into groups with proportions $\{G_1^{(i)}, G_2^{(i)}, \ldots, G_n^{(i)}\}$, $f_i$ is identified as a feature of specific interest if $\max_{j = 1, \ldots, n} \{G_j^{(i)}\} \geq \lambda$, where $\lambda$ is a specific-interest threshold with the constraint $\lambda > 100/n$. Here, $G_j^{(i)}$ is the proportion, expressed as a percentage, of any group $j = 1, \ldots, n$ for feature $f_i$, and only the group with the largest proportion is compared against the threshold, $\lambda$. The effectiveness of the proposed method is demonstrated in a case study reported in the subsequent section.

4. Experiment and analysis

This section first describes our experimental data set, which we collected from online hotel reviews. It also describes the experimental design and analysis, presents a summary, and analyzes the managerial implications. Finally, it presents suggestions to hotel managers to help them improve their products and services. In turn, these will enable the managers to better meet the expectations of international travelers.

4.1. Data collection and experimental design

We collect the data set used in this paper from TripAdvisor (www.tripadvisor.com), one of the most popular travel review websites. This website has been widely used as a data resource for research on hotel selection criteria (Liu, Law, et al., 2013; Liu, Shi, et al., 2013; Li et al., 2013). We use the professional data extraction software, Visual Web Ripper (www.visualwebriper.com), to extract review content. We focus the data extraction process on reviews of hotels located in Hong Kong, Singapore, Shanghai, Bangkok and Sydney, because these cities are major Asia Pacific international tourist destinations. A total of 1740 hotels are included for extraction, ranging from one- to five-star hotels based on TripAdvisor's ratings. The software navigates each travel review and extracts its text and date of posting, together with demographic data about the traveler, such as travel mode (i.e., business, couple, family, friends, or solo) and country of origin.

In tourism research, culture is a well-known and important factor that influences behavior and decision-making (Reisinger & Turner, 1997; Tsai, Yeung, & Yim, 2011), particularly in hotel evaluation (Leung, Lee, & Law, 2011). International travelers from different continents are also likely to have different backgrounds (Khadaroo & Seetanah, 2008). Therefore, we group reviewer locations according to their continent of origin, which is convenient for the analysis. We note that the majority of reviewers in our data set come from North America, Europe, Asia and Oceania, so we only consider these continents in this study. Most of the collected reviews were posted in recent years (i.e., 2010–2013), with a few posted in 2009 or earlier. For convenience, we refer to the latter reviews as part of the 2009 group. We remove records with missing attributes, decreasing the data set to 118,300 records. A detailed description of the data set is given in Table 4.

Our case study is organized into several groups to address the research objectives.

4.1.1. Hotel feature list construction

The effectiveness of the proposed text processing method in automatically identifying hotel features from text reviews is first demonstrated. Given that hotel managers are interested in travelers' current concerns, we only perform this experiment on the reviews posted in 2013 (i.e., 31,639 records). From this, a list of critical hotel features is constructed.

Fig. 2. Output hotel feature candidates.

Table 4. Description of the collected data set.

| Attribute | Description | Value | Percentage (%) |
|---|---|---|---|
| COMMENT | Text comment of each hotel review | Review text | 100 |
| YEAR | Year when the reviews were posted online | 2013 | 26.74 |
| | | 2012 | 39.85 |
| | | 2011 | 17.50 |
| | | 2010 | 7.5 |
| | | 2009 and before | 8.39 |
| ORIGIN | Location of travelers according to continents of origin | Asia | 31.26 |
| | | Europe | 16.71 |
| | | North America | 21.20 |
| | | Oceania | 30.03 |
| GROUP | Travel mode of travelers | Business | 24.73 |
| | | Couple | 35.91 |
| | | Family | 19.36 |
| | | Friends | 10.66 |
| | | Solo | 9.34 |

4.1.2. Emerging hotel feature identification

Gaining insights into the behavioral tendencies of customers is important for hotel managers, as they can use such information in business planning and decision-making. We first identify hotel features that have been subject to changes in travelers' attention over the past five years (from 2009 to 2013). The reviews are then grouped according to the year in which they were created. We apply EPM to these groups to detect emerging hotel features. The emerging threshold is set to $\delta_e = 1.1$. A small $\delta_e$ value is used because our data set contains many records, so a slight change in the growth rate indicates a large change in the number of records. The $\chi^2$ test is used to validate the significance of the changes detected by the EPM algorithm. We then compute the growth rate for the emerging features identified for each pair of the years 2009–2013 to identify the trends.
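The significance check mentioned above can be illustrated with a small sketch. It assumes a Pearson $\chi^2$ test on a 2×2 table (feature mentioned or not, initial versus target group) without continuity correction; the source does not state which variant was used. The function name `chi_square_2x2` and the review counts are hypothetical and serve only as an illustration.

```python
import math


def chi_square_2x2(mentions_a, total_a, mentions_b, total_b):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for
    whether a feature's support differs between two review groups.

    Assumption: no continuity correction is applied.
    """
    observed = [
        [mentions_a, total_a - mentions_a],
        [mentions_b, total_b - mentions_b],
    ]
    row_totals = [total_a, total_b]
    grand_total = total_a + total_b
    col_totals = [mentions_a + mentions_b,
                  grand_total - mentions_a - mentions_b]

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / grand_total
            if expected > 0:
                chi2 += (observed[r][c] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 400 of 10,000 reviews mention a feature in the
# initial group, and 2,400 of 30,000 reviews mention it in the target group.
chi2, p = chi_square_2x2(400, 10_000, 2_400, 30_000)
print(f"chi2 = {chi2:.2f}, significant at 0.05: {p < 0.05}")
```

With groups as large as those in Table 4, even modest differences in support tend to produce very small p-values, which is one reason to read the growth rate alongside the test result.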

4.1.3. Specifically interested features identification

This case study also identifies features of specific interest that are relevant to a particular group of travelers. Given that travel mode and origin influence traveler preference, we incorporate the attributes ORIGIN and GROUP in Table 4 to construct user profiles. The reviews are then grouped according to the user demographic profiles, and a segmentation analysis is applied as described in Section 3.3. The $\lambda$ value is set to 30%. Reviews from years 2012 and 2013 are used in this analysis because they represent the most recent data.
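The segmentation criterion of Section 3.3 can be sketched as follows. This is a minimal illustration that assumes proportions are expressed in percent and computed over the reviews mentioning one feature; the function name `specific_interest` and the counts are hypothetical.

```python
def specific_interest(group_counts, lam=30.0):
    """Return (group, share_percent) when the largest group's share of the
    reviews mentioning a feature reaches lam percent; otherwise None.

    group_counts maps a group label to the number of reviews from that
    group that mention the feature.
    """
    n = len(group_counts)
    if n == 0:
        raise ValueError("at least one group is required")
    if lam <= 100.0 / n:
        # Constraint from Section 3.3: the threshold must exceed 100/n.
        raise ValueError("lam must exceed 100/n")
    total = sum(group_counts.values())
    if total == 0:
        return None
    top_group = max(group_counts, key=group_counts.get)
    share = 100.0 * group_counts[top_group] / total
    return (top_group, share) if share >= lam else None


# Hypothetical counts of reviews mentioning one feature, by travel mode.
internet = {"Business": 360, "Couple": 280, "Family": 150,
            "Friends": 120, "Solo": 90}
print(specific_interest(internet))
```

The constraint $\lambda > 100/n$ rules out thresholds that a perfectly even split would already satisfy: with the five travel modes, any threshold of 20% or less would flag every feature.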

4.2. Results and analysis

4.2.1. Hotel feature list construction

Here, we apply the proposed review processing method to the reviews posted in 2013. A stemmed noun list is constructed based on the English lexicon. The support threshold $\delta_s$ is used to determine whether a noun in the list is interesting enough for further analysis. We first examine the effect of setting different support thresholds between 0 and 0.1 on the number of hotel features returned as candidates.

Fig. 2 shows that the algorithm returns 5523 features with $\delta_s = 0$, which is the total number of nouns in the stemmed list. The hotel feature candidate number drops gradually to 419 when $\delta_s$ is set to 0.01, then decreases slightly with higher support thresholds. When $\delta_s = 0.1$, only 47 feature candidates are returned. Notably, the candidate generation is an automated process; thus, users may examine the output further to select their features of interest. This approach should be feasible in practice because the candidate number is usually small. The advantage of this work is that hotel features of interest are identified directly from reviews rather than from a predefined set. Hence, this condition allows for a more comprehensive and relevant list of hotel features to be constructed.
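The effect of the support threshold can be illustrated with a short sketch. It assumes that noun extraction and stemming have already been done, that support is the fraction of reviews mentioning a noun at least once, and that a noun is kept when its support is at least $\delta_s$; the function name `feature_candidates` and the toy reviews are hypothetical.

```python
from collections import Counter


def feature_candidates(review_nouns, min_support):
    """Return {noun: support} for nouns whose support is at least
    min_support, where support is the fraction of reviews mentioning
    the noun at least once."""
    n_reviews = len(review_nouns)
    if n_reviews == 0:
        return {}
    doc_freq = Counter()
    for nouns in review_nouns:
        doc_freq.update(set(nouns))  # count each noun once per review
    return {
        noun: count / n_reviews
        for noun, count in doc_freq.items()
        if count / n_reviews >= min_support
    }


# Toy data: each review is represented by its (already stemmed) nouns.
reviews = [
    ["room", "staff", "location"],
    ["room", "breakfast"],
    ["room", "staff", "taxi"],
    ["location", "room", "room"],
]
print(sorted(feature_candidates(reviews, 0.5)))
print(len(feature_candidates(reviews, 0.0)))
```

Setting the threshold to 0 returns every noun in the list, which mirrors the behavior reported for $\delta_s = 0$; raising it shrinks the candidate list.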

Next, we select $\delta_s = 0.05$ to generate a list of hotel feature candidates for further analysis in our case study. This value is commonly used for evaluating the extent to which items in a data set are deemed interesting (Law, Rong, Vu, Li, & Lee, 2011; Li, Law, Rong, & Vu, 2010). This analysis resulted in 111 candidates. We inspect the list and find that several popular hotel features are included (Fig. 3), including room, staff, location, breakfast, service, and cleanliness. This result is reasonably consistent with previous studies (Chaves et al., 2012; Stringam & Gerdes, 2010), which indicates the effectiveness of our approach.

Several other features of interest to travelers are also identified. These detailed hotel aspects include the lobby, lounge, door, coffee, tea, or surrounding environment such as roads, streets, parks, rivers, and spaces. Factors such as stations, airports, taxis, trains, and access are particularly significant to reviewers. Their support values are similar to or higher than some of the popular features, such as the Internet, clubs, receptions, prices, and bars. These results are interesting because these terms are about transportation, which is not directly related to the hotel domain. However, they are important in the hotel evaluation process of international travelers. Prior work on hotel features has focused mainly on popular hotel features and less on such ancillary aspects. We use the list of 39 features (Fig. 3) for further analysis.

4.2.2. Identification of emerging hotel features

Next, we identify features where levels of interest among travelers have changed over the past five years. The year 2009 is set as the initial group, whereas 2013 is set as the target group. Our data sets for these groups are encoded for the EPM algorithm. Only features whose growth rates exceed the emerging threshold $\delta_e = 1.1$ are considered. If a feature has a smaller 2009 group support value than that of the 2013 group, then it is selected as a positive emerging pattern; otherwise, it is selected as a negative emerging pattern and denoted with a negative sign (-) in front of its growth rate value. This approach produces 16 emerging hotel features (Table 5).
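The selection rule described above can be sketched as follows. The sketch assumes that the growth rate is the ratio of the larger support to the smaller one, negated when support falls from the initial to the target group, and that a zero initial support yields a jumping emerging pattern with an infinite rate. The treatment of equal supports (rate 1, or 0 when both are zero) is an assumption, as are the function names. The three example supports are the rounded values reported in Table 5, so the computed rates differ slightly from the published ones; the "Staff" entry is hypothetical.

```python
import math


def signed_growth_rate(support_initial, support_target):
    """Ratio of the larger to the smaller support, negated when support
    falls from the initial group to the target group."""
    if support_initial == support_target:
        # Assumption: no change gives 1 (or 0 when the feature never occurs).
        return 1.0 if support_initial > 0 else 0.0
    if support_initial == 0:
        return math.inf           # jumping emerging pattern
    if support_target == 0:
        return -math.inf          # feature disappears in the target group
    if support_target > support_initial:
        return support_target / support_initial
    return -(support_initial / support_target)


def emerging_patterns(supports, threshold=1.1):
    """Keep features whose growth rate magnitude exceeds the threshold."""
    result = {}
    for feature, (s_initial, s_target) in supports.items():
        rate = signed_growth_rate(s_initial, s_target)
        if abs(rate) > threshold:
            result[feature] = rate
    return result


supports = {
    "Club": (0.039, 0.077),    # rounded supports from Table 5
    "Price": (0.185, 0.122),   # rounded supports from Table 5
    "View": (0.176, 0.194),    # rounded supports from Table 5
    "Staff": (0.500, 0.510),   # hypothetical, below the threshold
}
for feature, rate in emerging_patterns(supports).items():
    print(f"{feature}: {rate:+.3f}")
```

Keeping the sign separate from the magnitude lets a single threshold serve both positive and negative patterns.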

Table 5 shows some interesting patterns in travelers' concerns as reflected in their reviews. The growth rate metric is the factor of interest for detecting change in the EPM context. The $\chi^2$ test results, with p-values ≤ 0.05, demonstrate statistically significant changes in the support values for 2009 and 2013. Some interesting findings for the positive emerging pattern are presented here. First, international travelers focus more on the clubbing feature, as shown in the growth rate of 1.994 (Pattern P1); specifically, the number of travelers using this facility has nearly doubled since 2009. Some hotel areas, such as the lounge and pool, have also received more attention (P2 and P4, respectively). Rivers and views appear to attract more interest from travelers in 2013, with growth rates of 1.351 and 1.105 (P3 and P8, respectively). Pattern P5 shows that travelers are now more concerned about service. Slightly increased attention to dinner and food provided by hotels was also identified.

Fig. 3. Identified popular hotel features.

These positive patterns can focus the attention of hotel managers on features that are becoming “hot” among international travelers. Subsequently, managers can make appropriate changes to attract more customers. In comparison, negative emerging patterns can help hotel managers save effort and direct investment resources away from areas that are unimportant to customers. Some popular features such as price and cleanliness are of less concern over time, as indicated by the negative growth rates in P9 and P14. Interest in indoor facilities [e.g., bathrooms and beds (P12 and P15, respectively)] or outdoor factors [e.g., streets, taxis, and parks (P10, P11, and P13, respectively)] has also decreased.

Table 5. Identified emerging hotel features.

| Features | Support value (2009) | Support value (2013) | Growth rate | $\chi^2$ | p-value | Pattern ID |
|---|---|---|---|---|---|---|
| Club | 0.039 | 0.077 | 1.994 | 176.96 | 0.000 | P1 |
| Lounge | 0.065 | 0.089 | 1.383 | 60.43 | 0.000 | P2 |
| River | 0.039 | 0.053 | 1.351 | 30.17 | 0.000 | P3 |
| Pool | 0.186 | 0.216 | 1.159 | 39.94 | 0.000 | P4 |
| Service | 0.306 | 0.353 | 1.155 | 75.39 | 0.000 | P5 |
| Dinner | 0.050 | 0.058 | 1.150 | 8.16 | 0.004 | P6 |
| Food | 0.207 | 0.230 | 1.109 | 22.30 | 0.000 | P7 |
| View | 0.176 | 0.194 | 1.105 | 16.79 | 0.000 | P8 |
| Price | 0.185 | 0.122 | -1.520 | 256.43 | 0.000 | P9 |
| Street | 0.143 | 0.096 | -1.495 | 177.57 | 0.000 | P10 |
| Taxi | 0.133 | 0.094 | -1.413 | 123.32 | 0.000 | P11 |
| Bathroom | 0.245 | 0.176 | -1.396 | 235.54 | 0.000 | P12 |
| Park | 0.078 | 0.056 | -1.383 | 60.66 | 0.000 | P13 |
| Clean | 0.371 | 0.293 | -1.265 | 212.05 | 0.000 | P14 |
| Bed | 0.224 | 0.190 | -1.174 | 52.13 | 0.000 | P15 |
| Location | 0.479 | 0.426 | -1.125 | 86.84 | 0.000 | P16 |

Given these emerging features, hotel managers need to know whether trends exist in the areas that travelers focus on for planning purposes. Thus, we also compute the growth rate for each feature against each pair of years from 2009 to 2013. This process is performed with the same procedure as that used in the previous case. The results are shown in Table 6. Here, the signs of the growth rates, rather than the values, are of interest.

A feature is considered to have an increasing trend if the growth rates for all year pairs are positive, and a decreasing trend if they are all negative. If neither condition holds, no clear trend is recorded. We summarize the findings for the emerging trends here. The trend in relation to clubs and food is increasing, as indicated by the positive growth rates across different year pairs (T1 and T7, respectively). Clear downward trends are found for the negative emerging hotel features from T9 to T16, as shown by the negative growth rates over time. A slight drop for lounges, dinners, and views from 2009 to 2010 can be observed. However, they have gained more attention since then.
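The trend rule can be expressed in a few lines. The sketch below uses the signed growth rates reported in Table 6 for three features; the function name `classify_trend` and the label strings are illustrative choices, not part of the original method description.

```python
def classify_trend(growth_rates):
    """Label a feature by the signs of its year-pair growth rates."""
    if not growth_rates:
        raise ValueError("at least one growth rate is required")
    if all(rate > 0 for rate in growth_rates):
        return "increasing"
    if all(rate < 0 for rate in growth_rates):
        return "decreasing"
    return "no clear trend"


# Signed growth rates for consecutive year pairs, taken from Table 6.
table6_rates = {
    "Club": [1.111, 1.434, 1.205, 1.039],
    "Lounge": [-1.047, 1.196, 1.221, -1.009],
    "Price": [-1.152, -1.111, -1.091, -1.090],
}
for feature, rates in table6_rates.items():
    print(feature, "->", classify_trend(rates))
```

Because only the signs matter, a feature such as lounge, which rises in two year pairs and falls in two, is left without a trend label even though its overall 2009 to 2013 growth rate is positive.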

4.2.3. Identification of specifically interested features

To identify specifically interested features, 39 subsets of the data are generated, one for each of the features on the list constructed earlier. Reviews posted in 2012 and 2013 are considered because these represent the most recent data. Features whose segmentations satisfy $\max_{1 \le j \le n} \{G_j^{(i)}\} \ge \lambda$ are selected because they are of special interest to a particular group of travelers.

Next, we perform a segmentation analysis of these features according to traveler origin, resulting in the identification of six features (Fig. 4). Asian and Oceanian travelers are most concerned with bed quality at 67%. Lobby areas of hotels receive particular interest from Asians. Reception areas are the top concern to Oceanian travelers, whereas very few North American travelers focus on this feature. Oceanian travelers comprise the major group that worries the most about noise (42%), parking (37%), and dinner (35%).

We then perform segmentation according to travel mode, and identify six features (Fig. 5). Couples focus more on outdoor features, including rivers and views, as well as some indoor aspects, such as dinner, tea, and bed. Internet facilities receive high levels of attention from business travelers (36%) and couples (28%).

5. Discussion and managerial implications

Section 4.2 describes a list of 39 hotel features that are currently of concern to international travelers. The features identified are in line with those emerging from previous studies, which demonstrates the effectiveness of our proposed method. Several outdoor (e.g., streets, parks, and rivers) and transportation features (e.g., airports, taxis, and trains) are also perceived as very important by travelers when evaluating hotels. Using such information, hotel managers can develop more effective marketing campaigns by presenting these features in their advertisements (e.g., showing pictures of beautiful outdoor settings or mentioning the convenience of local transportation).

Table 6. Emerging trends of hotel features.

| Features | Growth rate 2009 vs. 2010 | Growth rate 2010 vs. 2011 | Growth rate 2011 vs. 2012 | Growth rate 2012 vs. 2013 | Trend | $\chi^2$ | p-value | Trend ID |
|---|---|---|---|---|---|---|---|---|
| Club | 1.111 | 1.434 | 1.205 | 1.039 | Increasing | 309.02 | 0.000 | T1 |
| Lounge | -1.047 | 1.196 | 1.221 | -1.009 | No clear trend | 168.80 | 0.000 | T2 |
| River | -1.040 | 1.050 | 1.142 | 1.171 | No clear trend | 78.69 | 0.000 | T3 |
| Pool | -1.014 | -1.003 | 1.134 | 1.039 | No clear trend | 126.37 | 0.000 | T4 |
| Service | 1.072 | 1.007 | 1.074 | -1.003 | No clear trend | 126.65 | 0.000 | T5 |
| Dinner | -1.151 | 1.139 | 1.138 | 1.020 | No clear trend | 43.72 | 0.004 | T6 |
| Food | 1.041 | 1.012 | 1.049 | 1.003 | Increasing | 36.40 | 0.000 | T7 |
| View | -1.142 | 1.064 | 1.103 | 1.076 | No clear trend | 122.16 | 0.000 | T8 |
| Price | -1.152 | -1.111 | -1.091 | -1.090 | Decreasing | 311.90 | 0.000 | T9 |
| Street | -1.145 | -1.117 | -1.097 | -1.066 | Decreasing | 224.38 | 0.000 | T10 |
| Taxi | -1.206 | -1.030 | -1.082 | -1.051 | Decreasing | 141.39 | 0.000 | T11 |
| Bathroom | -1.141 | -1.094 | -1.041 | -1.074 | Decreasing | 269.61 | 0.000 | T12 |
| Park | -1.072 | -1.161 | -1.086 | -1.023 | Decreasing | 92.64 | 0.000 | T13 |
| Clean | -1.126 | -1.068 | -1.038 | -1.012 | Decreasing | 262.14 | 0.000 | T14 |
| Bed | -1.056 | -1.079 | -1.002 | -1.032 | Decreasing | 63.72 | 0.000 | T15 |
| Location | -1.065 | -1.053 | -1.003 | -1.001 | Decreasing | 114.25 | 0.000 | T16 |

The identification of emerging hotel features also highlights changes in the topics of concern to travelers over the past five years. Hotel facilities, such as clubs, lounges and pools, now attract significantly more attention. Therefore, managers can develop investment plans focused on these emerging positive features, while directing efforts and resources away from those that receive less attention. The decision-making abilities of managers can also be further enhanced by trend analysis. For instance, long-term plans can be made for adding club facilities or improving food quality based on the apparent increasing trend in travelers' interest in these features.

Fig. 4. User segmentation by travelers' origin.

Section 4 also focuses on some features of specific interest, which can help managers develop effective and targeted improvement plans. For instance, bed designs can be modified to fit the expectations of Asian or Oceanian travelers, because they are most concerned about this issue. Outdoor features, such as rivers or landscape views, should be given special consideration when arranging accommodations for couples. Internet facilities can also be improved to meet the expectations of their main traveler groups, specifically business travelers and couples.

The support threshold used in our case study generates relatively few hotel feature candidates. Nevertheless, the list encompasses most of the features of current concern to travelers. Depending on users' needs and aspirations, a lower support threshold can be set to generate a list that includes more hotel features. The present case study demonstrates the use of EPM in the context of the tourism industry. This method is applied to a large-scale data set to identify the general changes and trends in travelers' hotel preferences in the Asia Pacific region. The user profile analysis of previously reported hotel features is conducted only on two popular attributes, namely, traveler origin and travel mode. In the future, more attributes and detailed groupings can be used to construct profiles in real-life applications.

Fig. 5. User segmentation by travel mode.

6. Conclusions

Hotel managers who are looking at product design and development should understand travelers' concerns so that they can enhance business performance. Managers are interested in emerging issues or trends to make appropriate adjustments to their plans, which can save internal resources and maximize returns on investment. Hotel managers also need to identify features of interest to specific groups to enable them to develop more efficient hotel improvement plans and meet their guests' expectations. Despite considerable efforts made by researchers, generating insights to help tourism managers address travelers' concerns and create a competitive hotel industry remains a challenging task. Current research has been unable to demonstrate an effective method for addressing such demands comprehensively.

To fill this research gap, we applied the EPM concept to discover changes and trends in travelers' attention. A set of features of specific interest and their target users have been identified. The analysis reported in this paper is based on a large-scale data set of online reviews, which is a promising data source because the concerns expressed by travelers on these websites closely reflect those in real life.

An extension of this work could identify which hotel features are usually of interest to a specific type of traveler. This way, managers can design their travel packages to target different groups. More attributes can also be considered to construct more detailed user profiles. The method introduced in this paper is a general technique that can identify specific features from online reviews. This means that the proposed method can be used to pinpoint issues of concern to travelers in other tourism contexts, such as airlines, restaurants, or other attractions. Finally, surveys in future research with a sufficient number of respondents can, and probably should, investigate hotel managers' views in terms of industrial applications. Industry practitioners with various personal and business backgrounds will likely have different views on this approach.

Acknowledgment

This project was partly supported by a research grant funded by the Hong Kong Polytechnic University, Hong Kong Scholars Program, and a research grant funded by the National Natural Science Foundation of China (71361007).

References

Albaladejo-Pina, I. P., & Diaz-Delfa, M. T. (2009). Tourist preferences for rural house stays: evidence from discrete choice modeling in Spain. Tourism Management, 30(6), 805–811.

Ananth, M., DeMicco, F. J., Moreo, P. J., & Howey, R. M. (1992). Marketplace lodging needs of mature travellers. The Cornell Hotel and Restaurant Administration Quarterly, 33(4), 12–24.

Ariffin, A. A. M., & Maghzi, A. (2012). A preliminary study on customer expectations of hotel hospitality: influences of personal and hotel factors. International Journal of Hospitality Management, 31(1), 191–198.

Banyai, M. (2012). Travel blogs: a reflection of positioning strategies? Journal of Hospitality Marketing and Management, 21(4), 421–439.

Bjorkelund, E., Burnett, T. H., & Norvag, K. (2012). A study of opinion mining and visualization of hotel reviews. In Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, Bali, Indonesia (pp. 229–238).

Bosangit, C., Dulnuan, J., & Mena, M. (2012). Using travel blogs in examining post-consumption behavior of tourists. Journal of Vacation Marketing, 18(3), 207–219.

Bulchand-Gidumal, J., Melian-Gonzalez, S., & Lopez-Valcarcel, B. G. (2011). Improving hotel ratings by offering free wi-fi. Journal of Hospitality and Tourism Technology, 2(3), 235–246.

Capriello, A., Mason, P. R., Davis, B., & Crotts, J. C. (2013). Farm tourism experiences in travel reviews: a cross-comparison of three alternative methods for data analysis. Journal of Business Research, 66(6), 778–785.

Carson, D. (2008). The ‘blogosphere’ as a market research tool for tourism destinations: a case study of Australia's northern territory. Journal of Vacation Marketing, 14(2), 111–119.

Chaves, M. S., Gomes, R., & Pedron, C. (2012). Analysing reviews in the web 2.0: small and medium hotels in Portugal. Tourism Management, 33(5), 1286–1287.

Choi, T. Y., & Chu, R. (2001). Determinants of hotel guests’ satisfaction and repeat patronage in the Hong Kong hotel industry. International Journal of Hospitality Management, 20(3), 277–297.

Davidson, L., & Skinner, H. (2010). I spy with my little eye: a comparison of manual versus computer-aided analysis of data gathered by projective techniques. Qualitative Market Research: An International Journal, 13(4), 441–459.

Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: discovering trends and differences. In KDD '99 Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA, USA (pp. 43–52).

Fan, H., Fan, M., Ramamohanarao, K., & Liu, M. (2006). Further improving emerging pattern based classifier via bagging. In Proceeding of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining (PAKDD), Singapore (pp. 91–96).

Fan, H., & Ramamohanarao, K. (2003). A Bayesian approach to use emerging patterns for classification. In Proceeding of the 14th Australasian database conference (ADC-03), Adelaide, Australia (pp. 39–48).

Gan, M., & Dai, H. (2009). Efficient mining of top-k breaker emerging sub-graph patterns from graph data set. In Proceeding of the 8th Australasian Data Mining Conference (AusDM), Melbourne, Australia (pp. 183–191).

Hsieh, L.-F., Lin, L.-H., & Lin, Y.-Y. (2008). A service quality management architecture for hot spring hotels in Taiwan. Tourism Management, 29(3), 429–438.

Huang, Z., Gan, C., Lu, X., & Huan, H. (2013). Mining the changes of medical behaviors for clinical pathways. Studies in Health Technology and Informatics, 192(1–2), 117–121.

Huettel, S. (2010). Technology, consumer preferences changing hotel business, Westin Tampa Bay manager says. Florida, United States: Tampa Bay Times. URL http://www.tampabay.com/news/business/tourism/technology-consumer-preferences-changing-hotel-business-westin-tampa-bay/1125506. Retrieved 13 July, 2013.

Khadaroo, J., & Seetanah, B. (2008). The role of transport infrastructure in international tourism development: a gravity model approach. Tourism Management, 29(5), 831–840.

Kim, J. K., Song, H. S., & Kim, H. K. (2005). Detecting the change of customer behavior based on decision tree analysis. Expert Systems, 22(4), 193–205.

Law, R., Rong, J., Vu, H. Q., Li, G., & Lee, H. A. (2011). Identifying changes and trends in Hong Kong outbound tourism. Tourism Management, 32(5), 1106–1114.

Lento, X., Park, J., Park, O., & Lehto, M. R. (2007). Text analysis of consumer reviews: the case of virtual travel firms. In M. J. Smith, & G. Salvendy (Eds.), Human interface and the management of information. Methods, techniques and tools in information design (pp. 490–499). Berlin Heidelberg: Springer.

Leung, D., Lee, H. A., & Law, R. (2011). The impact of culture on hotel ratings: analysis of star-rated hotels in China. Journal of China Tourism Research, 7(3), 243–262.

Li, J., Dong, G., & Ramamohanarao, K. (2000). Instance-based classification by emerging patterns. In Proceedings of the 14th European Conference on Principles and Practice of Knowledge Discovery in Database (PKDD-2000), Lyon, France (pp. 191–200).

Li, J., Dong, G., & Ramamohanarao, K. (2001). Making use of the most expressive jumping emerging patterns for classification. Knowledge and Information Systems, 3(2), 1–29.

Li, G., Law, R., Rong, J., & Vu, H. Q. (2010). Incorporating both positive and negative association rules into the analysis of outbound tourism in Hong Kong. Journal of Travel & Tourism Marketing, 27(8), 812–828.

Li, G., Law, R., Vu, H. Q., & Rong, J. (2013). Discovering the hotel selection preferences of Hong Kong inbound travelers using the Choquet integral. Tourism Management, 36, 321–330.

Li, J., Liu, H., Downing, J. R., Yeoh, A. E.-J., & Wong, L. (2003). Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (all) patients. Bioinformatics, 19(1), 71–78.

Li, J., & Wong, L. (2002). Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics, 18(5), 725–734.

Li, J., & Yang, Q. (2007). Strong compound-risk factors: efficient discovery through emerging patterns and contrast sets. IEEE Transactions on Information Technology in Biomedicine, 11(5), 544–552.

Liu, S., Law, R., Rong, J., Li, G., & Hall, J. (2013). Analyzing changes in hotel customers' expectations by trip mode. International Journal of Hospitality Management, 34, 359–371.


Liu, Q., Shi, P., & Hu, Z. (2013). Fast algorithms for mining strong jumping emergingpatterns using the contrast pattern tree. ICIC Express Letters, Part B: Applications,4(1), 121e128.

Liu, Q., Shi, P., Hu, Z., & Zhang, Y. (2014). A novel approach of mining strong jumpingemerging pattens based on BSC-tree. International Journal of System Science,45(3), 598e615.

Lockyer, T. (2005). Understanding the dynamics of the hotel accommodation pur-chase decision. International Journal of Contemporary Hospitality Management,17(6), 481e492.

Mack, R., Blose, J. E., & Pan, B. (2008). Believe it or not: credibility of blogs intourism. Journal of Vacation Marketing, 14(2), 133e144.

Merlo, E. M., & de Souza Joao, I. (2011). Consumers attribute analysis of economichotels: an exploratory study. African Journal of Business Management, 5(21),8410e8416.

Munoz Gil, R., Aparicio, F., De Buenaga, M., Gachet, D., Puertas, E., Giraldez, I., et al.(2011). Tourist face: a content system based on concepts of freebase for accessto the cultural-tourist information. In Proceedings of the 16th InternationalConference on Applications of Natural Language to Information System (NLDB2011), Alicante, Spain (pp. 300e304).

Pan, B., MacLaurin, T., & Crotts, J. C. (2007). Travel blogs and the implications fordestination marketing. Journal of Travel Research, 46(1), 35e45.

Park, J. H., Lee, H. G., & Park, J. H. (2010). Real-time diagnosis system using incre-mental emerging pattern mining. In Proceedings of the 5th International Con-ference on Ubiquitous Information Technologies and Applications (CUTE2010),Sanya, China (pp. 1e5).

Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. InProceedings of the Workshop on Comparing Corpora. Hong Kong, China (pp. 1e6).

Reisinger, Y., & Turner, L. (1997). Cross-cultural differences in tourism: Indonesiantourists in Australia. Tourism Management, 18(3), 139e147.

Ruiz-Martinez, J. M., Minarro-Gimenez, J. A., Castellanos-Nieves, D., Garcia-Saanchez, F., & Valencia-Garcia, R. (2011). Ontology population: an applicationfor the E-tourism domain. International Journal of Innovative Computing, Infor-mation and Control, 7(11), 6115e6183.

Sherhod, R., Gillet, V. J., Judson, P. N., & Vessey, J. D. (2012). Automatic knowledgediscovery for toxicity prediction using jumping emerging pattern mining.Journal of Chemical Information and Modeling, 52(11), 3074e3087.

Shie, B. E., Yu, P. S., & Tseng, V. S. (2013). Mining interesting user behavior patternsin mobile commerce environments. Applied Intelligence, 38(3), 418e435.

Singh, N., Hu, C., & Roehi, W. (2007). Text mining a decade of progress in hospitalityhuman resource management research: identifying emerging thematic devel-opment. International Journal of Hospitality Management, 26(1), 131e147.

Sohrabi, B., Vanami, I. R., Tahmasebipur, K., & Fazil, S. (2012). An exploratory analysisof hotel selection factors: a comprehensive survey of Tehran hotels. Interna-tional Journal of Hospitality Management, 31(1), 96e106.

Stringam, B. B., & Gerdes, J. J. (2010). An analysis of word-of-mouse ratings andguest comments of online hotel distribution sites. Journal of Hospitality Mar-keting and Management, 19(7), 773e796.

Stringam, B. B., Gerdes, J. J., & Vanleeuwen, D. M. (2010). Assessing the importanceand relationships of ratings on user-generated traveler reviews. Journal ofQuality Assurance in Hospitality and Tourism, 11(2), 73e92.

Tsai, C. Y., & Shieh, Y. C. (2009). A change detection method for sequential patterns.Decision Support Systems, 46(2), 501e511.

Tsai, H., Yeung, S., & Yim, P. H. L. (2011). Hotel selection criteria used by mainlandChinese and foreign individual travelers to Hong Kong. International Journal ofHospitality & Tourism Administration, 12(3), 252e267.

Tseng, Y.-H., Lin, C.-J., & Lin, Y.-I. (2007). Text mining techniques for patent analysis.Information Processing and Management, 43(5), 1216e1247.

Tsou, M.-C. (2010). Geographic information retrieval and text mining on Chinesetourism web pages. International Journal of Information Technology and WebEngineering, 5(1), 56e75.

Tussyadiah, I. P., & Fesenmaier, D. R. (2008). Marketing places through first personstories e an analysis of Pennsylvania road tripper blog. Journal of Travel andTourism Marketing, 25(3/4), 299e311.

Varga, B., & Groza, A. (2011). Integrating DBpedia and SentiWordNet for a tourismrecommender system. In Proceedings of the 7th International Conference onIntelligent Computer Communication and Processing (ICCP 2011), Cluj-Napoca,Romania (pp. 133e136).

Wang, G., Zhao, Y., Zhao, X., Wang, B., & Qiao, B. (2010). Efficient mining localconserved cluster from gene expression data. Neurocomputing, 73(7),1425e1437.

Wenger, A. (2008). Analysis of travel bloggers' characteristics and their communi-cation about Austria as a tourism destination. Journal of Vacation Marketing,14(2), 169e176.

Wilkins, H. (2010). Using importance-performance analysis to appreciate satisfac-tion in hotels. Journal of Hospitality Marketing and Management, 19(8), 866e888.

Witte, R., & Baker, C. J. O. (2005). Combining biological databases and text mining tosupport new bioinformatics applications. In Natural Language Processing andInformation Systems: 10th International Conference on Applications of NaturalLanguage to Information Systems. Alicante, Spain (pp. 310e321).

Ye, Q., Law, R., Li, S., & Li, Y. (2011). Feature extraction of travel destinations from online Chinese-language customer reviews. International Journal Services Technology and Management, 15(1/2), 106–118.

Yu, H.-H., Chen, C.-H., & Tseng, V. S. (2011). Mining emerging patterns from time series data with time gap constraint. International Journal of Innovative Computing Information and Control, 7(9), 5515–5528.

Zhou, X., & Han, H. (2006). Approaches to text mining for clinical medical records. In Annual ACM Symposium on Applied Computing 2006, Technical Tracks on Computer Applications in Health Care. Dijon, France (pp. 235–239).

Gang Li, Ph.D., IEEE Senior Member, is a Senior Lecturer at the School of Information Technology, Deakin University. His research interests are machine learning, data mining, and technology applications to tourism and hospitality. He has coauthored four articles that received best paper awards. He has served as a PC member for 80+ international conferences, and is a regular reviewer for international journals in relevant research areas.

Rob Law, Ph.D., is a Professor at the School of Hotel and Tourism Management, the Hong Kong Polytechnic University. His research interests are information management and technology applications.

Huy Quan Vu is currently a PhD student at Deakin University. His research interests include machine learning, data mining, and their applications in tourism.

Jia Rong, Ph.D., is a research associate at the School of Information Technology, Deakin University. Her research interests are data mining, multimedia data analysis, and technology applications to tourism and hospitality. She was awarded The Professor of Information Technology Award (2010) for the most academically outstanding PhD student, School of IT, Deakin University, Australia.

Xinyuan (Roy) Zhao, Ph.D., is an Associate Professor in Hospitality Management at Business School, Sun Yat-Sen University (SYSBS). His research has been published widely in top-tier tourism and hospitality journals, and has been funded by the National Natural Science Foundation of China, the Chinese Department of Education, the Guangdong Social Science Foundation, and the Guangzhou Social Science Foundation.