
1 CRATS: An LDA-based Model for Jointly Mining Latent Communities, Regions, Activities ...chiychow/papers/TKDE_2016b.pdf · 2016-12-05 · 1 CRATS: An LDA-based Model for Jointly



CRATS: An LDA-based Model for Jointly Mining Latent Communities, Regions, Activities, Topics, and Sentiments from Geosocial Network Data

Jia-Dong Zhang and Chi-Yin Chow, Member, IEEE

Abstract—Geosocial networks like Yelp and Foursquare have been rapidly growing and accumulating plenty of data, such as social links between users, user check-ins to venues, venue geographical locations, venue categories, and user textual comments on venues. These data contain rich knowledge about users' social interactions in communities, geographical mobility patterns between regions, categorical preferences for activities, aspect interests in topics, and opinion expressions of sentiments. Such knowledge is essential for two key applications developed in this paper, namely, text sentiment classification and venue recommendation. To extract the knowledge from the data, the key task is to discover the latent communities, regions, activities, topics, and sentiments of users. However, these latent variables are interdependent, e.g., users in the same community usually travel to nearby regions and share common activities and topics, which poses a big challenge for modeling them. To tackle this challenge, we propose an LDA-based model called CRATS that jointly mines the latent Communities, Regions, Activities, Topics, and Sentiments based on the important dependencies among these latent variables. To the best of our knowledge, this is the first study to jointly model these five latent variables. Finally, we conduct a comprehensive performance evaluation of CRATS in different applications, including text sentiment classification and venue recommendation, using three large-scale real-world geosocial network data sets collected from Yelp and Foursquare. Experimental results show that CRATS achieves significantly superior performance over other state-of-the-art techniques.

Index Terms—Geosocial network, topic modeling, latent Dirichlet allocation, generative probabilistic model, collapsed Gibbs sampling, text sentiment classification, venue recommendation.


1 INTRODUCTION

With the rapid pervasiveness of mobile devices embedded with wireless communication and location acquisition abilities, geosocial networks such as Yelp, Foursquare, and Facebook Places have become some of the most popular Internet applications and attracted millions of users. Geosocial networks bridge the physical world with the virtual online world. For example, in a geosocial network, users can establish social links with each other to share their experiences of visiting interesting venues, e.g., restaurants, stores, and museums, by performing check-ins to venues in the geosocial network and writing comments to express opinions about venues' aspects, e.g., atmosphere, quality, and price. These rich data, including social links between users, user check-ins to venues, venue geographical locations, venue categories, and user textual comments on venues, reflect human behaviors in reality and bring new opportunities to model the process of how users decide to visit venues.

In this paper, we aim to extract knowledge from the data accumulated in geosocial networks to answer five interrelated questions: (1) How do users interact with each other, or what communities do they participate in? (2) How do users move from one venue to another, or what regions do they visit? (3) What categories (e.g., restaurants, stores, and museums) of venues do users prefer, or what activities do they perform at

• J.-D. Zhang and C.-Y. Chow are with the Department of Computer Science, City University of Hong Kong, Hong Kong. E-mail: [email protected], [email protected].

venues? (4) What aspects (e.g., atmosphere, quality, and price) of venues are users interested in, or what topics do they care about when performing activities at venues? (5) What opinions (e.g., good, bad, and high) do they express on the aspects of venues, or what are their sentiments (e.g., positive, negative, or neutral) on the topics? As demonstrated in this paper, answering these five questions enables two key applications, namely, text sentiment classification for comments and venue recommendation for users [1], [2], [3], [4].

Text sentiment classification focuses on mining opinions that express positive, negative, or neutral sentiments towards entities such as products, services, organizations, individuals, issues, events, and topics. Opinions are central to almost all human activities, because they are key influencers of consumer behaviors and we often make decisions based on other people's opinions. For example, businesses always want to find public opinions about their products or services, and consumers want to know the opinions of existing users of a product before purchasing it. With the help of personalized venue recommendations, it is much easier and more convenient for people to learn about nearby events and/or places of interest that are relevant to their preferences. In other words, venue recommendations help people explore new places in their city (especially when they are travelling in a new city) and enrich their daily life. Moreover, venue recommendations help companies discover potential customers and improve the profit of their businesses.

To address the aforementioned five questions, one may exploit existing Latent Dirichlet Allocation (LDA [5]) based topic models


to separately discover latent variables that capture communities, regions, activities, topics, or sentiments of users (e.g., social topic models for communities and topics [6], [7], geographical topic models for regions and topics [8], [9], or sentimental topic models for topics and sentiments [10], [11]). However, these latent variables are interdependent; for example, users in the same community usually go to the same regions together, participate in common activities, or discuss similar topics with each other. Therefore, it is necessary to jointly mine these latent variables from the observed data in geosocial networks. Unfortunately, none of the existing techniques can be applied to this kind of joint mining. To this end, we propose an LDA-based model, called CRATS, for jointly mining the latent variables that capture Communities, Regions, Activities, Topics, and Sentiments. CRATS incorporates the important dependencies among these latent variables at the latent level, human behaviors at the reality level, and geosocial network data at the observed level, as depicted in Fig. 1:

• Community dependency. Users are more closely linked to each other and share more common interests in venues within the same community than across different communities. Thus, CRATS discovers communities based on both the social links between users and users' check-ins to venues.

• Region dependency. A region should be coherent in the geographical space. Thus, instead of representing a region as a multinomial distribution over discrete venues that may be far from each other, CRATS models a region as a geographical Gaussian distribution with a center, utilizing the continuous geographical locations (i.e., latitudes and longitudes) of venues.

• Activity dependency. The categories of a venue visited by a user implicitly indicate what activities the user has done at the venue. For instance, a person checking in at a restaurant may be having a meal there. Accordingly, CRATS derives latent activities based on venue categories. In this paper, "activity" refers to a latent variable and "category" to observable information: although the static category information of venues is already known, the dynamic activities performed by users at the venues are unknown. Note that people can perform different activities at a venue, because the venue may belong to multiple categories. For example, at a hotel, people may attend a conference, have a meal, or stay overnight.

• Topic dependency. Topics are highly dependent on the underlying activities performed by users. For example, when users have a meal at a restaurant, they usually talk about aspects of the restaurant, e.g., the atmosphere and the taste of food; when users watch a football game at a stadium, they are likely to discuss the performance of athletes. In CRATS, users are clustered into communities, so topics are always specific to activities and communities. A topic corresponds to a distribution over observable words, i.e., the aspects in this paper. Moreover, "activity" focuses on what people are doing, while "topic" concentrates on what people are talking about.

• Sentiment dependency. In reality, users often have different opinions towards a variety of aspects of venues; e.g., a user may like the atmosphere of a restaurant but dislike the price of the food it offers. That is, at the latent level, sentiments strongly rely on topics. Therefore, in CRATS, sentiments are always conditioned on a certain topic, and a sentiment corresponds to a distribution over observable opinion words.
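The five dependencies above chain together into a generative hierarchy. The sketch below is a hypothetical simplification for illustration only: the parameter names loosely echo TABLE 1, but the dictionary layout, the function, and the exact conditioning are invented here, not the paper's specification (which appears in Section 3.2).

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_check_in(user, params):
    """Illustrative sketch of the CRATS dependency chain for one check-in.

    `params` is a hypothetical container of multinomial/Gaussian parameters;
    all field names and shapes here are assumptions for this sketch.
    """
    # Community drawn from the user's community distribution (social interaction).
    c = rng.choice(len(params["beta"][user]), p=params["beta"][user])
    # Region drawn from the user's region distribution (geographical mobility).
    r = rng.choice(len(params["gamma"][user]), p=params["gamma"][user])
    # Venue location drawn from the region's 2-D Gaussian.
    location = rng.multivariate_normal(params["mu"][r], params["sigma"][r])
    # Activity drawn from the user's activity distribution (categorical preference).
    a = rng.choice(len(params["alpha"][user]), p=params["alpha"][user])
    # Topic depends on both the community and the activity (aspect interest).
    t = rng.choice(len(params["tau"][c][a]), p=params["tau"][c][a])
    # Sentiment is conditioned on the chosen topic (opinion expression).
    s = rng.choice(len(params["theta_s"][t]), p=params["theta_s"][t])
    return c, r, location, a, t, s
```

The point of the sketch is the ordering: community and region are user-level draws, the topic draw is conditioned on community and activity, and the sentiment draw is conditioned on the topic.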

The main contributions of this paper are listed below:

[Fig. 1. Dependencies of the reality, latent, and observed levels in CRATS. Reality level: social interaction, geographical mobility, categorical preference, aspect interest, and opinion expression. Latent level: community, region, activity, topic, and sentiment. Observed level in geosocial networks: social links and check-ins, geographical locations of venues, categories of venues, and textual comments of users on venues.]

• We propose an LDA-based generative probabilistic model, CRATS, for jointly mining the latent variables that capture communities, regions, activities, topics, and sentiments of users from geosocial network data in a unified manner. Section 2 discusses the significant differences between CRATS and current methods. (Section 3.2)

• We devise an approximate learning method based on collapsed Gibbs sampling to estimate the model parameters of CRATS (Section 3.3), with linear complexity with respect to the training data size (Section 3.4).

• We demonstrate how CRATS can be applied in two important applications, namely, text (e.g., comment, review, post, and tip) sentiment classification and venue recommendation. (Section 3.5)

• Extensive experiments are conducted to evaluate the performance of CRATS in our two applications using three large-scale real-world data sets collected from Yelp and Foursquare. Experimental results show that CRATS significantly outperforms other state-of-the-art competitors. (Section 4)

The rest of this paper is organized as follows. We highlight related work in Section 2. Our joint model CRATS is presented in Section 3, followed by the experimental evaluation in Section 4. Finally, Section 5 concludes this paper.

2 RELATED WORK

Topic modeling. In recent decades, topic modeling has been widely studied in text mining, information retrieval, and natural language processing to automatically uncover the hidden semantic structure of a text collection. In general, studies on topic modeling can be classified into two main categories: (1) non-probabilistic methods, e.g., regularized latent semantic indexing [12] and non-negative matrix factorization [13], and (2) probabilistic methods, e.g., probabilistic latent semantic indexing (PLSI) [14], [15] and latent Dirichlet allocation (LDA) [5], [16], [17]. By defining a topic as a probability distribution over words and a text as a mixture of topics, the probabilistic methods can be extended to integrate texts with additional information, including social links, geographical locations, temporal contexts, and sentiments.
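The probabilistic view just stated, a topic as a distribution over words and a text as a mixture of topics, corresponds to the standard LDA generative process, sketched below with illustrative parameters (the word indices and topic-word matrix are toy inputs, not from any data set in this paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_document(n_words, alpha, topic_word):
    """Standard LDA generative process: draw a per-document topic mixture
    from a Dirichlet prior, then for each word position draw a topic
    assignment and then a word from that topic's word distribution."""
    theta = rng.dirichlet(alpha)                  # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(len(theta), p=theta)       # latent topic assignment
        w = rng.choice(topic_word.shape[1], p=topic_word[z])
        words.append(int(w))
    return words
```

Extensions such as CRATS keep this core word-generation step but condition the topic draw on additional latent variables (community, region, activity) and generate extra observations (locations, links) alongside the words.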

Social topic. There are several studies on generative topic models based on texts and social links. For example, the literature [18] presents Topic-Link LDA to jointly model topics and communities based on the observation that a link between two documents is not only determined by content similarity but also affected by the community ties between the authors. The work [7] proposes a Topic-on-Participation model by placing a topical variable on each author's participation in documents, in which the social links (i.e., the co-author relationships of


documents) are transformed into an equivalent participation graph. More sophisticatedly, the recent study [6] incorporates community discovery into topic analysis in text-associated graphs to guarantee topical coherence in the communities by explicitly separating the concepts of communities and topics; that is, one community can correspond to multiple topics, and multiple communities can share the same topic.

Geographical topic. With the rapid growth of geosocial networks, there are also some studies that integrate geographical locations into topic modeling. In general, these studies assume that every region has its own topics captured from the texts tagged with geographical locations. Some researchers [19], [20] represent locations by their distinguishing identifiers rather than geographical coordinate information. For example, Son et al. [19] map a location into a topic space based on explicit localized semantic analysis, i.e., they associate the location with a probability distribution over topics that are explicitly defined with Wikipedia concepts. Then they recommend news articles for users based on the similarity between the distribution of topics in a news article and the distribution of topics at the current location of users. In contrast, Yin et al. [20] associate each topic with two topic models, i.e., a probability distribution over words and a probability distribution over locations. This design enables the two topic models to be mutually influenced and enhanced during the topic discovery process, which facilitates the clustering of content-similar spatial locations into the same topic with high probability. These topic modeling approaches can distinguish the functions of locations independently of the geographical coordinates. However, they cannot take full advantage of the coordinate information of locations, which is important for the analysis of user mobility over regions.

Other studies also exploit the coordinate information of locations for geographical topic modeling and obtain better performance. The work [21] represents a user as a distribution over topics and a topic as a distribution over locations, and then combines the distributions with the influence of geographical proximity, in terms of the coordinate information of locations, to derive the preferences of users for new locations for venue recommendations. The studies [22], [23] also utilize the geographical influence to learn the preferences of users for locations by introducing a set of latent regions, which are Gaussian distributions over the latitude and longitude coordinates of locations. Besides the distribution over the coordinates of locations, other studies [8], [9], [24], [25] further attach to a latent region the distributions over topics to discover different topics of interest that are coherent in geographical regions. Specifically, the work [8] employs the two distributions to deduce a location's distribution over topics to compare the topics in different locations. The research [24] discovers geographical topics from the tweet stream and attempts to reflect the preferences of Twitter users and the dependency between regions and topics based on topical diversity, geographical diversity, and the interest distribution of users. The literature [9], [25] uses both the identifiers and coordinates of locations, in which the location of a tweet is drawn from a weighted combination of the distribution of a topic over location identifiers and the distribution of a region over location coordinates.

Temporal topic. Time is also a very important factor that influences human activities. Some works focus on temporal topic models and can be generally classified into two groups. (1) Temporal dynamics of topics. This group estimates the distribution of topics from a sequentially organized corpus of documents in various eras to capture the evolution of topics over time. For example, the study [26] models the documents of each time slice with a K-component topic model, where the topics associated with slice t evolve from the topics associated with slice t − 1. The work [15] mines common topics from multiple asynchronous text sequences by adjusting the time stamps of the documents according to the topic distribution over time. (2) Temporal periodicity of topics. This group aims to capture the daily or weekly periodic patterns of human behaviors by connecting a topic with a distribution over the time of a day or a week, respectively. For instance, the research [27] discovers topics with a narrow time distribution, in which a strong word co-occurrence pattern appears for a brief moment in time and then disappears. The literature [9], [25] considers time in a day as a continuous variable and categorizes days into two classes, namely, weekdays and weekends, for requirement-aware venue recommendations, where time only affects the region, but not the topic. Due to space limitations, this paper does not consider the time factor. Instead, we will extend our model to integrate temporal dynamics and/or periodicity in future work.

Sentimental topic. There is a rich literature that aims to uncover topics and sentiments by separating opinion words from aspect words in texts for sentiment classification [4]. Most studies extend the PLSI or LDA model by placing a latent variable for both topics and sentiments. (1) Some methods [11], [28], [29] assume the latent sentiments are conditioned on topics, i.e., each topic has a multinomial distribution over sentiments. For instance, the work [29] leverages the aspect and overall ratings of items to discover the relative emphasis of reviewers on the different aspects of the items using regression analysis. The paper [28] extracts aspects and corresponding ratings of items from both textual reviews and overall ratings on the items for aspect-based opinion summarization. In the study [11], the model is trained at the category level instead of the item level to address the item cold-start problem by learning the latent factors using the reviews of all the items of a category. (2) In contrast, other methods [10], [30], [31] assume the latent topics are conditioned on sentiments, i.e., each sentiment has a multinomial distribution over topics. Specifically, the work [30] automatically discovers what aspects are evaluated in reviews and how opinions are expressed for different aspects based on the sentence-LDA model, which assumes all words in a sentence are generated from a single aspect. The studies [10], [31] detect coherent and informative topics and sentiments simultaneously from text for document-level sentiment classification by utilizing a domain-independent sentiment lexicon.

Note that some methods are supervised or semi-supervised. For example, the works [9], [32], [33], [34] employ the ad-hoc hashtags of Twitter to extract topics from tweets, while other works [28], [29], [35], [36], [37], [38], [39] use the overall or aspect ratings associated with texts to derive the sentiments; as a result, the applications of these works are severely constrained, since hashtags or ratings are not usually included in the texts of most social media websites.

Differences of CRATS from current methods. We can distinguish our proposed CRATS model from current works in terms of the following four facets: (1) In CRATS, the latent communities are deduced from both social links between users and users' check-ins to venues to ensure that the users in the same community share common interests in venues, which differs from the existing works concentrating on social links only. (2) To the best of our knowledge, CRATS is the first model that integrates latent activities into topic modeling and sentiment analysis. (3) Moreover, CRATS is the first model that jointly mines latent variables capturing communities, regions, activities, topics, and sentiments from geosocial networks, which is distinct from the existing works that can discover only two latent variables among social topics, geographical topics, and sentimental topics. (4) CRATS is an unsupervised model and can be applied on social media websites for various important applications, e.g., aspect-based opinion summarization, text sentiment classification, and venue recommendation.

3 THE JOINT LATENT MODEL

We present the key data structures observed in geosocial networks in Section 3.1, the probabilistic generative process of CRATS in Section 3.2, the parameter learning and complexity analysis for CRATS in Sections 3.3 and 3.4, and the applications of CRATS in Section 3.5. For ease of presentation, TABLE 1 lists the key notations used in this paper. We unify the notations as follows: (1) An uppercase letter denotes a set, and its corresponding lowercase letter denotes an element, e.g., U is a set of users and u is a user in U. (2) A bold lowercase letter denotes a vector, whose elements are indexed by the subscript i, j, or k, e.g., u is the user vector in the check-in data and u_i is the user in the i-th check-in. (3) A Greek letter denotes a model parameter of CRATS, i.e., the parameter of a multinomial or Gaussian distribution.

3.1 Preliminary

First, we define the key data structures observed in geosocial networks and the concepts used in this paper.

Definition 1 (Opinion phrase). An opinion phrase ⟨h, m⟩ is a pair of a heading word h ∈ H and a modifying word m ∈ M, where the heading word h indicates an aspect (e.g., "atmosphere", "price", and "quality") and the modifying word m expresses an opinion (e.g., "good", "bad", and "high") towards the aspect.

Definition 2 (Comment). A comment is a bag of words W that consists of a set of opinion phrases W = {⟨hj, mj⟩}.

As an example, consider a typical comment written by a user about a restaurant: "I went there last night. I loved the place with good atmosphere. The taste is not bad. The food has high quality but with a little high price." In this comment, the user has expressed different opinions on a variety of aspects of the restaurant, i.e., ⟨atmosphere, good⟩, ⟨taste, not bad⟩, ⟨quality, high⟩, and ⟨price, high⟩. Note that the opinion phrases can be extracted from texts using natural language processing tools, e.g., the Stanford natural language parsers [40].
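The extraction step above relies on a dependency parser in the paper; as a rough intuition for what it produces, the toy rule below pairs each adjective (modifying word) with the nearest following noun (heading word) from pre-tagged tokens. This is only an illustrative stand-in, not the Stanford parser's actual method, and the coarse tag names are assumptions:

```python
def extract_opinion_phrases(tagged_tokens):
    """Toy sketch of opinion-phrase extraction from (token, POS-tag) pairs.

    Real systems use full dependency parses (e.g., adjectival-modifier
    relations); this adjacency heuristic is only for illustration and
    misses constructions such as "the taste is not bad".
    """
    phrases = []
    pending_adjs = []          # adjectives awaiting a heading noun
    for token, tag in tagged_tokens:
        if tag == "ADJ":
            pending_adjs.append(token)
        elif tag == "NOUN" and pending_adjs:
            # Emit ⟨heading word, modifying word⟩ pairs.
            phrases.extend((token, adj) for adj in pending_adjs)
            pending_adjs = []
    return phrases
```

For example, tagging "good atmosphere" as [("good", "ADJ"), ("atmosphere", "NOUN")] yields the single pair ("atmosphere", "good").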

Definition 3 (Check-in). A check-in is a triple (u, v, Wu,v) that describes user u ∈ U visiting venue v ∈ V with comment Wu,v.

Definition 4 (Geographical locations of venues). Each venue v ∈ V is located at a unique geographical location lv with a pair of latitude and longitude coordinates.

Definition 5 (Categories of venues). Each venue v ∈ V belongs to a set of categories Bv ⊂ B, where B is the universal set of categories of all venues.

Definition 6 (Social link). A social link associates a pair of friends (f, f′), in which f ∈ F is a friend of f′ ∈ F, and vice versa. It is important to note that a friend is also a user in U, i.e.,

TABLE 1
Key Notations in the Paper

Sym. — Meaning

Data and assignments:
u, v, f — User, venue, and friend vectors in a data set
h, m — Heading and modifying word vectors in a data set
c, r, a — Community, region, and activity assignment vectors
t, s — Topic and sentiment assignment vectors
i, j, k — Indexes of a check-in in a data set, an opinion phrase in a comment, and a social link in the data set, respectively

Observed variables:
H — Set of heading words h: H = {h}
M — Set of modifying words m: M = {m}
W — Set of opinion phrases in a comment: W = {⟨hj, mj⟩}
U — Set of users u: U = {u}
V — Set of venues v: V = {v}
lv — A pair of latitude and longitude of venue v
Bv — Set of categories of venue v
F — Set of friends f: F = {f} (note that F = U)

Latent variables:
C — Set of latent communities c: C = {c}
R — Set of latent regions r: R = {r}
A — Set of latent activities a: A = {a}
T — Set of latent topics t: T = {t}
S — Set of latent sentiments s: S = {s}

Model parameters:
βu,c — Distribution over communities c for user u
ωc,f — Distribution over friends f for community c
ψc,v — Distribution over venues v for community c
γu,r — Distribution over regions r for user u
µr — Geographical mean location of region r
Σr — Geographical covariance matrix of region r
αu,a — Distribution over activities a for user u
δa,v — Distribution over venues v for activity a
τc,a,t — Distribution over topics t for community c's activity a
ρa,t,h — Distribution over heading words h for activity a's topic t
ηv,t,s — Distribution over sentiments s for venue v's topic t
θt,s,m — Distribution over modifying words m for topic t's sentiment s

f ∈ U; we use f to indicate a user who comes from a social link rather than a check-in.

The data set used in our model includes (1) a check-in set {(ui, vi, Wi = {⟨hij, mij⟩})}, in which i and j are the indexes of check-ins and opinion phrases, respectively, (2) a social link set {(fk, f′k)}, in which k is the index of social links, and (3) the geographical locations lv and categories Bv associated with venues v ∈ V. Note that, based on the indexes i, j, and k, the users, venues, heading and modifying words, and friends in the data set are also represented as a user vector u, venue vector v, heading word vector h, modifying word vector m, and a pair of friend vectors f and f′. The two users fk in f and f′k in f′ are friends.
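The three observed data sources above can be sketched as plain containers. This is a hypothetical layout for readers implementing the preprocessing, not a structure prescribed by the paper; all class and field names are invented:

```python
from dataclasses import dataclass

@dataclass
class CheckIn:
    """One check-in (u, v, W_{u,v}): a user visits a venue and leaves a
    comment represented as ⟨heading word, modifying word⟩ opinion phrases."""
    user: str
    venue: str
    opinion_phrases: list  # list of (heading, modifying) pairs

@dataclass
class GeosocialData:
    """Hypothetical container bundling the observed inputs of CRATS."""
    check_ins: list        # list of CheckIn, indexed by i
    social_links: set      # set of frozenset({f, f_prime}) friend pairs, indexed by k
    venue_locations: dict  # venue -> (latitude, longitude), i.e., l_v
    venue_categories: dict # venue -> set of category labels, i.e., B_v
```

Storing each social link as an unordered frozenset reflects that friendship is symmetric (f is a friend of f′ and vice versa).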

3.2 Probabilistic Generative Process

Like the classical LDA [5], our proposed CRATS is a generative probabilistic model, i.e., it generates observed data from latent variables given model parameters, as depicted in Fig. 2. CRATS aims to mimic the decision-making process of users checking in at venues by considering the user's social interaction in communities, geographical mobility between regions, categorical preference for activities, aspect interest in topics, and opinion expression of sentiments. The generative process of CRATS is summarized in Algorithm 1. The essential idea is that CRATS jointly models latent communities, regions, activities, topics, and sentiments based on the following dependencies.


Fig. 2. Standard graphical representation for the LDA-based model, i.e., CRATS. (Edges indicate dependencies and a box around variables is a plate denoting |·| replicates.)

The dependency of communities on social interaction. In reality, people interact with each other, and friends often go to venues like restaurants together. Likewise, in geosocial networks, users establish social links and form communities to share their experiences of visiting venues. Thus, the users in the same community should be closely linked to one another and share common interests in venues. Accordingly, in CRATS communities are discovered from both social links between users and user check-ins to venues (Lines 2, 4 and 8 in Algorithm 1). Formally, each user u has a multinomial distribution β_u with Dirichlet prior β_0 = 1/|C| over latent communities c ∈ C, i.e., β_{u,c} represents the probability of user u participating in community c. Note that it is very common to set the Dirichlet prior to a particular value in LDA-based models with collapsed Gibbs sampling [9], [20], [32], which yields a smoothed multinomial distribution. Each community c has a multinomial distribution ω_c with Dirichlet prior ω_0 = 1/|F| over friends f ∈ F and a multinomial distribution ψ_c with Dirichlet prior ψ_0 = 1/|V| over venues v ∈ V, i.e., ω_{c,f} and ψ_{c,v} represent the probability of friend f and venue v being included in community c, respectively. A friend f is also a user in U and indicates a user from social links rather than check-ins.

The dependency of regions on geographical mobility. A user's mobility usually centers on several personal geographical regions, e.g., users tend to visit venues close to their homes or offices and also may be interested in exploring places near their visited venues. Hence, a region should be coherent in the geographical space. Accordingly, in CRATS a region is modeled as a geographical Gaussian distribution using the continuous geographical locations (i.e., latitudes and longitudes), instead of a multinomial distribution over discrete venues that may be far from each other. Formally, each user u has a multinomial distribution γ_u with Dirichlet prior γ_0 = 1/|R| over latent regions r ∈ R, i.e., γ_{u,r} represents the probability of user u visiting region r (Line 9 in Algorithm 1). Each region r has a geographical Gaussian distribution N(μ_r, Σ_r), where μ_r and Σ_r are the geographical mean vector and covariance matrix, respectively.

The dependency of activities on categories of venues. The categories of a venue strongly indicate what activities can be performed by users at the venue or what products and services are provided for users. For example, users may have a meal in a restaurant, and a Chinese restaurant offers Chinese food to customers. Therefore, in CRATS activities are derived from the categories of venues. It is important to note that the static category

Algorithm 1 Probabilistic Generative Process of CRATS
1: for each social link (f_k, f'_k) do
2:   Draw community c_k ∼ Multi(β_{f_k})
3:   Draw friend f'_k ∼ Multi(ω_{c_k})
4:   Draw community c'_k ∼ Multi(β_{f'_k})
5:   Draw friend f_k ∼ Multi(ω_{c'_k})
6: end for
7: for each check-in (u_i, v_i, W_i = {⟨h_ij, m_ij⟩}) do
8:   Draw community c_i ∼ Multi(β_{u_i})
9:   Draw region r_i ∼ Multi(γ_{u_i})
10:  Draw activity a_i ∼ Multi(α_{u_i})
11:  Draw venue v_i ∼ Multi(ψ_{c_i}) N(l_{v_i} | μ_{r_i}, Σ_{r_i}) Multi(δ_{a_i})
12:  for each opinion phrase ⟨h_ij, m_ij⟩ ∈ W_i do
13:    Draw topic t_ij ∼ Multi(τ_{c_i, a_i})
14:    Draw heading word h_ij ∼ Multi(ρ_{a_i, t_ij})
15:    Draw sentiment s_ij ∼ Multi(η_{v_i, t_ij})
16:    Draw modifying word m_ij ∼ Multi(θ_{t_ij, s_ij})
17:  end for
18: end for
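Lines 7-18 of Algorithm 1 can be sketched for a single check-in as follows. This is a toy sketch: the dimensions are hypothetical, and fresh Dirichlet draws stand in for the learned per-user, per-community, and per-activity parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): |C| communities, |R| regions, |A| activities,
# |T| topics, |S| sentiments, |H|/|M| heading/modifying word vocabularies.
C, R, A, T, S, H, M = 3, 2, 4, 5, 3, 20, 15

def draw(dist):
    """Draw one index from a multinomial distribution."""
    return rng.choice(len(dist), p=dist)

# Per-user multinomials beta_u, gamma_u, alpha_u (sampled from uniform
# Dirichlet priors purely for illustration).
beta_u = rng.dirichlet(np.ones(C))
gamma_u = rng.dirichlet(np.ones(R))
alpha_u = rng.dirichlet(np.ones(A))

# Lines 8-10: latent community, region, and activity of one check-in.
c = draw(beta_u); r = draw(gamma_u); a = draw(alpha_u)

# Lines 13-16: one opinion phrase <h, m> given community c and activity a.
t = draw(rng.dirichlet(np.ones(T)))  # topic ~ tau_{c,a}
h = draw(rng.dirichlet(np.ones(H)))  # heading word ~ rho_{a,t}
s = draw(rng.dirichlet(np.ones(S)))  # sentiment ~ eta_{v,t}
m = draw(rng.dirichlet(np.ones(M)))  # modifying word ~ theta_{t,s}
```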

information of venues is already known, but the dynamic activities that people possibly perform at the venues are unknown, because a venue may belong to multiple categories and people can perform different activities at the venue. For example, people may attend a conference, have a meal, or stay at a hotel. Formally, each user u has a multinomial distribution α_u with Dirichlet prior α_0 = 1/|A| over latent activities a ∈ A, i.e., α_{u,a} reflects the preference of user u for activity a (Line 10 in Algorithm 1). For instance, a foodie often tastes a variety of food at different restaurants, while a tourism enthusiast usually travels to tourist attractions all over the world. Moreover, each activity a has a multinomial distribution δ_a with Dirichlet prior δ_0 over venues v ∈ V, i.e., δ_{a,v} denotes the probability of venue v being included in activity a. The prior δ_0 is derived from the category information of venues. Specifically, we identify the set A of activities with the universal set B of categories of all venues and determine the Dirichlet prior δ_0(a, v) for activity a occurring in venue v through

$$\delta_0(a, v) = \begin{cases} 1/|B_v|, & \text{for } a \in B_v \subset B = A, \\ 0, & \text{for } a \notin B_v, \end{cases} \tag{1}$$

where B_v is the set of categories of venue v. Accordingly, the venue preference of a user is determined based on the user's current community, region, and activity (Line 11 in Algorithm 1).
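The prior δ_0(a, v) of Equation (1) is straightforward to compute from the category sets. A minimal sketch, where the function name and toy category sets are hypothetical:

```python
def delta0(a, Bv):
    """Dirichlet prior delta_0(a, v) from Equation (1): uniform over the
    categories B_v of venue v, and zero for activities outside B_v."""
    return 1.0 / len(Bv) if a in Bv else 0.0

B = {"restaurant", "hotel", "museum"}  # universal category set B = A
Bv = {"restaurant", "hotel"}           # categories of one venue v

# The prior over all activities for this venue.
priors = {a: delta0(a, Bv) for a in B}
```

Note that the prior concentrates all mass uniformly on the venue's own categories, so it sums to one over the activity set.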

The dependency of topics on activities and communities. People in a community often concentrate on different topics with different words when they are participating in different activities. For instance, users would like to talk about the atmosphere and taste of food when they have a meal at a restaurant, and discuss the performance of athletes when they watch a football game at a stadium. In other words, topics and words are highly dependent on the underlying activities and communities of users. Thus, in CRATS, topics and words are related to activities and communities (Lines 13 and 14 in Algorithm 1). Note that users have been clustered into communities. Formally, a community c on each activity a has a multinomial distribution τ_{c,a} with Dirichlet prior τ_0 = 1/|T| over latent topics t ∈ T, i.e., τ_{c,a,t} is the probability of community c talking about topic t under activity a. Each topic t under activity a has a multinomial distribution ρ_{a,t} with Dirichlet prior ρ_0 = 1/|H| over heading words h ∈ H that indicate aspects of venues, i.e., ρ_{a,t,h} is the probability of the heading word or aspect h being mentioned in topic t conditioned on activity a.

The dependency of sentiments on topics. People usuallyexpress different opinions towards various aspects of venues, e.g.,


a person may like the quality of food of a restaurant but dislike its price level. Thus, sentiments should be conditioned on topics at the latent level, as in our CRATS (Line 15 in Algorithm 1). Further, the opinion orientation (e.g., positive, negative, or neutral) expressed by a modifying word depends on the aspect implied by the corresponding heading word, e.g., the modifying word "high" is positive for "quality" but negative for "price". Accordingly, the sentiment implied by a modifying word depends on the topic of the corresponding heading word (Line 16 in Algorithm 1). Formally, a venue v on each topic t has a multinomial distribution η_{v,t} with Dirichlet prior η_0 = 1/|S| over latent sentiments s ∈ S, i.e., η_{v,t,s} is the probability of venue v having sentiment s for topic t from users. Moreover, each sentiment s has a multinomial distribution θ_{t,s} with Dirichlet prior θ_0 = 1/|M| over modifying words m ∈ M, i.e., θ_{t,s,m} is the probability of the modifying word m being mentioned in topic t with sentiment s.

3.3 Parameter Learning

In Section 3.2, the generative process draws the observed data D = {f, f', u, v, h, m} by assuming that the model parameters Ω = {β, ω, ψ, γ, μ, Σ, α, δ, τ, ρ, η, θ} of CRATS are known. Conversely, in practice we need to estimate the model parameters Ω using the observed data D. In general, we aim to find the parameters Ω that maximize the likelihood Pr(D|Ω) of D. However, Pr(D|Ω) cannot be computed tractably due to the coupling among the latent variables. Therefore, we follow the LDA-based studies [9], [20], [32] and devise an approximate learning method based on collapsed Gibbs sampling for estimating the model parameters in CRATS. For each check-in or social link in D, the Gibbs sampling iteratively updates a latent variable given the remaining variables to obtain the latent variable assignments.

In Gibbs sampling, we first initialize the latent variable assignments c, r, a, t, s for communities, regions, activities, topics, and sentiments in terms of the Dirichlet priors β_0, γ_0, δ_0, τ_0, and η_0, respectively. In Algorithm 1 we draw a community c_i ∈ c for each check-in and c_k, c'_k ∈ c for each social link, so the dimension of the community assignment vector c is the total number of check-ins plus twice the number of social links. Similarly, the dimensions of r and a are the number of check-ins, and the dimensions of t and s are the total number of opinion phrases in all comments of D. Then, we apply a five-step Gibbs sampling procedure to iteratively update the assignments c, r, a, t, s. For ease of presentation, we give the detailed derivation process in Appendix A and only show the derived Gibbs sampling formulas as follows.

Step 1: Sampling communities. (1) For the i-th check-in, we sample the community c_i given the remaining latent and observed variables c^{¬i}, r, a, t, s, f, f', u, v, h, m according to the posterior probability:

$$\Pr(c_i \mid \mathbf{c}^{\neg i}, \mathbf{a}, \mathbf{t}, \mathbf{v}, \cdot) \propto \left(n^{\neg i}_{u_i, c_i} + \beta_0\right) \frac{n^{\neg i}_{c_i, v_i} + \psi_0}{\sum_{v \in V} \left(n^{\neg i}_{c_i, v} + \psi_0\right)} \prod_j \frac{n^{\neg i}_{c_i, a_i, t_{ij}} + \tau_0}{\sum_{t \in T} \left(n^{\neg i}_{c_i, a_i, t} + \tau_0 + j\right)}, \tag{2}$$

where the superscript ¬i denotes a vector (e.g., c^{¬i}) or a count (e.g., n^{¬i}) excluding the current assignment, j = 0, 1, ... is the index of opinion phrases in the comment of the i-th check-in (the same hereafter), and n_{u,c} and n_{c,v} are the numbers of times that community c has been sampled for user u and venue v, respectively. (2) Similarly, for the k-th social link, we sample the community c_k for f_k in f and c'_k for f'_k in f' according to the posterior probabilities:

$$\Pr(c_k \mid \mathbf{c}^{\neg k}, \mathbf{f}, \mathbf{f}', \cdot) \propto \left(n^{\neg k}_{f_k, c_k} + \beta_0\right) \frac{n^{\neg k}_{c_k, f'_k} + \omega_0}{\sum_{f \in F} \left(n^{\neg k}_{c_k, f} + \omega_0\right)} \quad \text{and} \tag{3}$$

$$\Pr(c'_k \mid \mathbf{c}^{\neg k}, \mathbf{f}, \mathbf{f}', \cdot) \propto \left(n^{\neg k}_{f'_k, c'_k} + \beta_0\right) \frac{n^{\neg k}_{c'_k, f_k} + \omega_0}{\sum_{f \in F} \left(n^{\neg k}_{c'_k, f} + \omega_0\right)}. \tag{4}$$

Note that the two friends f_k and f'_k in a social link are symmetric, and friends are also users in U because F = U.
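Each of the sampling formulas in Steps 1-5 yields unnormalized posterior weights, from which the new assignment is drawn proportionally. A minimal sketch of such a proportional draw (the helper name is hypothetical):

```python
import random

def sample_index(weights):
    """Draw an index with probability proportional to the given unnormalized
    posterior weights, as needed to realize the draws in Equations (2)-(5)
    and (8)-(10)."""
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off
```

Normalizing is unnecessary: scaling every weight by the same constant leaves the draw unchanged, which is why the formulas only need to be known up to proportionality.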

Step 2: Sampling regions. For the i-th check-in, we sample the region r_i according to the posterior probability:

$$\Pr(r_i \mid \mathbf{r}^{\neg i}, \mathbf{v}, \cdot) \propto \left(n^{\neg i}_{u_i, r_i} + \gamma_0\right) \mathcal{N}(l_{v_i} \mid \mu_{r_i}, \Sigma_{r_i}), \tag{5}$$

in which n_{u,r} is the number of times that region r has been sampled for user u, and μ_{r_i} is the mean vector of the geographical locations of all venues that are assigned to region r_i, given by

$$\mu_{r_i} = \frac{\sum_{v \in V} n^{\neg i}_{r_i, v}\, l_v}{\sum_{v \in V} n^{\neg i}_{r_i, v}}, \tag{6}$$

and Σ_{r_i} is the covariance matrix of the geographical locations of all venues that are assigned to region r_i, given by

$$\Sigma_{r_i} = \frac{\sum_{v \in V} n^{\neg i}_{r_i, v}\, (l_v - \mu_{r_i})(l_v - \mu_{r_i})^{\mathsf{T}}}{\sum_{v \in V} n^{\neg i}_{r_i, v}}, \tag{7}$$

where n_{r,v} is the number of times that region r has been sampled for venue v, l_v denotes the geographical location of venue v (i.e., l_v is a column vector consisting of a latitude-longitude pair), and the superscript T denotes the transpose of a vector or matrix.
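Equations (6) and (7) are simply count-weighted first and second moments of the venue locations assigned to a region, and can be computed directly. A sketch under the assumption that the counts n_{r,v} and locations l_v are given as arrays (the function name is hypothetical):

```python
import numpy as np

def region_gaussian(counts, locs):
    """Mean mu_r and covariance Sigma_r of a region per Equations (6)-(7):
    count-weighted moments of the venue locations assigned to the region.
    counts[v] = n_{r,v}; locs[v] = l_v as a (lat, lon) row."""
    counts = np.asarray(counts, dtype=float)
    locs = np.asarray(locs, dtype=float)
    total = counts.sum()
    mu = counts @ locs / total                      # Equation (6)
    diff = locs - mu
    sigma = (counts[:, None] * diff).T @ diff / total  # Equation (7)
    return mu, sigma

# Toy example: three venues with counts 2, 1, 1.
mu, sigma = region_gaussian([2, 1, 1], [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
```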

Step 3: Sampling activities. For the i-th check-in, we sample the activity a_i according to the posterior probability:

$$\Pr(a_i \mid \mathbf{a}^{\neg i}, \mathbf{c}, \mathbf{t}, \mathbf{v}, \mathbf{h}, \cdot) \propto \left(n^{\neg i}_{u_i, a_i} + \alpha_0\right) \frac{n^{\neg i}_{a_i, v_i} + \delta_0}{\sum_{v \in V} \left(n^{\neg i}_{a_i, v} + \delta_0\right)} \times \prod_j \frac{\left(n^{\neg i}_{c_i, a_i, t_{ij}} + \tau_0\right) \left(n^{\neg i}_{a_i, t_{ij}, h_{ij}} + \rho_0\right)}{\sum_{t \in T} \left(n^{\neg i}_{c_i, a_i, t} + \tau_0 + j\right) \sum_{h \in H} \left(n^{\neg i}_{a_i, t_{ij}, h} + \rho_0 + j\right)}, \tag{8}$$

where n_{u,a} is the number of times that activity a has been sampled for user u, n_{a,v} is the number of times that venue v has been generated from activity a, n_{c,a,t} is the number of times that topic t has been sampled for community c under activity a, and n_{a,t,h} is the number of times that the heading word h has been generated by topic t under activity a.

Step 4: Sampling topics. For the j-th opinion phrase in the comment of the i-th check-in, we sample the topic t_ij according to the posterior probability:

$$\Pr(t_{ij} \mid \mathbf{t}^{\neg ij}, \mathbf{c}, \mathbf{a}, \mathbf{s}, \mathbf{v}, \mathbf{h}, \mathbf{m}, \cdot) \propto \frac{n^{\neg ij}_{a_i, t_{ij}, h_{ij}} + \rho_0}{\sum_{h \in H} \left(n^{\neg ij}_{a_i, t_{ij}, h} + \rho_0\right)} \times \left(n^{\neg ij}_{c_i, a_i, t_{ij}} + \tau_0\right) \frac{\left(n^{\neg ij}_{v_i, t_{ij}, s_{ij}} + \eta_0\right) \left(n^{\neg ij}_{t_{ij}, s_{ij}, m_{ij}} + \theta_0\right)}{\sum_{s \in S} \left(n^{\neg ij}_{v_i, t_{ij}, s} + \eta_0\right) \sum_{m \in M} \left(n^{\neg ij}_{t_{ij}, s_{ij}, m} + \theta_0\right)}, \tag{9}$$


where n_{v,t,s} is the number of times that sentiment s has been sampled for topic t of venue v, and n_{t,s,m} is the number of times that the modifying word m has been generated by topic t with sentiment s.

Step 5: Sampling sentiments. For the j-th opinion phrase in the comment of the i-th check-in, we sample the sentiment s_ij according to the posterior probability:

$$\Pr(s_{ij} \mid \mathbf{s}^{\neg ij}, \mathbf{t}, \mathbf{v}, \mathbf{m}, \cdot) \propto \left(n^{\neg ij}_{v_i, t_{ij}, s_{ij}} + \eta_0\right) \frac{n^{\neg ij}_{t_{ij}, s_{ij}, m_{ij}} + \theta_0}{\sum_{m \in M} \left(n^{\neg ij}_{t_{ij}, s_{ij}, m} + \theta_0\right)}. \tag{10}$$

Parameter estimation. In collapsed Gibbs sampling, we incrementally update the counts n from Steps 1 to 5 for each social link and check-in in every iteration, as summarized in Algorithm 2. After a sufficient number of sampling iterations, we can estimate the model parameters by:

$$\beta_{u,c} = \frac{n_{u,c} + \beta_0}{\sum_{c' \in C} (n_{u,c'} + \beta_0)}, \tag{11}$$
$$\omega_{c,f} = \frac{n_{c,f} + \omega_0}{\sum_{f' \in F} (n_{c,f'} + \omega_0)}, \tag{12}$$
$$\psi_{c,v} = \frac{n_{c,v} + \psi_0}{\sum_{v' \in V} (n_{c,v'} + \psi_0)}, \tag{13}$$
$$\gamma_{u,r} = \frac{n_{u,r} + \gamma_0}{\sum_{r' \in R} (n_{u,r'} + \gamma_0)}, \tag{14}$$
$$\alpha_{u,a} = \frac{n_{u,a} + \alpha_0}{\sum_{a' \in A} (n_{u,a'} + \alpha_0)}, \tag{15}$$
$$\delta_{a,v} = \frac{n_{a,v} + \delta_0}{\sum_{v' \in V} (n_{a,v'} + \delta_0)}, \tag{16}$$
$$\tau_{c,a,t} = \frac{n_{c,a,t} + \tau_0}{\sum_{t' \in T} (n_{c,a,t'} + \tau_0)}, \tag{17}$$
$$\rho_{a,t,h} = \frac{n_{a,t,h} + \rho_0}{\sum_{h' \in H} (n_{a,t,h'} + \rho_0)}, \tag{18}$$
$$\eta_{v,t,s} = \frac{n_{v,t,s} + \eta_0}{\sum_{s' \in S} (n_{v,t,s'} + \eta_0)}, \quad \text{and} \tag{19}$$
$$\theta_{t,s,m} = \frac{n_{t,s,m} + \theta_0}{\sum_{m' \in M} (n_{t,s,m'} + \theta_0)}. \tag{20}$$

Note that μ_r and Σ_r are estimated based on Equations (6) and (7), respectively, but replacing n^{¬i}_{r_i,v} with n_{r,v}.

3.4 Complexity Analysis

Space complexity of parameter learning. To estimate the model parameters, Algorithm 2 keeps track of count matrices of size |U| × |C| (user by community for β), |C| × |F| (community by friend for ω, where |F| = |U|), |C| × |V| (community by venue for ψ), |U| × |R| (user by region for γ), |R| × |V| (region by venue for μ and Σ), |U| × |A| (user by activity for α), |A| × |V| (activity by venue for δ), |C| × |A| × |T| (community by activity by topic for τ), |A| × |T| × |H| (activity by topic by heading word for ρ), |V| × |T| × |S| (venue by topic by sentiment for η), and |T| × |S| × |M| (topic by sentiment by modifying word for θ). The total space complexity is O(|U|(2|C| + |R| + |A|) + |V|(|C| + |R| + |A| + |T||S|) + |C||A||T| + |A||T||H| + |T||S||M|). Note that the numbers of latent communities |C|, regions |R|, activities |A|, topics |T|, and sentiments |S| are very small relative to the numbers of users |U|, venues |V|, heading words |H|, and modifying words |M|. Thus, the space complexity linearly increases as |U|, |V|, |H|, or

Algorithm 2 Collapsed Gibbs Sampling of CRATS
Input: Check-in set {(u_i, v_i, W_i = {⟨h_ij, m_ij⟩})} and social link set {(f_k, f'_k)}
Output: A variety of counts n
1: Initialize assignments c, r, a, t, s for each social link and check-in, and build all counts n
2: for each iteration do
3:   for each social link (f_k, f'_k) do
4:     Decrease counts n for f_k, f'_k, c_k^old, and c'_k^old (i.e., n_{f_k, c_k^old}, n_{c_k^old, f'_k}, n_{f'_k, c'_k^old}, n_{c'_k^old, f_k})
5:     Sample two communities c_k^new and c'_k^new based on Equations (3) and (4)
6:     Increase counts n for the two sampled communities (i.e., n_{f_k, c_k^new}, n_{c_k^new, f'_k}, n_{f'_k, c'_k^new}, n_{c'_k^new, f_k})
7:   end for
8:   for each check-in (u_i, v_i, W_i = {⟨h_ij, m_ij⟩}) do
9:     Decrease counts n for u_i, v_i, c_i^old, r_i^old, and a_i^old (i.e., n_{u_i, c_i^old}, n_{c_i^old, v_i}, n_{u_i, r_i^old}, n_{r_i^old, v_i}, n_{u_i, a_i^old}, n_{a_i^old, v_i})
10:    Sample community, region and activity c_i^new, r_i^new, and a_i^new based on Equations (2), (5) and (8)
11:    Increase counts n for the sampled community, region and activity (i.e., n_{u_i, c_i^new}, n_{c_i^new, v_i}, n_{u_i, r_i^new}, n_{r_i^new, v_i}, n_{u_i, a_i^new}, n_{a_i^new, v_i})
12:    for each opinion phrase ⟨h_ij, m_ij⟩ ∈ W_i do
13:      Decrease counts n for h_ij, m_ij, t_ij^old, and s_ij^old (i.e., n_{c_i, a_i, t_ij^old}, n_{a_i, t_ij^old, h_ij}, n_{v_i, t_ij^old, s_ij^old}, n_{t_ij^old, s_ij^old, m_ij})
14:      Sample topic and sentiment t_ij^new and s_ij^new based on Equations (9) and (10)
15:      Increase counts n for the sampled topic and sentiment (i.e., n_{c_i, a_i, t_ij^new}, n_{a_i, t_ij^new, h_ij}, n_{v_i, t_ij^new, s_ij^new}, n_{t_ij^new, s_ij^new, m_ij})
16:    end for
17:  end for
18: end for

|M| gets larger. Moreover, these count matrices are very sparse, so the storage can be significantly reduced by utilizing sparse matrix representations.

Time complexity of parameter learning. In each iteration of Algorithm 2, it is required to compute the probability distribution over communities for each social link, the probability distribution over communities, regions, and activities for each check-in, and the probability distribution over topics and sentiments for each opinion phrase. Let L be the number of social links, N be the number of check-ins, and W be the average number of opinion phrases in a comment. Thus, for each iteration the total time complexity is O(L|C| + N(|R| + W(|C| + |A| + |T| + |S|))). It is worth mentioning that this time complexity is achieved by maintaining some auxiliary count vectors for the denominators of Equations (2), (3), (4), (6), (7), (8), (9) and (10); the space cost of these auxiliary count vectors is negligible compared to the total space complexity. Because L and N are much larger than W, |C|, |R|, |A|, |T| and |S|, the time complexity is linear with respect to the size of the data (i.e., social links and check-ins), which ensures that CRATS is scalable to large-scale data sets.

3.5 Applications

The joint model CRATS proposed in this paper can be applied in a variety of applications. For example, we can exploit CRATS to discover the social communities and geographical regions of users and the aspect-based opinion summarization of venues, which are straightforwardly represented by the model parameters of CRATS. In particular, we apply CRATS to two important applications, namely, text sentiment classification and venue recommendations.

Text sentiment classification. The essential task of text sentiment classification is to predict the sentiment polarity (e.g., positive, negative, or neutral) of a text (e.g., comments, reviews,


posts, and tips). Formally, given a text W_{u,v} = {⟨h_j, m_j⟩} of user u commenting on venue v, we predict that the text belongs to the polarity s* ∈ S = {positive, negative, neutral} having the highest posterior probability:

$$s^* = \arg\max_{s \in S} \Pr(s \mid W_{u,v}). \tag{21}$$

Based on Bayes’ theorem, we have

$$\Pr(s \mid W_{u,v}) = \Pr(s \mid u, v, \{\langle h_j, m_j \rangle\}) \propto \Pr(s \mid u, v) \Pr(\{\langle h_j, m_j \rangle\} \mid s, u, v), \tag{22}$$

where Pr(s|u, v) is the prior probability of sentiment s, i.e.,

$$\begin{aligned} \Pr(s \mid u, v) &= \sum_{c \in C} \sum_{a \in A} \sum_{t \in T} \Pr(c, a, t \mid u, v) \Pr(s \mid c, a, t, u, v) \\ &= \sum_{c \in C} \sum_{a \in A} \sum_{t \in T} \Pr(c \mid u) \Pr(a \mid u, v) \Pr(t \mid c, a) \Pr(s \mid t, v) \\ &\propto \sum_{c \in C} \beta_{u,c} \sum_{a \in A} \alpha_{u,a} \delta_{a,v} \sum_{t \in T} \tau_{c,a,t}\, \eta_{v,t,s}, \end{aligned} \tag{23}$$

and Pr({⟨h_j, m_j⟩} | s, u, v) is the probability of the text conditioned on sentiment s, given by

$$\begin{aligned} \Pr(\{\langle h_j, m_j \rangle\} \mid s, u, v) &= \sum_{c \in C} \sum_{a \in A} \sum_{t \in T} \Pr(c, a, t \mid s, u, v) \Pr(\{\langle h_j, m_j \rangle\} \mid c, a, t, s, u, v) \\ &\propto \sum_{c \in C} \beta_{u,c} \sum_{a \in A} \alpha_{u,a} \delta_{a,v} \sum_{t \in T} \tau_{c,a,t} \prod_j \rho_{a,t,h_j}\, \theta_{t,s,m_j}. \end{aligned} \tag{24}$$
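The prior term of Equation (23) can be evaluated by direct summation over the latent variables. Below is a toy numerical sketch with random parameters standing in for the learned ones; only the prior Pr(s | u, v) is computed, and the phrase term of Equation (24) would multiply in analogously:

```python
import numpy as np

rng = np.random.default_rng(1)
C, A, T, S = 2, 3, 4, 3  # toy sizes (hypothetical)

# Toy stand-ins for beta_{u,.}, alpha_{u,.}, delta_{.,v}, tau_{c,a,.},
# and eta_{v,.,.} for one fixed user u and venue v.
beta_u = rng.dirichlet(np.ones(C))
alpha_u = rng.dirichlet(np.ones(A))
delta_v = rng.dirichlet(np.ones(A))            # column delta_{a,v} over a
tau = rng.dirichlet(np.ones(T), size=(C, A))   # tau_{c,a,t}
eta_v = rng.dirichlet(np.ones(S), size=T)      # eta_{v,t,s}

def sentiment_prior(s):
    """Pr(s | u, v) up to a constant, expanded as in Equation (23)."""
    return sum(beta_u[c] * alpha_u[a] * delta_v[a] * tau[c, a, t] * eta_v[t, s]
               for c in range(C) for a in range(A) for t in range(T))

scores = np.array([sentiment_prior(s) for s in range(S)])
s_star = int(scores.argmax())  # Equation (21), ignoring the text term here
```

Since β, τ, and η each normalize to one, summing the scores over s recovers Σ_a α_{u,a} δ_{a,v}, a useful sanity check on the expansion.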

Note that the prior and posterior probabilities contain the common part Σ_{c∈C} β_{u,c} Σ_{a∈A} α_{u,a} δ_{a,v} Σ_{t∈T} τ_{c,a,t}, which needs to be calculated only once.

Venue recommendations. For venue recommendations, the main task is to predict the probability Pr(v|u) of user u visiting a new venue v and then return the top-K new venues with the highest visiting probability for user u. As user u has not visited venue v before, we do not have any check-in or comment related to both user u and venue v. To this end, we take full advantage of the correlations between user u and venue v in social communities, geographical regions, categorical activities, and positive topics to derive the visiting probability Pr(v|u), given by

$$\Pr(v \mid u) = \left(\sum_{c \in C} \beta_{u,c}\, \psi_{c,v}\right) \left(\sum_{r \in R} \gamma_{u,r}\, \mathcal{N}(l_v \mid \mu_r, \Sigma_r)\right) \times \left(\sum_{a \in A} \alpha_{u,a}\, \delta_{a,v}\right) \left(\sum_{t \in T} \eta_{v,t,s} \sum_{a \in A} \alpha_{u,a} \sum_{c \in C} \beta_{u,c}\, \tau_{c,a,t}\right). \tag{25}$$

Note that in Equation (25) we only utilize the positive topics of venues, i.e., s = positive, since in reality people are more attracted to venues whose aspects receive more positive comments.
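Equation (25) can be sketched as a scoring function over candidate venues, followed by a top-K selection. All parameters below are toy stand-ins for the learned CRATS parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
C, R, A, T, V = 2, 2, 3, 4, 6  # toy sizes (hypothetical)

beta_u = rng.dirichlet(np.ones(C))              # beta_{u,c}
gamma_u = rng.dirichlet(np.ones(R))             # gamma_{u,r}
alpha_u = rng.dirichlet(np.ones(A))             # alpha_{u,a}
psi = rng.dirichlet(np.ones(V), size=C)         # psi_{c,v}
delta = rng.dirichlet(np.ones(V), size=A)       # delta_{a,v}
tau = rng.dirichlet(np.ones(T), size=(C, A))    # tau_{c,a,t}
eta_pos = rng.random((V, T))                    # eta_{v,t,s} at s = positive
mus = rng.random((R, 2))                        # region means mu_r
sigmas = np.array([np.eye(2)] * R)              # region covariances Sigma_r
locs = rng.random((V, 2))                       # venue locations l_v

def gauss(l, mu, sig):
    """Bivariate normal density N(l | mu, Sigma)."""
    d = l - mu
    return np.exp(-0.5 * d @ np.linalg.inv(sig) @ d) / (
        2 * np.pi * np.sqrt(np.linalg.det(sig)))

def score(v):
    """Visiting probability Pr(v | u) assembled as in Equation (25)."""
    social = beta_u @ psi[:, v]
    geo = sum(gamma_u[r] * gauss(locs[v], mus[r], sigmas[r]) for r in range(R))
    act = alpha_u @ delta[:, v]
    topic = sum(eta_pos[v, t] * sum(alpha_u[a] * (beta_u @ tau[:, a, t])
                                    for a in range(A)) for t in range(T))
    return social * geo * act * topic

top_k = sorted(range(V), key=score, reverse=True)[:3]  # top-K with K = 3
```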

4 EXPERIMENTS

Here, we conduct intensive experiments to evaluate the performance of CRATS in comparison with the state-of-the-art competitors. We present the experimental settings in Section 4.1 and analyze the experimental results in Section 4.2.

Fig. 3. Geographical distribution of check-ins on the three data sets: (a) ten cities across four countries in Yelp [41]; (b) Los Angeles (LA) in Foursquare; (c) New York City (NYC) in Foursquare.

TABLE 2
Statistics of the three data sets

                     Yelp         LA          NYC
No. of users         366,000      30,208      47,240
No. of venues        61,000       142,798     203,765
No. of categories    712          356         354
No. of check-ins     1,600,000    244,861     388,594
No. of social links  2,900,000    349,985     810,672
Check-in density     7.17 × 10^-5 5.68 × 10^-5 4.04 × 10^-5

4.1 Experimental Settings

4.1.1 Three Real Data Sets

We use three publicly available large-scale real geosocial network data sets that were crawled from Yelp [41] and Foursquare [42]. The Yelp Challenge data set contains check-ins located in 10 cities across 4 countries, while the Foursquare data set includes two subsets of check-ins in Los Angeles (LA) and New York City


(NYC). Fig. 3 depicts the geographical distribution of check-ins and TABLE 2 shows the statistics of the three data sets.

In the preprocessing, the texts of users commenting on venues are parsed into opinion phrases using the Stanford natural language parser [40] due to its open-source availability, good performance, and wide usage [4], [11], [28]. To evaluate the sentiment classification performance, the texts of the Yelp data set are labeled in terms of the associated five-star ratings: if the rating is higher than three stars, the text is labeled "positive"; if the rating is lower than three stars, the text is labeled "negative"; otherwise, the text is labeled "neutral". Note that we are unable to label the texts in the two Foursquare data sets, since they do not contain ratings. We evaluate all competitors using ten-fold cross-validation.
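The star-rating labeling rule described above amounts to a simple threshold function, sketched here (the function name is hypothetical):

```python
def label_from_stars(stars):
    """Label a Yelp review text by its star rating, as described above:
    above three stars -> positive, below three stars -> negative,
    exactly three stars -> neutral."""
    if stars > 3:
        return "positive"
    if stars < 3:
        return "negative"
    return "neutral"
```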

4.1.2 Evaluated Techniques

Our method CRATS is compared with the following state-of-the-art competitors:
• LDA is the well-known Latent Dirichlet Allocation for topic modeling [5], used as a baseline.
• PLDA extracts frequent phrases from texts and assigns all words in a phrase to the same topic based on the Phrase-LDA [17].
• LCTA uses Latent Community Topic Analysis to ensure topical coherence in social communities [6].
• LGTA utilizes Latent Geographical Topic Analysis to discover topics in geographical regions [8].
• LCARS builds a Location Content Aware Recommender System based on LDA to infer personal interests of users and local preferences of regions [20].
• GeoPFM is a general Geographical Probabilistic Factor Model that captures the geographical topics of users for venue recommendations [23].
• JST is a Joint Sentiment Topic model based on LDA, which detects sentiments and topics simultaneously from texts [10].
• FLDA is a Factorized LDA model, in which each user or item (e.g., venue) has a topic distribution and a sentiment distribution [11].

4.1.3 Performance Metrics

(1) For generalization performance, we compute the perplexity on the test data set. Perplexity is a standard metric to evaluate the quality of probabilistic models and is monotonically decreasing in the likelihood of the test data [5], [11], [32], [35]. Thus, a lower perplexity means better prediction power. Given a test set of texts D_test = {W_i = ⟨h_i, m_i⟩}, the perplexity is defined as

$$\mathrm{Perplexity}(D_{test}) = \exp\left\{-\frac{\sum_i \log \Pr(\langle \mathbf{h}_i, \mathbf{m}_i \rangle)}{2 \sum_i N_i}\right\},$$

where N_i is the number of words in vector h_i or m_i.

(2) For sentiment classification, given a class (positive, neutral, or negative), the precision and recall are defined as

$$\mathrm{Precision} = \frac{\text{No. of texts correctly predicted to the class}}{\text{No. of texts predicted to the class}} \quad \text{and} \quad \mathrm{Recall} = \frac{\text{No. of texts correctly predicted to the class}}{\text{No. of texts actually belonging to the class}}.$$

(3) For venue recommendations, a discovered venue is defined as a venue that is recommended to and visited by a target user. The precision and recall are defined as

$$\mathrm{Precision} = \frac{\text{No. of discovered venues}}{\text{No. of venues recommended to the user}} \quad \text{and} \quad \mathrm{Recall} = \frac{\text{No. of discovered venues}}{\text{No. of venues actually visited by the user}}.$$

TABLE 3
Perplexity (the lower the better)

         Yelp      LA        NYC
LDA      806.58    846.64    902.13
PLDA     689.66    728.42    768.33
LCTA     542.38    588.23    601.35
LGTA     565.34    602.38    610.44
LCARS    628.38    640.23    653.89
GeoPFM   605.34    632.86    645.85
JST      529.04    548.32    560.24
FLDA     508.38    526.64    546.72
CRATS    358.18    389.66    396.56

4.1.4 Parameter Settings

For all experiments, we empirically set the number of communities, regions, and topics to |C| = 200, |R| = 200, and |T| = 100 using the popular grid search method for hyperparameter optimization, and the number of sentiments to |S| = 3 for the three classes of positive, neutral, and negative. Note that the number of activities is equal to the number of categories of venues in the data sets (TABLE 2). We then determine the Dirichlet priors as in Section 3.2. The number of iterations in Algorithm 2 is set to 500, at which the collapsed Gibbs sampling has converged.

4.2 Experimental Results

In this section, we compare the performance of CRATS against the state-of-the-art competitors in different applications (Section 4.2.1), discuss the true sources of the superiority of CRATS (Section 4.2.2), and test the complexity of CRATS (Section 4.2.3).

4.2.1 Comparison of Methods in Applications

Generalization performance. TABLE 3 depicts the perplexity (the lower the better) of the evaluated methods on the three real-world data sets. From the perplexity results, we have the following five observations: (1) By assigning the words in a phrase to the same topic, PLDA performs better than LDA. However, both LDA and PLDA suffer from high perplexity because they only model the topics in texts. (2) By integrating community discovery into topic analysis to ensure topical coherence in the communities, LCTA decreases the perplexity in comparison to LDA and PLDA. (3) By modeling the topics specific to geographical regions, LGTA, LCARS, and GeoPFM are also better than LDA and PLDA but worse than LCTA; our explanation is that topics are more dependent on social communities than on geographical regions. (4) By taking into account sentiments associated with topics, JST and FLDA lead to further improvement in perplexity compared to LCTA; the reason is that the sentimental topic models separate opinion words from aspect words in texts to distinguish their semantic meanings. (5) Our CRATS exhibits the lowest perplexity, which indicates the best prediction power, since CRATS not only combines topic analysis with communities, regions and sentiments, but also considers the dependency of topics on activities and communities.

Text sentiment classification. Fig. 4 compares the text sentiment classification accuracy of JST, FLDA, and CRATS on


Fig. 4. Sentiment classification accuracy of JST, FLDA, and CRATS: (a) precision in Yelp; (b) recall in Yelp.

the Yelp data set. Note that the other topic models do not deal with sentiment and cannot be applied to sentiment classification; the traditional supervised classification methods for long documents are not suitable due to their poor performance on short comments with only a few sentences [4]; and the two data sets in Foursquare do not contain ratings and cannot be used to evaluate sentiment classification accuracy. According to Fig. 4, we have the following three findings: (1) In general, FLDA slightly outperforms JST for the following reason: FLDA models the sentiment distribution for each topic whereas JST uncovers the topic distribution for each sentiment; the sentiment distribution of topics is more reasonable and intuitive than the topic distribution of sentiments, because people usually choose a topic and then express various sentiments conditioned on it, rather than adopting the reverse order. Moreover, JST applies a domain-independent sentiment lexicon that may be too general and fail to include the domain knowledge, while FLDA trains the model at the category level instead of the item level, which helps address the cold-start problem on new items. (2) In contrast to JST and FLDA, our proposed CRATS significantly increases the sentiment classification precision and recall. Our explanation is that CRATS considers not only the dependency of sentiments on topics but also the dependencies of topics on communities and activities; these communities and activities help find coherent topics, because they play an important role in shaping the topics: a user often shares more common topics with users in the same community than with users in different communities, and the categories of venues strongly indicate the activities that users possibly perform at the venues. This experimental result again validates the superiority of modeling the dependencies among the communities, regions, activities, topics, and sentiments of users.
(3) It is harder to correctly predict the "Negative" or "Neutral" class than the "Positive" class. The intuitive reasons are: (a) In negative texts, users often express their opinions using negation or irony, which makes it more difficult for natural language processing tools to understand the real sentiment in the text. (b) In neutral texts, users express their positive and negative opinions nearly equally. As a result, a neutral text is prone to being mistakenly classified into the "Positive" or "Negative" class.

Venue recommendations. In venue recommendations, the evaluated methods recommend to a user the top-K new venues with the highest predicted visiting probability. Fig. 5 depicts the recommendation accuracy over a large range of top-K values from 2 to 50, in which LDA and PLDA are not applicable to venue recommendations since they only consider the topics in texts. It is worth emphasizing that, unlike the sentiment classification accuracy, the accuracy of all venue recommendation

Fig. 5. Venue recommendation accuracy with respect to top-K values: (a) precision in Yelp; (b) recall in Yelp; (c) precision in LA; (d) recall in LA; (e) precision in NYC; (f) recall in NYC. (All panels share the same legend: LCTA, LGTA, LCARS, GeoPFM, JST, FLDA, and CRATS.)

techniques is usually not high, because the density of the user-venue check-in matrix in geosocial networks is pretty low, as shown in TABLE 2. This phenomenon has been repeatedly observed in previous works (e.g., [20], [23]). Instead, we compare the relative accuracy of our CRATS with the state-of-the-art venue recommendation techniques. As depicted in Fig. 5, CRATS improves the recommendation accuracy as more check-in activities are recorded: the Yelp data set with the highest check-in density (i.e., 7.17 × 10^-5, as depicted in TABLE 2) records the highest recommendation accuracy, and the LA data set with the second highest check-in density (i.e., 5.68 × 10^-5) is better than the NYC data set, which has the lowest check-in density (i.e., 4.04 × 10^-5), in terms of the recommendation accuracy.

Page 11: 1 CRATS: An LDA-based Model for Jointly Mining Latent Communities, Regions, Activities ...chiychow/papers/TKDE_2016b.pdf · 2016-12-05 · 1 CRATS: An LDA-based Model for Jointly


TABLE 4
Average graph density of social networks for all communities from CRATS and LCTA

          Yelp             LA               NYC
          AND     AEW      AND     AEW      AND     AEW
CRATS     1.181   0.049    1.078   0.044    1.424   0.058
LCTA      0.640   0.026    0.562   0.019    0.833   0.032

Based on Fig. 5, we can conclude: (1) Social topic models. By associating community discovery with topic analysis, LCTA is able to obtain the coherent topics in communities, i.e., LCTA can model the social interaction between users when they are visiting venues. For example, users often check in at venues that are highly recommended by their friends. Accordingly, LCTA generally achieves the second best recommendation precision and recall on the three data sets. (2) Geographical topic models. In contrast, LGTA, LCARS, and GeoPFM mine the topics specific to regions and thus are able to model the geographical mobility of users. For instance, indoorsy persons like visiting venues around their living areas, while outdoorsy persons prefer traveling around the world to explore new venues. Consequently, LGTA, LCARS, and GeoPFM are competitive with LCTA. (3) Sentimental topic models. Instead of modeling the latent communities or regions, JST and FLDA concentrate on uncovering the latent sentiments related to topics. Unfortunately, JST and FLDA generate the worst results, since they ignore the social interaction and geographical mobility of users, which are some of the most important characteristics of geosocial networks. (4) The joint model. Our CRATS always exhibits the highest recommendation precision and recall, because when recommending new venues to users, CRATS considers the correlations between the users and venues in the social communities, geographical regions, categorical activities, and positive topics, which are jointly derived based on the dependencies among them. (5) The trends in the precision and recall. As K increases, the precision gets lower and the recall becomes higher, because recommending more venues for users can discover more venues that the users would like to check in at, but some recommended venues are less likely to be visited by the users; for example, the second recommended venue has a lower predicted visiting probability than the first one.
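The precision and recall trends in point (5) follow directly from how the two metrics are defined. A per-user sketch of Precision@K and Recall@K (function and variable names are illustrative, not from the paper):

```python
def precision_recall_at_k(ranked_venues, visited, k):
    """Per-user top-K recommendation accuracy.
    ranked_venues: venue IDs sorted by predicted visiting probability;
    visited: set of venues the user actually checked in at in the test period."""
    hits = sum(1 for v in ranked_venues[:k] if v in visited)
    precision = hits / k                                  # denominator grows with K
    recall = hits / len(visited) if visited else 0.0      # denominator is fixed
    return precision, recall
```

Recommending more venues can only add hits, so recall never decreases as K grows, while the growing denominator k tends to drag precision down.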

4.2.2 Discussion

Further, we shed light on the true sources of the improvement of our CRATS in comparison to the state-of-the-art competitors.

Social interactions of users in communities. At first, we investigate how users interact with each other, or what communities they participate in. To this end, we examine the graph density of a social network formed by the social links between users in a community. We adopt two standard metrics to measure the graph density of social networks: Average Node Degree (AND) and Average Edge Weight (AEW) [43]. Formally, given a social network Gc = (Uc, Ec) for a community c, in which Uc ⊂ U is the set of users (nodes) in the community c and Ec is the set of social links (edges) between users in Uc,

AND(Gc) = |Ec| / |Uc|   and   AEW(Gc) = 2|Ec| / (|Uc| (|Uc| − 1)).
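The two metrics above can be computed directly from a community's node and edge sets. A minimal sketch, assuming an undirected, unweighted social network (names are ours, not from the paper):

```python
def graph_density(users, edges):
    """Average Node Degree (AND) and Average Edge Weight (AEW) of a
    community's social network, following the formulas above.
    users: collection of user IDs; edges: collection of user-ID pairs."""
    n, m = len(users), len(edges)
    avg_node_degree = m / n                    # AND = |Ec| / |Uc|
    avg_edge_weight = 2 * m / (n * (n - 1))    # AEW = 2|Ec| / (|Uc|(|Uc| - 1))
    return avg_node_degree, avg_edge_weight
```

AEW here is the fraction of possible user pairs that are actually linked, so a higher value means a denser, more tightly-knit community.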

In CRATS and LCTA, a community has a multinomial distribution over all users. In general, a community consists of the users whose probability is larger than a given threshold. To compute the

Fig. 6. Adjacency matrix of social links between users in a community of Yelp from (a) CRATS and (b) LCTA (both axes index user IDs from 1 to 50)

TABLE 5
Top-10 categories of venues (i.e., activities) of interest to three communities in Yelp

Community 1              Community 2              Community 3
Active Life              Nightlife                Food
Fitness & Instruction    Shopping                 Italian
Gyms                     Bars                     Chinese
Sports Clubs             Fashion                  Vietnamese
Trainers                 Arts & Entertainment     Mexican
Sporting Goods           Art Galleries            Sandwiches
Yoga                     Art Schools              Seafood
Boating                  Antiques                 Grocery
Climbing                 Arts & Crafts            American
Beaches                  Women's Clothing         Burgers

graph density of communities, we determine the users for the communities by setting the threshold to 0.001; a resulting community usually contains around 50 users. TABLE 4 contrasts the graph density averaged over the social networks for all communities from CRATS against LCTA. Note that the other methods described in Section 4.1 cannot discover the latent communities. According to TABLE 4, the communities in CRATS have much higher graph density than those in LCTA, i.e., users in a community from CRATS are more likely to know each other. For example, Fig. 6 depicts the adjacency matrix based on the social links between users in a randomly selected community, in which the two users corresponding to the x-axis and y-axis values of a marker have a social link. In terms of Fig. 6, almost every user in the community generated by CRATS has at least one social friend in the same community. In contrast, a user in the community given by LCTA is very likely a stranger to the other users in the same community. This experimental result shows one advantage of CRATS over LCTA, i.e., CRATS can model the reality that people are prone to participating in communities with friends instead of strangers.

Interests of communities in activities. Another advantage of our method is that CRATS can discover a variety of communities with interests in different categories of venues, i.e., activities. In CRATS, each community also has a multinomial distribution over all venues. Further, based on the category information of venues, we can derive the distribution over categories of venues for each community. Due to similar results, TABLE 5 depicts the top-10 categories of only three typical communities in Yelp; the top-10 categories account for more than 80% of the venues of interest to the


(a) On all cities in Yelp

(b) On Pittsburgh in Yelp

Fig. 7. Centers of regions from CRATS (red stars denote centers and green points denote check-ins)

communities. From TABLE 5, users in Community 1 prefer Active Life and often go for Fitness, Gyms, and Sports Clubs; users in Community 2 like to participate in activities related to Arts & Entertainment, e.g., Nightlife, Shopping, and Bars; users in Community 3 are foodies and often taste various foods from different countries, e.g., Italian, Chinese, Vietnamese, and Mexican. This experimental result shows that our CRATS groups users into communities based on not only their social links but also their common interests in venues, which ensures its superiority over other competitors to some extent.

Geographical mobility on regions. CRATS models the collective geographical mobility of users by discovering the latent regions that they often visit. As an example, Fig. 7(a) depicts the centers of the latent regions obtained from the Yelp data set by CRATS. The region centers are located in the ten cities that completely match the distribution of check-in data shown in Fig. 3(a). Further, Fig. 7(b) shows a zoomed-in visualization of three region centers in Pittsburgh. One center is located at the Cultural District, which features plenty of theaters, art galleries, restaurants, and landmarks, and thus attracts the most check-ins from users. Another center is at the Strip District, a half-square-mile shopping area with ethnic grocers, produce stands, meat and fish markets, and sidewalk vendors. The third center is close to the University of Pittsburgh and Carnegie Mellon University, whose students are very active in checking in at the interesting venues around them. These observations verify that our CRATS is able to incorporate the geographical preferences of users in venue recommendations.

Topic interests under different activities. TABLE 6 depicts the three most popular topics under two activities in Yelp, in which each topic name is assigned manually in terms of the top-10 words in the topic. As depicted in TABLE 6, people usually talk about the topics "Fruit", "Grocery", and "Drink" when they are participating in the activity related to "Food". Note that the word "food" appears in both the "Fruit" and "Grocery" topics. In

TABLE 6
Topics specific to activities in CRATS

Topics of "Food" Activity          Topics of "Music" Activity
Fruit      Grocery    Drink        Type       Price      Time
orange     sandwich   coffee       metal      cost       minute
mango      bread      tea          bong       charge     hour
banana     croissant  milk         music      fee        tuesday
pineapple  pasta      vodka        jazz       expense    year
apple      dish       couple       band       fare       time
cherry     grocery    beer         rock       dollar     weekend
grape      bean       wine         blues      price      evening
fruit      tabasco    drink        classical  ticket     month
walnut     rice       cafe         country    money      weekday
food       food       yogurt       rap        cash       sunday

TABLE 7
Sentiment words specific to topics in CRATS

Topic: "Fruit"                       Topic: "Price"
Positive   Negative   Neutral       Positive   Negative    Neutral
love       old        extensive     cheap      costly      early
good       stale      fries         good       pricey      open
fresh      big        smell         excellent  high        tune
new        little     past          free       expensive   watch
best       cool       separate      great      bad         green
great      awesome    extra         awesome    reluctant   larger
amazing    bad        enough        low        miss        empty
favorite   din        wish          nice       lost        believe
excellent  early      single        satisfy    awesome     available
healthy    crazy      treat         afford     overpriced  different

our opinion, this is reasonable, because food includes fruit and grocery. In contrast, people often mention the topics "Type", "Price", and "Time" when they are performing the activity related to "Music". These results show that CRATS can discover topics specific to activities.

Sentiment words of different topics. TABLE 7 depicts the sentiment words for the topics "Fruit" and "Price". In TABLE 7, we can observe a few sentiment words that are independent of topics, e.g., "good", "bad", and "excellent" in both topics. On the other hand, there are plenty of sentiment words specific to topics, e.g., "fresh" and "stale" for the topic "Fruit", and "cheap" and "pricey" for the topic "Price". These results confirm the importance of discriminating sentiment words for different topics. Interestingly, in the context of the topic "Price", the sentiment word "awesome" is classified as both positive and negative, which is not a mistake, since the word "awesome" can express positive or negative opinions.

4.2.3 Computational Complexity Study

The proposed algorithms in this paper were implemented in Java and run on a machine with a 3.4 GHz Intel Core i7 processor and 16 GB RAM. Fig. 8 demonstrates the running time of CRATS with respect to different proportions of training data on the three real-world data sets. Based on Fig. 8, the time cost rises linearly as the training data proportion increases, which verifies the linear time complexity analyzed in Section 3.4. Here we do not show the running time of the other competitors mentioned in Section 4.1, because it is not fair to compare the running time of CRATS with the competitors, none of which can simultaneously capture the five latent variables, including communities, regions, activities,


Fig. 8. Running time (seconds) of CRATS on different proportions of training data (from 0.1 to 1) for the Yelp, LA, and NYC data sets

topics, and sentiments. We admit that CRATS takes more time because it considers more latent variables than the competitors. However, CRATS exhibits the same linear time complexity with respect to the data size as the competitors. Therefore, CRATS can scale to massive data sets.

5 CONCLUSION AND FUTURE WORK

In this paper, we have proposed a generative probabilistic model, CRATS, to jointly mine the latent communities, regions, activities, topics, and sentiments of users by exploiting the important dependencies among these latent variables. Then, we have devised an approximate learning algorithm based on collapsed Gibbs sampling to estimate the model parameters of CRATS. Further, we have utilized CRATS for two important applications, i.e., text sentiment classification and venue recommendations. Finally, the experimental results on three large-scale real-world data sets show that CRATS is significantly superior to the current state-of-the-art techniques.

As a part of our future work, we plan to examine more dependencies among communities, regions, activities, topics, and sentiments from geosocial network data, extend our CRATS to incorporate the time factor, and apply CRATS to new applications.

APPENDIX A
INFERENCE IN COLLAPSED GIBBS SAMPLING

Here we focus on sampling communities for check-in data as an illustrative example; the same process can be used to sample the other variables. To sample the community ci for the i-th check-in, we need to compute its posterior probability given the remaining latent and observed variables. At first, according to the dependencies in Fig. 2, we know that for check-in data the latent community c has dependent relations with the observed venue v and the latent topic t. Then, based on Bayes' theorem,

$$
\begin{aligned}
\Pr(c_i \mid \boldsymbol{c}^{\neg i}, \boldsymbol{a}, \boldsymbol{t}, \boldsymbol{v}, \cdot)
&= \frac{\Pr(\boldsymbol{c}, \boldsymbol{a}, \boldsymbol{t}, \boldsymbol{v}, \cdot)}{\Pr(\boldsymbol{c}^{\neg i}, \boldsymbol{a}, \boldsymbol{t}, \boldsymbol{v}, \cdot)} \\
&= \frac{\Pr(\boldsymbol{c}) \Pr(\boldsymbol{v} \mid \boldsymbol{c}) \Pr(\boldsymbol{a} \mid \boldsymbol{v}) \Pr(\boldsymbol{t} \mid \boldsymbol{c}, \boldsymbol{a})}{\Pr(\boldsymbol{c}^{\neg i}) \Pr(\boldsymbol{v} \mid \boldsymbol{c}^{\neg i}) \Pr(\boldsymbol{a} \mid \boldsymbol{v}) \Pr(\boldsymbol{t} \mid \boldsymbol{c}^{\neg i}, \boldsymbol{a})} \\
&= \frac{\Pr(\boldsymbol{c}) \Pr(\boldsymbol{v} \mid \boldsymbol{c}) \Pr(\boldsymbol{t} \mid \boldsymbol{c}, \boldsymbol{a})}{\Pr(\boldsymbol{c}^{\neg i}) \Pr(\boldsymbol{v}^{\neg i} \mid \boldsymbol{c}^{\neg i}) \Pr(v_i) \Pr(\boldsymbol{t}^{\neg i} \mid \boldsymbol{c}^{\neg i}, \boldsymbol{a}^{\neg i}) \Pr(t_i \mid a_i)} \\
&\propto \frac{\Pr(\boldsymbol{c}) \Pr(\boldsymbol{v} \mid \boldsymbol{c}) \Pr(\boldsymbol{t} \mid \boldsymbol{c}, \boldsymbol{a})}{\Pr(\boldsymbol{c}^{\neg i}) \Pr(\boldsymbol{v}^{\neg i} \mid \boldsymbol{c}^{\neg i}) \Pr(\boldsymbol{t}^{\neg i} \mid \boldsymbol{c}^{\neg i}, \boldsymbol{a}^{\neg i})} ,
\end{aligned}
\tag{A.26}
$$

in which the superscript ¬i denotes a vector (e.g., c¬i) or a count (e.g., n¬i) excluding the current assignment (the same hereafter). Further, based on the Dirichlet distribution (i.e., βu ∼ Dirichlet(β0)),

$$
\begin{aligned}
\Pr(\boldsymbol{c}) &= \int \Pr(\boldsymbol{c} \mid \beta)\,\mathrm{Dirichlet}(\beta \mid \beta_0)\, d\beta \\
&= \prod_u \int \prod_c (\beta_{u,c})^{n_{u,c}} \cdot \frac{1}{B(\beta_0)} \prod_c (\beta_{u,c})^{\beta_0 - 1}\, d\beta_u \\
&= \prod_u \frac{1}{B(\beta_0)} \int \prod_c (\beta_{u,c})^{n_{u,c} + \beta_0 - 1}\, d\beta_u \\
&= \prod_u \frac{B(n_{u,C} + \beta_0)}{B(\beta_0)} ,
\end{aligned}
\tag{A.27}
$$

where B(·) is the multinomial Beta function, nu,c is the number of times that community c has been sampled for user u, and nu,C denotes the vector of these counts for all c ∈ C. Similarly,

$$
\Pr(\boldsymbol{c}^{\neg i}) = \prod_u \frac{B(n^{\neg i}_{u,C} + \beta_0)}{B(\beta_0)} .
\tag{A.28}
$$

Whence,

$$
\begin{aligned}
\frac{\Pr(\boldsymbol{c})}{\Pr(\boldsymbol{c}^{\neg i})}
&= \frac{B(n_{u_i,C} + \beta_0)}{B(n^{\neg i}_{u_i,C} + \beta_0)} \quad \text{(only the factor for } u_i \text{ differs)} \\
&= \frac{\prod_{c \in C} \Gamma(n_{u_i,c} + \beta_0) \cdot \Gamma\!\big(\textstyle\sum_{c \in C} (n^{\neg i}_{u_i,c} + \beta_0)\big)}{\prod_{c \in C} \Gamma(n^{\neg i}_{u_i,c} + \beta_0) \cdot \Gamma\!\big(\textstyle\sum_{c \in C} (n_{u_i,c} + \beta_0)\big)} \\
&= \frac{\Gamma(n_{u_i,c_i} + \beta_0)}{\Gamma(n^{\neg i}_{u_i,c_i} + \beta_0) \cdot \sum_{c \in C} (n^{\neg i}_{u_i,c} + \beta_0)} \\
&= \frac{n^{\neg i}_{u_i,c_i} + \beta_0}{\sum_{c \in C} (n^{\neg i}_{u_i,c} + \beta_0)} ,
\end{aligned}
\tag{A.29}
$$

where Γ(·) is the gamma function and the identity Γ(n + 1) = nΓ(n) has been applied. For the same reason,

$$
\frac{\Pr(\boldsymbol{v} \mid \boldsymbol{c})}{\Pr(\boldsymbol{v}^{\neg i} \mid \boldsymbol{c}^{\neg i})}
= \frac{n^{\neg i}_{c_i,v_i} + \psi_0}{\sum_{v \in V} \big(n^{\neg i}_{c_i,v} + \psi_0\big)} , \quad \text{and}
\tag{A.30}
$$

$$
\begin{aligned}
\frac{\Pr(\boldsymbol{t} \mid \boldsymbol{c}, \boldsymbol{a})}{\Pr(\boldsymbol{t}^{\neg i} \mid \boldsymbol{c}^{\neg i}, \boldsymbol{a}^{\neg i})}
&= \frac{B(n_{c_i,a_i,T} + \tau_0)}{B(n^{\neg i}_{c_i,a_i,T} + \tau_0)} \\
&= \frac{\prod_{t \in T} \Gamma(n_{c_i,a_i,t} + \tau_0) \cdot \Gamma\!\big(\textstyle\sum_{t \in T} (n^{\neg i}_{c_i,a_i,t} + \tau_0)\big)}{\prod_{t \in T} \Gamma(n^{\neg i}_{c_i,a_i,t} + \tau_0) \cdot \Gamma\!\big(\textstyle\sum_{t \in T} (n_{c_i,a_i,t} + \tau_0)\big)} \\
&= \frac{\prod_j \big(n^{\neg i}_{c_i,a_i,t_{ij}} + \tau_0\big)}{\prod_j \big[\sum_{t \in T} \big(n^{\neg i}_{c_i,a_i,t} + \tau_0\big) + j - 1\big]} ,
\end{aligned}
\tag{A.31}
$$

in which it is important to note that the comment of the i-th check-in includes a list of topics indexed by j = 1, 2, · · · , denoted as


tij. Finally, we have

$$
\Pr(c_i \mid \boldsymbol{c}^{\neg i}, \boldsymbol{a}, \boldsymbol{t}, \boldsymbol{v}, \cdot)
\propto \big(n^{\neg i}_{u_i,c_i} + \beta_0\big)
\cdot \frac{n^{\neg i}_{c_i,v_i} + \psi_0}{\sum_{v \in V} \big(n^{\neg i}_{c_i,v} + \psi_0\big)}
\cdot \frac{\prod_j \big(n^{\neg i}_{c_i,a_i,t_{ij}} + \tau_0\big)}{\prod_j \big[\sum_{t \in T} \big(n^{\neg i}_{c_i,a_i,t} + \tau_0\big) + j - 1\big]} .
\tag{A.32}
$$
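Putting Eq. (A.32) into procedural form, one Gibbs update for a check-in's community can be sketched as follows. This is our illustrative reading, not the paper's actual implementation; the count tables are plain dicts and are assumed to already exclude check-in i's current assignment (the ¬i counts).

```python
import random

def sample_community(u_i, v_i, a_i, topics_i, n_uc, n_cv, n_cat,
                     communities, venues, all_topics,
                     beta0, psi0, tau0, rng=random):
    """Draw a new community for check-in i following Eq. (A.32).
    n_uc[u][c], n_cv[c][v], n_cat[c][a][t] hold the ¬i counts;
    topics_i lists the topics t_{ij} of check-in i's comment."""
    weights = []
    for c in communities:
        w = n_uc[u_i][c] + beta0                                 # from Eq. (A.29)
        w *= (n_cv[c][v_i] + psi0) / sum(
            n_cv[c][v] + psi0 for v in venues)                   # from Eq. (A.30)
        s = sum(n_cat[c][a_i][t] + tau0 for t in all_topics)
        for j, t in enumerate(topics_i, start=1):                # from Eq. (A.31)
            w *= (n_cat[c][a_i][t] + tau0) / (s + j - 1)
        weights.append(w)
    # sample a community proportionally to the unnormalized posterior weights
    r = rng.uniform(0, sum(weights))
    acc = 0.0
    for c, w in zip(communities, weights):
        acc += w
        if r <= acc:
            return c
    return communities[-1]
```

A full sampler would decrement check-in i's counts before calling this function and increment them for the returned community afterwards, sweeping all check-ins per iteration.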

ACKNOWLEDGEMENTS

This research is partially supported by the City University of Hong Kong under grant no. 7004217.

REFERENCES

[1] J.-D. Zhang and C.-Y. Chow, "GeoSoCa: Exploiting geographical, social and categorical correlations for point-of-interest recommendations," in ACM SIGIR, 2015.
[2] ——, "Spatiotemporal sequential influence modeling for location recommendations: A gravity-based approach," ACM TIST, vol. 7, no. 1, pp. 11:1–11:25, 2015.
[3] J.-D. Zhang, C.-Y. Chow, and Y. Li, "iGeoRec: A personalized and efficient geographical location recommendation framework," IEEE TSC, vol. 8, no. 5, pp. 701–714, 2015.
[4] J.-D. Zhang, C.-Y. Chow, and Y. Zheng, "ORec: An opinion-based point-of-interest recommendation framework," in ACM CIKM, 2015.
[5] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," JMLR, vol. 3, pp. 993–1022, 2003.
[6] Z. Yin, L. Cao, Q. Gu, and J. Han, "Latent community topic analysis: Integration of community discovery with topic modeling," ACM TIST, vol. 3, no. 4, pp. 63:1–63:21, 2012.
[7] G. Zheng, J. Guo, L. Yang, S. Xu, S. Bao, Z. Su, D. Han, and Y. Yu, "Mining topics on participations for community discovery," in ACM SIGIR, 2011.
[8] Z. Yin, L. Cao, J. Han, C. Zhai, and T. Huang, "Geographical topic discovery and comparison," in WWW, 2011.
[9] Q. Yuan, G. Cong, K. Zhao, Z. Ma, and A. Sun, "Who, where, when and what: A non-parametric Bayesian approach to context-aware recommendation and search for Twitter users," ACM TOIS, vol. 33, no. 1, pp. 2:1–2:33, 2015.
[10] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE TKDE, vol. 24, no. 6, pp. 1134–1145, 2012.
[11] S. Moghaddam and M. Ester, "The FLDA model for aspect-based opinion mining: Addressing the cold start problem," in WWW, 2013.
[12] Q. Wang, J. Xu, H. Li, and N. Craswell, "Regularized latent semantic indexing: A new approach to large-scale topic modeling," ACM TOIS, vol. 31, no. 1, pp. 5:1–5:44, 2013.
[13] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[14] T. Hofmann, "Probabilistic latent semantic indexing," in ACM SIGIR, 1999.
[15] X. Wang, X. Jin, M.-E. Chen, K. Zhang, and D. Shen, "Topic mining over asynchronous text sequences," IEEE TKDE, vol. 24, no. 1, pp. 156–169, 2012.
[16] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: Topic modeling over short texts," IEEE TKDE, vol. 26, no. 12, pp. 2928–2941, 2014.
[17] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, and J. Han, "Scalable topical phrase mining from text corpora," PVLDB, vol. 8, no. 3, pp. 305–316, 2014.
[18] Y. Liu, A. Niculescu-Mizil, and W. Gryc, "Topic-link LDA: Joint models of topic and author community," in ICML, 2009.
[19] J.-W. Son, A.-Y. Kim, and S.-B. Park, "A location-based news article recommendation with explicit localized semantic analysis," in ACM SIGIR, 2013.
[20] H. Yin, B. Cui, Y. Sun, Z. Hu, and L. Chen, "LCARS: A spatial item recommender system," ACM TOIS, vol. 32, no. 3, pp. 11:1–11:37, 2014.
[21] T. Kurashima, T. Iwata, T. Hoshide, N. Takaya, and K. Fujimura, "Geo topic model: Joint modeling of user's activity area and interests for location recommendation," in ACM WSDM, 2013.
[22] B. Hu and M. Ester, "Spatial topic modeling in online social media for location recommendation," in ACM RecSys, 2013.
[23] B. Liu, H. Xiong, S. Papadimitriou, Y. Fu, and Z. Yao, "A general geographical probabilistic factor model for point of interest recommendation," IEEE TKDE, vol. 27, no. 5, pp. 1167–1179, 2015.
[24] L. Hong, A. Ahmed, S. Gurumurthy, A. J. Smola, and K. Tsioutsiouliklis, "Discovering geographical topics in the Twitter stream," in WWW, 2012.
[25] Q. Yuan, G. Cong, Z. Ma, A. Sun, and N. M.-T. Thalmann, "Who, where, when and what: Discover spatio-temporal topics for Twitter users," in ACM KDD, 2013.
[26] D. M. Blei and J. D. Lafferty, "Dynamic topic models," in ICML, 2006.
[27] X. Wang and A. McCallum, "Topics over time: A non-Markov continuous-time model of topical trends," in ACM KDD, 2006, pp. 424–433.
[28] S. Moghaddam and M. Ester, "ILDA: Interdependent LDA model for learning latent aspects and their ratings from online product reviews," in ACM SIGIR, 2011.
[29] H. Wang, Y. Lu, and C. Zhai, "Latent aspect rating analysis on review text data: A rating regression approach," in ACM KDD, 2010.
[30] Y. Jo and A. H. Oh, "Aspect and sentiment unification model for online review analysis," in ACM WSDM, 2011.
[31] C. Lin and Y. He, "Joint sentiment/topic model for sentiment analysis," in ACM CIKM, 2009.
[32] K. W. Lim and W. Buntine, "Twitter opinion topic model: Extracting product opinions from tweets by leveraging hashtags and sentiment lexicon," in ACM CIKM, 2014.
[33] X. Meng, F. Wei, X. Liu, M. Zhou, S. Li, and H. Wang, "Entity-centric topic-oriented opinion summarization in Twitter," in ACM KDD, 2012.
[34] X. Wang, F. Wei, X. Liu, M. Zhou, and M. Zhang, "Topic sentiment analysis in Twitter: A graph-based hashtag sentiment classification approach," in ACM CIKM, 2011.
[35] Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang, "Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS)," in ACM KDD, 2014.
[36] Y. Lu, C. Zhai, and N. Sundaresan, "Rated aspect summarization of short comments," in WWW, 2009.
[37] J. McAuley, J. Leskovec, and D. Jurafsky, "Learning attitudes and attributes from multi-aspect reviews," in IEEE ICDM, 2012.
[38] I. Titov and R. McDonald, "A joint model of text and aspect ratings for sentiment summarization," in ACL, 2008.
[39] Y. Wu and M. Ester, "FLAME: A probabilistic model combining aspect based opinion mining and collaborative filtering," in ACM WSDM, 2015.
[40] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, "Parsing with compositional vector grammars," in ACL, 2013.
[41] Yelp, "Challenge Data Set," http://www.yelp.com/dataset_challenge, 2015.
[42] J. Bao, Y. Zheng, and M. F. Mokbel, "Location-based and preference-aware recommendation using sparse geo-social networking data," in ACM SIGSPATIAL, 2012.
[43] A. Angel, N. Sarkas, N. Koudas, and D. Srivastava, "Dense subgraph maintenance under streaming edge weight updates for real-time story identification," PVLDB, vol. 5, no. 6, pp. 574–585, 2012.

Jia-Dong Zhang received the M.Sc. degree from Yunnan University, China, in 2009, and the Ph.D. degree from City University of Hong Kong in 2015. He is currently a postdoctoral fellow in the Department of Computer Science, City University of Hong Kong. His research work has been published in premier conferences (e.g., ACM SIGIR, CIKM and SIGSPATIAL), transactions (e.g., ACM TIST, IEEE TDSC, TSC and TITS), and journals (e.g., Pattern Recognition and Information Sciences). His research interests include data mining, location-based services and location privacy.

Chi-Yin Chow received the M.S. and Ph.D. degrees from the University of Minnesota-Twin Cities, USA, in 2008 and 2010, respectively. He is currently an assistant professor in the Department of Computer Science, City University of Hong Kong. His research interests include big data analytics, data management, GIS, mobile computing, location-based services, and data privacy. He is the co-founder and co-chair of ACM SIGSPATIAL MobiGIS 2012 to 2016, and the editor of the ACM SIGSPATIAL Newsletter. He received the VLDB "10-year award" in 2016, and the best paper awards in ICA3PP 2015 and IEEE MDM 2009.