OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for Mobile Social Networks

Osman Khalid, Student Member, IEEE, Muhammad Usman Shahid Khan, Student Member, IEEE, Samee U. Khan, Senior Member, IEEE, and Albert Y. Zomaya, Fellow, IEEE

Abstract—The evolution of mobile social networks and the availability of online check-in services, such as Foursquare and Gowalla, have initiated a new wave of research in the area of venue recommendation systems. Such systems recommend to users places closely related to their preferences. Although venue recommendation systems have been studied in recent literature, the existing approaches, mostly based on collaborative filtering, suffer from various issues, such as: 1) data sparseness, 2) cold start, and 3) scalability. Moreover, many existing schemes are limited in functionality, as the generated recommendations do not consider group of “friends” type situations. Furthermore, the traditional systems do not take into account the effect of real-time physical factors (e.g., distance from venue, traffic, and weather conditions) on recommendations. To address the aforementioned issues, this paper proposes a novel cloud-based recommendation framework OmniSuggest that utilizes: 1) Ant colony algorithms, 2) social filtering, and 3) hub and authority scores, to generate optimal venue recommendations. Unlike existing work, our approach suggests venues at a finer granularity for an individual or a “group” of friends with similar interest. Comprehensive experiments are conducted with a large-scale real dataset collected from Foursquare. The results confirm that our method offers more effective recommendations than many state-of-the-art schemes.

Index Terms—Recommendation framework, group recommendation, mobile social networks, cloud-framework


1 INTRODUCTION

The advancement in communication infrastructure and easy access to e-commerce and mobile social network applications, such as Amazon, Facebook, Twitter, Foursquare, Instagram, and Path, have shifted the main problem of information retrieval to the filtering of pertinent information [1]. The increase in the sheer volume of data with ever-growing networking of devices and Web services has made it quite difficult for users in general to find and access relevant personalized information [1].

Recommendation systems were developed in the 1990s to address the challenges of automatic and personalized selection of data from diverse and overloaded sources of information [1]. These systems apply numerous knowledge discovery techniques on users’ historical and contextual data to suggest information, products, and services that best match the user’s preferences. A good example of recommendation systems for e-commerce applications is Amazon.com, where customers receive personalized recommendations on a variety of products.

In the past few years, several social networking applications, such as Foursquare, Gowalla, and Google Latitude, were developed for mobile devices. These applications allow users to perform a “check-in” at venues that users visit to share experiences in the form of a feedback or tip [2], [3]. Moreover, these services collect and hold huge volumes of users’ geospatial check-in data [2]. Based on the data extracted by the mobile social networking applications, several location-based recommendation systems were developed in recent years [1], [2], [3], [4], which recommend to users venues closely related to their preferences. A major research challenge for such systems is to generate real-time venue recommendations for a given individual from a massively diverse dataset of users’ historical check-ins [1], [3], [4]. To generate an optimal recommendation for an individual, the system must simultaneously consider the following factors: (a) personal preferences, (b) past check-ins, (c) current context, such as time and location, and (d) collaborative social opinions (other individuals’ preferences).

The objective of this paper is to efficiently employ the above-mentioned factors to achieve real-time, optimal recommendations for venues. However, there are several barriers that negatively affect the performance of the real-time recommendation process, primarily driven by the complexity and cost of processing the large-scale data sets [1], [2]. To scale efficiently, the recommendation system requires large-scale computational and storage resources. This paper describes an approach that leverages cloud infrastructure and service-based interfaces to process, mine, compare, and manage large-scale datasets for real-time recommendations in a scalable architecture.

• O. Khalid, M.U.S. Khan, and S.U. Khan are with the North Dakota State University, Fargo, ND 58108, USA. E-mail: {osman.khalid, ushahid.khan, samee.khan}@ndsu.edu.

• A. Y. Zomaya is with the School of Information Technologies, Sydney University, Australia. E-mail: [email protected].

Manuscript received 30 June 2013; revised 8 Oct. 2013; accepted 16 Nov. 2013. Date of publication 2 Dec. 2013; date of current version 17 Sept. 2014. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TSC.2013.53

1939-1374 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 7, NO. 3, JULY-SEPTEMBER 2014 401

Several works [1], [2], [3], [5], [6] have applied collaborative filtering (CF) to the venue recommendation problem. These CF-based venue recommendation systems work by matching a given user’s venue check-in record with the other users stored in a user-venue check-in matrix. The objective is to find a subset of similar users who share similar tastes and patterns on the basis of visited venues compared to a given user. These similar users share their judgments and opinions on venues, and in return, the system provides useful personalized recommendations for a given user. However, the following unsolved problems of the previous work affect the performance of current venue recommender systems:

• Data sparseness. A user may have visited only a limited number of venues, and as a result there would be a sparse user-venue check-in matrix. The data sparseness causes poor calculations of the nearest neighbor set of users based on similarity with the current user, which results in the loss of accuracy of recommendations. Moreover, sparseness of the matrix results in the suboptimal performance of many existing venue recommendation systems [3] that directly apply collaborative filtering based models on the user-venue matrix. Apart from venue recommendation systems, the data sparseness also negatively affects item recommendation systems, such as Amazon.com, where active users may have purchased fewer than 1 percent of the items [1], [2].

• Cold start. The cold start problem in many existing CF recommendation systems [3], [7] usually occurs when recommendations are to be generated for a user that is new to the system. This is because the system does not have sufficient records available for the new user to perform similarity measures. Insufficient records result in zero-valued similarity computations, which degrade recommendation quality.

• Scalability. The memory-based CF recommender systems use user rating data to apply simplistic approaches of computing similarity between users or items (such as neighborhood based CF [7], [8], [9]). However, such systems also suffer from scalability issues, as they need to scan thousands of users in the user-venue matrix in real time, which is neither efficient nor scalable. To address the scalability issues, a few proposals applied model-based CF. The model-based approaches apply data mining and machine learning algorithms to find patterns based on the training data to reduce the size of the user-item rating matrix [1], [2]. However, there is an inherent tradeoff between reduced dataset size and recommendation quality. If a dataset is reduced for fast online processing, then it may result in the loss of recommendation quality.

The immediate repercussion of the above listed issues is the suboptimal performance in CF-based recommendation models. Therefore, it may not be feasible to solely use a memory-based CF model for venue recommendations.

We propose a novel hybrid cloud-based venue recommendation framework, OmniSuggest, which combines memory-based and model-based approaches of CF on a cloud framework to generate optimal recommendations. To address the problems of data sparseness and cold start, our framework utilizes a model-based Hyperlink-Induced Topic Search (HITS) [10] approach to select popular venues for each category (e.g., Food) under multiple levels of hierarchies (e.g., Asian Food-Chinese Food). Such a methodology enables our proposed cloud-based OmniSuggest framework to generate recommendations for a new user through the collaborative opinion of experienced users (hubs), by computing (memory-based) similarities in preferences of the new user and the experienced hubs.

Apart from recommendations for an individual user, we propose a method to generate venue recommendations for a group of users or friends sharing a common interest. As an example, a group of friends may require the recommendation for “Chinese Food”, and they want to attend dinner together. Moreover, unlike the existing systems [4], [5], [6], [7], [11], [12], when generating a recommendation for a whole group, our system also considers the real-time effect of various parameters, such as distance of each group member from a set of top venues, the road traffic conditions, and other obstacles that may be encountered in reaching a venue. To reduce the computational cost during real-time processing, and ensure its 24×7 omnipresence, the cloud-based OmniSuggest framework follows a Software as a Service (SaaS) approach through a modular service based architecture. One of the major advantages of this approach is that the OmniSuggest framework can scale on demand as additional virtual machines are created and deployed. In summary, the contributions of our work are:

• A recommendation framework is presented that combines social computing and recommendation modules on cloud infrastructure, to ensure scalability in terms of processing, storage, and parallelization. We combine the model-based and memory-based CF algorithms into a hybrid approach that significantly improves the recommendation accuracy compared to previous venue recommendation algorithms.

• To resolve the issues associated with data sparseness and cold start, the proposed framework models the users’ data by utilizing the HITS method to extract experienced users and popular venues for multiple categories. A variant of the Ant colony algorithm is applied to generate a set of venues for a user.

• The cloud-based OmniSuggest framework performs group recommendations by using a combination of collaborative filtering and the group satisfaction principle. The group satisfaction mechanism is implemented so as to represent a Service Level Agreement (SLA) between the OmniSuggest framework and the end users. The SLA ensures the provision of on-time, high-quality recommendations, proportionate to the real-time changes (such as traffic conditions) that occur when group members move towards the recommended venue.

• We have carried out experiments on our internal Ubuntu cloud setup running on 96-core Supermicro SuperServer SYS-7047GR-TRF systems. The experiments are conducted on a real-world dataset from Foursquare.

The remainder of this paper is organized as follows: The system architecture is described in Section 2. In Section 3, we present the model for individual and group recommendations. Section 4 presents the experimentation results. Related work is discussed in Section 5, and Section 6 concludes the paper with a summary and a description of the future work.

2 SYSTEM ARCHITECTURE

Most of the existing recommender systems are based on centralized architectures [1], [2], [3], [8], [9], [12]. Such systems are not scalable enough to handle large volumes of geographically distributed data. The increasing number of subscribers in mobile social networks puts forth new challenges for centralized systems. Such systems must simultaneously consider a user’s preference, social context, and past actions when generating online recommendations. Therefore, to address the scalability issues, we utilize a decentralized cloud-based approach.

2.1 Major Components

The following are the major components of the proposed cloud-based framework:

• User profiles. The OmniSuggest framework maintains users’ profiles that contain information about the venues visited by users. Venues are categorized into various types based on the dataset analysis of location-based services, such as Foursquare and Gowalla. For example, in Fig. 1, the parent category “Food” has two sub-categories at level-1: A (e.g., Asian Food) and B (e.g., American Food). Category A further has three sub-categories at level-2: A1 (e.g., Chinese Food), A2 (e.g., Thai Food), and A3 (e.g., Indian Food). The categories at level-1 and level-2 are associated with venues. The arrows from users indicate the number of check-ins performed by users at various venues. The cloud-based OmniSuggest framework (Fig. 2) maintains the categories up to two levels to ensure the finer granularity of information. Moreover, each check-in record has the following fields: (a) user identification, (b) venue name and identification, (c) venue location (GPS location, city, and country), (d) time at which the user performed the check-in at a venue, and (e) parent and sub-categories a venue is associated with. User profiles are geographically distributed on the basis of cities. As depicted in Fig. 1, for each geographic region, the framework maintains a record of the sets of venues checked-in by users under each category in the hierarchy.

• Top-K Users and Venues. The proposed framework employs the HITS method [10] on user profiles to generate experienced users and popular venues for multi-level categories under each parent category. The HITS approach gives a higher popularity ranking to a user if (s)he visits a set of venues that are most frequently visited. Similarly, a venue is ranked higher if it is most frequently visited by experienced users. As an example, the framework maintains user-venue popularity sets for categories: “Chinese Food”, “Asian Food”, or just “Food”. The framework also computes similarity graphs among experienced users in various categories’ hierarchies. The similarity graphs computed during the offline phase are later utilized during the online recommendation phase. The HITS rankings of users and venues are stored in the framework’s geographically distributed databases. Such a methodology further helps distribution and parallel execution of processing tasks on the cloud framework, as each place has local sets of users and venues.

Fig. 1. Venues are linked with various categories at multiple levels. The lower half indicates users who have performed check-ins at venues. A venue may be linked with multiple categories.

Fig. 2. A top level architecture of the Cloud-based OmniSuggest framework.

• Recommendation Module. The recommendation module performs parallel execution of venue recommendation requests generated by an active user or a group of friends. The recommendation request query consists of current time, GPS location, and category for which the active user requires top-N venue recommendations. For a given category type, the recommendation framework pulls out the similarity graph of top-E experienced users, where E is the number of experienced users. A modified version of the Ant colony algorithm and collaborative filtering is then applied to generate an optimal solution in the form of venues that best match an active user’s preference. While generating the ranking of venues for the group, the recommendation framework also takes into account the effect of real-time factors (speed, distance, and road-traffic conditions). Therefore, to ensure the SLA is properly abided by the OmniSuggest framework, the venue at the top of the ranked list will be the one that satisfies all of the group members.

• The OmniSuggest framework runs the HITS method and the computation of experienced users’ similarity graph as periodic batch processing jobs on users’ profiles. These jobs are meant to refine data through preprocessing and prune the insignificant entries. Moreover, such jobs can be scheduled to run during off-peak load hours in various geographical regions to reduce unnecessary computational burden on the cloud nodes.
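The check-in record fields (a)-(e) and the recommendation request query described in the components above could be modeled as follows. This is only a minimal sketch: the class names, field names, the `venues_by_category` helper, and the sample venues in the usage note are illustrative assumptions and are not taken from the OmniSuggest implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class CheckIn:
    """One check-in record with the fields (a)-(e) listed under User profiles."""
    user_id: str                   # (a) user identification
    venue_id: str                  # (b) venue identification
    venue_name: str                # (b) venue name
    gps: Tuple[float, float]       # (c) venue location as (latitude, longitude)
    city: str                      # (c) city, also used to partition profiles
    country: str                   # (c) country
    checked_in_at: datetime        # (d) time of the check-in
    categories: Tuple[str, ...]    # (e) parent category first, then sub-categories


@dataclass(frozen=True)
class RecommendationRequest:
    """Query of an active user: current time, GPS location, category, and N."""
    user_id: str
    requested_at: datetime
    gps: Tuple[float, float]
    category: str
    top_n: int


def venues_by_category(checkins, city, category):
    """Return the set of venue ids checked in under `category` within `city`."""
    return {c.venue_id for c in checkins
            if c.city == city and category in c.categories}
```

For instance, calling `venues_by_category(records, "New York", "Chinese Food")` on a list of `CheckIn` records would return the per-region, per-category venue set that the framework is described as maintaining.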

2.2 Cloud Services Mapping

As reflected in Fig. 3, the OmniSuggest framework follows a SaaS approach through a modular service based architecture. The SaaS forms the top layer of the cloud stack, offering real-time personalized recommendations to a user or groups of users, while abstracting underlying implementation details [13], [14], [15]. Users access the service using thin clients, such as mobile devices, and are typically unaware of the physical locale of the hosted service. The framework ensures that the SLA is maintained by generating real-time context-aware recommendations that increase the global satisfaction of group members on the recommended venues. Moreover, the configuration allows the framework to scale on demand as additional virtual machines are created and deployed to handle the sporadic requests from the users.

3 PROPOSED RECOMMENDATION FRAMEWORK

In this section, we discuss in detail the proposed cloud-based venue recommendation framework, OmniSuggest. In terms of functionality, the OmniSuggest framework has two main modules: 1) an offline processing module and 2) an online recommendation module. The offline processing module runs periodic jobs to pre-process the check-in data. Data pre-processing involves two phases: 1) popularity ranking of users and venues, and 2) similarity graph creation among popular users. The online recommendation module is responsible for generating the recommendations for an individual user or a group of friends. The detailed functionality of the above-mentioned modules is discussed in the following subsections. We use the following notations in the rest of the paper: $G_c$ (hubs similarity graph for category $c$), $N$ (number of venues recommended), $V$ (set of all venues), $U$ (set of all users), $v_{ix}$ (number of check-ins of a user $i$ to a venue $x$), and $v_i$ (total number of check-ins of user $i$).

3.1 Offline Preprocessing

3.1.1 User Venue Popularity Ranking

This subsection presents the methodology of assigning popularity ranking to users and venues for various category hierarchies in a geographic location. The HITS [10] mechanism is utilized to perform the ranking for producing a set of experienced users and popular venues. In Fig. 1, suppose we want to calculate the hub and authority scores for category A under the parent category Food. First, we need to create a user-venue matrix for category A. Let the matrix be represented as $M_A$, having $|U|$ rows and $|V|$ columns. Let $[h_A]$ and $[a_A]$ represent the hub and authority score matrices, respectively, for a category A. The following formulas compute the hub and authority scores [10].

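$$a_A = M_A^{T} \cdot h_A \tag{1}$$

$$h_A = M_A \cdot a_A. \tag{2}$$

If we use $a_A^{\langle n \rangle}$ and $h_A^{\langle n \rangle}$ to represent the authority and hub scores at the $n$th iteration, then the following are the equations for generating the authority and hub scores.

$$a_A^{\langle n \rangle} = \left(M_A^{T} \cdot M_A\right) \cdot a_A^{\langle n-1 \rangle} \tag{3}$$

$$h_A^{\langle n \rangle} = \left(M_A \cdot M_A^{T}\right) \cdot h_A^{\langle n-1 \rangle}. \tag{4}$$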

The insight into using the HITS method is to generate a subset of users who have higher experience of visiting popular venues, and a subset of venues that are frequently visited by the experienced users. We call such subsets experienced hubs and popular authorities, respectively. The hub and authority scores are computed as batch processing jobs separately for each individual category. Therefore, the scores, and the iterations $(n)$, vary from category to category. The following are the number of iterations required to converge the scores for the sample categories presented here, as an example: American Food: $n = 56$, Chinese Food: $n = 51$, and Thai Food: $n = 945$. We do not store users/venues with very low HITS scores in the database of experienced users and popular venues. This helps in avoiding unnecessary computations during the online recommendation.

Fig. 3. OmniSuggest framework’s cloud services mapping.
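The iterative hub and authority computation described in this subsection can be sketched in a few lines of plain Python. The sketch assumes that the scores are L2-normalized after every step and that convergence is declared when the total change falls below a tolerance; neither detail is stated in the text, and the function name `hits` and its parameters `tol` and `max_iter` are illustrative.

```python
import math


def hits(checkins, tol=1e-9, max_iter=1000):
    """Run HITS on a user-venue check-in matrix.

    checkins[u][v] is the number of check-ins of user u at venue v.
    Returns (hub_scores, authority_scores, iterations_used).
    """
    n_users, n_venues = len(checkins), len(checkins[0])
    hubs = [1.0] * n_users
    auths = [1.0] * n_venues

    def normalize(vec):
        # Assumed L2 normalization so that the scores stay bounded.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    for n in range(1, max_iter + 1):
        # Authority update: a = M^T h
        new_auths = normalize([
            sum(checkins[u][v] * hubs[u] for u in range(n_users))
            for v in range(n_venues)
        ])
        # Hub update: h = M a
        new_hubs = normalize([
            sum(checkins[u][v] * new_auths[v] for v in range(n_venues))
            for u in range(n_users)
        ])
        change = (sum(abs(x - y) for x, y in zip(new_hubs, hubs))
                  + sum(abs(x - y) for x, y in zip(new_auths, auths)))
        hubs, auths = new_hubs, new_auths
        if change < tol:
            return hubs, auths, n
    return hubs, auths, max_iter
```

Running one such job per category and region, and then discarding users and venues whose scores fall below a chosen cutoff, would correspond to the pruning step mentioned above. The returned iteration count plays the role of $n$ and depends on the data, which is consistent with the counts varying from category to category.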

3.1.2 Hubs Similarity Graph Creation

This phase creates similarity graphs among experienced users (hubs) under the various predefined categories. The idea is to generate a network of like-minded people who share similar preferences for various venues they visit in a geographical region. The graphs constructed in the current phase will be made available for the online recommendation process, which utilizes a variant of the Ant colony algorithm to find an optimal path on the graph. Such a path carries a collective opinion about venues by experienced users who are also most similar to an active user.

The similarity computation between two users in the hub similarity graph is performed by applying the Pearson Correlation Coefficient (PCC) [1]. The value of PCC ranges between −1 and +1. Positive values indicate that similarity exists between two users, with the highest similarity at 1, whereas negative PCC values mean that the choices of the two users do not match. PCC is computed by using the following formula.

$$sim(i, j) = \frac{\sum_{x \in S_{ij}} (v_{ix} - v_i)(v_{jx} - v_j)}{\sqrt{\sum_{x \in S_{ij}} (v_{ix} - v_i)^2 \sum_{x \in S_{ij}} (v_{jx} - v_j)^2}}, \tag{5}$$

where

$$S_{ij} = \{x \in V \mid v_{ix} \neq 0 \wedge v_{jx} \neq 0\}.$$
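As a rough illustration of the similarity in (5), the following sketch computes the PCC between two users over their co-visited venues only. It assumes that the centering term for each user is the mean check-in count over the co-visited venues, as in the standard Pearson form (the notation list describes $v_i$ as a total), and that the similarity is 0 when fewer than two venues are shared or when a user shows no variance. The function name and the dictionary layout are illustrative.

```python
import math


def pearson_similarity(checkins_i, checkins_j):
    """PCC between two users over co-visited venues (the set S_ij).

    checkins_i / checkins_j map a venue id to the number of check-ins.
    """
    shared = [x for x in checkins_i
              if checkins_i[x] != 0 and checkins_j.get(x, 0) != 0]
    if len(shared) < 2:
        return 0.0  # assumed convention: too little overlap to correlate
    # Assumed centering: mean check-in count over the co-visited venues.
    mean_i = sum(checkins_i[x] for x in shared) / len(shared)
    mean_j = sum(checkins_j[x] for x in shared) / len(shared)
    num = sum((checkins_i[x] - mean_i) * (checkins_j[x] - mean_j) for x in shared)
    den = math.sqrt(sum((checkins_i[x] - mean_i) ** 2 for x in shared)
                    * sum((checkins_j[x] - mean_j) ** 2 for x in shared))
    return num / den if den else 0.0
```

Two users whose check-in counts rise and fall together over the shared venues score close to +1, while opposite patterns score close to −1.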

In (5), the similarity between two users $i$ and $j$ is computed only for venues that are visited by both of the users. Moreover, an edge is created between the two users in the graph if their PCC value is positive. Considering only the positive PCC values may result in a very sparse similarity graph among the experienced users when the user-venue check-in matrix is already very sparse. To address the sparseness issue, we augment the similarity computation with the conditional probability. The conditional probability computes the likelihood that a venue will be visited by both users $i$ and $j$. Moreover, it depicts the amount of interest (or confidence) shown by both users in venues commonly visited by them. The following equation is defined to calculate the weight of an edge between two users.

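$$\omega_{ij} = \begin{cases} sim(i, j) & \text{if } sim(i, j) > 0 \\ \dfrac{P[v_i \cap v_j]}{P[v_j]} \times \dfrac{1}{1 + \sum_{x \in V_j} |v_{ix} - v_{jx}|} & \text{otherwise, } P[v_j] \neq 0, \end{cases} \tag{6}$$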

where $V_j$ is the set of venues checked-in by user $j$. In (6), positive values of similarity are given preference over the conditional probability. Moreover, the denominator of the conditional probability makes the edge between two users a directed edge. The additional sum factor in the denominator decreases the value of the conditional probability to keep it lower than the similarity. The region-wise similarity graphs among the experienced hubs for various categories are stored in the database for the online recommendation process.
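The edge weight of (6) could be computed along the following lines. The sketch takes the already computed similarity as an input and assumes that the conditional probability is estimated as the share of user $j$'s venues that user $i$ has also visited; the text does not say how the probabilities are estimated, so this estimator, the zero weight returned when user $j$ has no venues, and the function name are assumptions.

```python
def edge_weight(sim_ij, checkins_i, checkins_j):
    """Weight of the edge between users i and j, following the two cases of (6).

    sim_ij is the PCC of the two users; checkins_* map venue id -> check-ins.
    """
    if sim_ij > 0:
        return sim_ij  # positive similarity takes precedence
    venues_i = {x for x, n in checkins_i.items() if n > 0}
    venues_j = {x for x, n in checkins_j.items() if n > 0}
    if not venues_j:
        return 0.0  # P[v_j] = 0 is excluded by (6); assume no edge in that case
    # Assumed estimate of P[v_i and v_j] / P[v_j]: share of j's venues also visited by i.
    cond_prob = len(venues_i & venues_j) / len(venues_j)
    # Damping factor 1 / (1 + sum over x in V_j of |v_ix - v_jx|).
    gap = sum(abs(checkins_i.get(x, 0) - checkins_j[x]) for x in venues_j)
    return cond_prob / (1 + gap)
```

Because both the estimate and the damping sum range over user $j$'s venues, swapping the two users generally changes the result, which matches the remark that the edge is directed.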

3.2 Online Recommendation for Single User

In this subsection, we present the online recommendation framework that applies a variant of the Ant colony approach on a graph of experienced users (hubs) to generate a set of the most popular venues not previously visited by an active user. Most of the popular collaborative filtering techniques, such as the ones we used for evaluations, are greedy based [1], [2], [4], [5], [9], [16]. Intuitively, the greedy based heuristics are very efficient, but they may not be very effective. The main limitation of such approaches is that they provide recommendations solely based on the opinion of the users who are most similar to the current active user. However, a user who is most similar to the active user may not necessarily have visited most of the new venues that we want to recommend to the active user. To address this limitation, we used a metaheuristic that has the capability of backtracking. For that purpose, we chose the Ant colony approach, where we applied a pheromone update strategy that iteratively updates pheromone values of the graph edges. As the iterations proceed, the pheromone concentration increases on the edges that lead towards the nodes that are not only the most similar nodes to the active user, but also provide the maximum contribution of the venues that need to be recommended to the active user. Algorithm 1 illustrates the procedure of the online recommendations.

1. Initializations (Line 1-Line 3):

• The algorithm takes as input the following parameters: 1) identification of the active user, 2) the category for which the active user wants recommendations, and 3) the geographical region where the user is currently located. For example, an active user “S” is interested in “Chinese Food” and located in New York City. In Line 1, various data structures used by the algorithm are initialized. The graph of experienced hubs for the specified category and geographic region is retrieved from the database in Line 2. In Line 3, the links of the active user are created with the subset of graph nodes $(N_a)$ based on the similarity formula (5).

2. Iterative solution construction (Line 4-Line 29):

• The algorithm increments the iteration counter $(t)$ after creating the ants, and inserts the entry of the active user in the tabu-list of ant $k$ (Line 4-Line 6).

• The neighbor nodes $(N_a)$ are traversed in the descending order based on the existing pheromone quantity on the links, multiplied by the inverse of the edge count between the active user and the neighboring node (Line 7-Line 8).

• On the traversal, only those venues are collected from the neighboring nodes that were not previously visited by the active user (Line 9).

• The collected venues are appended in a matrix $(M)$ (Line 10). The visited neighbor, as well as the pheromone on the edge, is stored in the respective lists (Line 11-Line 12).

• Line 14 checks whether or not the required number of venues has been collected. Two cases may arise: 1) the venue count has reached $N$ and 2) the required venue count has not been achieved.

a) If the venue count is achieved, then the control passes to Line 25, where the pheromone is updated on the graph edges (discussed subsequently). After that, it is checked whether or not the maximum number of iterations has been reached (Line 26). In the case when the current iteration count $(t)$ is less than the maximum allowed iterations $(t_{\max})$, the data structures will be reset (Line 27) and the control jumps to Line 4. Otherwise, if the test condition at Line 26 is false, then this means that the maximum number of allowed iterations is completed, and the venues are ranked using the aggregation function (discussed subsequently).

b) If the required venue count is not achieved, then the control jumps to Line 17, where a node is selected amongst the neighbor set $(N_a)$. The criterion for the node selection is that the maximum pheromone must be deposited on the link towards the selected node, and the selected node has the maximum number of venues available for the active user. If no such node is found, then this means that the ant has reached the terminal node of the graph. Subsequently, the control will jump to Line 25. Otherwise, the selected node will be set as a new temporary active user $(a)$ and appended in the list (Line 21). Moreover, the edge count will also be incremented by one in Line 21. From Line 22, the control will jump back to Line 7, and the procedure will be repeated iteratively until the maximum number of iterations is reached.

3. Pheromone Update (Line 25):

• The pheromone is updated in two steps: 1) evaporation and 2) deposition. Evaporation is performed equally on all the edges of the graph. However, only the edges leading to the nodes that provide the required venues are deposited with the pheromone. The pheromone deposit depends on:

1. the existing quantity of pheromone between the node $j$ and active user $s$,

2. the number of venues returned by the node $j$,

3. the hop distance between the node $j$ and the user $s$, and

4. the average check-ins of node $j$ at venues retrieved in the current iteration.

• For the iteration $t$, the pheromone is evaporated on each edge at a rate given by $(1 - \rho) \times \tau_{ij}(t-1)$, where $\rho$ is a constant that represents the evaporation rate. The amount of pheromone deposited on an edge is given as:

$$\Delta\tau_{sj}(t) = \frac{\prod_{r=1}^{\delta_{sj}} \tau_{r,r+1}(t-1)}{\delta_{sj}} \times \frac{Z_j}{N} \times \frac{\sum_{x \in S'} v_{jx}}{\sum_{u} \sum_{x \in S'} v_{ux}},$$

where $\prod_{r=1}^{\delta_{sj}} \tau_{r,r+1}(t-1)$ represents the product of pheromone deposited on the edges between node $s$ and node $j$ that are $\delta_{sj}$ units apart, $S' = \{V_j \setminus V_s\}$, and $u \in tabu_k$. The parameter $Z_j/N$ indicates the ratio of the number of venues contributed by a node $j$ to the total number of the required venues $(N)$. The rightmost term in the multiplication indicates the average number of check-ins performed by the user $j$ (per iteration) at venues not visited by an active user $s$. For every iteration, the values of the pheromone at time $(t-1)$, and the nodes that were selected as temporarily active user $(a)$ at each level of the graph, are stored in the data structures $edges_k$ and $level_k$. The aforementioned data structures provide the necessary values required to update the pheromone in the current iteration. Therefore, the aggregate quantity of pheromone updated at the end of iteration $t$ is given by:

$$\tau_{ij}(t) = (1 - \rho) \times \tau_{ij}(t-1) + \Delta\tau_{sj}(t). \tag{7}$$

4. Aggregate venues provided by the best nodes (Line 30):

• On completion of $t_{\max}$ iterations, the venues are ranked and sorted in the descending order to generate top-N venues to be recommended to the active user. The following equation is used to rank the venues.

$$Rank_x = \frac{\sum_{u} \tau_{su}(t) \times v_{ux}}{\sum_{u} \tau_{su}(t)}. \tag{8}$$

In (8), $x$ is the venue to be ranked, the parameter $s$ is the active user node, and $v_{ux}$ is the number of check-ins performed by user $u$ at venue $x$. The parameter $\tau_{su}(t)$ represents the quantity of pheromone between node $s$ and user $u$ after $t_{\max}$ iterations.

It is noteworthy that (8) shows the quantity of pheromone accumulated on the edges after multiple iterations to have a significant effect on the ranking of the venues.
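The ranking step in (8) is a pheromone-weighted average of check-in counts, which the following sketch illustrates. It assumes the caller has already restricted the check-ins to venues that the active user has not visited; the function name, the dictionary layout, and the alphabetical tie-break are illustrative choices rather than details from the text.

```python
def rank_venues(pheromone, checkins, top_n):
    """Rank venues by the pheromone-weighted average check-ins of (8).

    pheromone[u] is the pheromone between the active user s and hub u after
    the final iteration; checkins[u][x] is the check-in count of hub u at
    venue x. Returns the top_n (venue, score) pairs in descending order.
    """
    total = sum(pheromone.values())
    if total == 0:
        return []
    scores = {}
    for u, tau in pheromone.items():
        for venue, count in checkins.get(u, {}).items():
            scores[venue] = scores.get(venue, 0.0) + tau * count
    # Sort by score (descending); break ties by venue id for determinism.
    ranked = sorted(((v, s / total) for v, s in scores.items()),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]
```

A hub that accumulated more pheromone contributes more to each venue's score, so the venues favored by the hubs on well-reinforced paths rise to the top of the list.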

Algorithm 1. Ant Colony based Venue Selection

Input: Active user: s, category: C, region: R

Output: A set V′ of top-N venues visited by experienced hubs similar to the active user.

Definitions: t = current iteration, t_max = maximum iterations, N_j = neighbor set of node j, τ(i, j) = sim(i, j) if δ_ij = 1 ∧ i = s, otherwise τ(i, j) = ω_ij (from (6)), where δ_ij = edge count between i and j, η(i, j) = 1/δ_ij, and Z_j = number of required venues found at a node j.

1: t ← 0; a ← s; δ ← 1; level_k ← ∅; edges_k ← ∅
2: G_c ← getHubSimGraph(C, R)
3: N_a ← {x ∈ G_c | sim(a, x) > 0}
4: k ← |N_a| (number of ants)
5: t ← t + 1
6: tabu_k ← a
7: Sort N_a in terms of [τ(a, j) × η(s, j)], j ∈ N_a (descending)
8: for each e ∈ N_a do
9:   S ← {v ∈ V_e | v ∉ V_a}
10:   M ← M.append(e, S)
11:   tabu_k ← tabu_k ∪ {e}
12:   edges_k ← edges_k ∪ {τ(a, e)}
13: end for
14: if venueCount(M) ≥ N then
15:   go to Line 25
16: else
17:   ∀j ∈ N_a, select a ← j, such that we have arg max(τ(a, j) × η(s, j) × Z_j/N) ∧ N_j ≠ ∅ ∧ ∀g ∈ N_j | g ∉ tabu_k
18:   if no such node is found in Line 17 then
19:     go to Line 25
20:   else
21:     δ ← δ + 1; level_k.append(a)
22:     go to Line 7
23:   end if
24: end if
25: evaporate_deposit_Pheromone()
26: if t < t_max then
27:   Reset tabu_k, level_k, M and set δ ← 1
28:   go to Line 4
29: else
30:   V′ = aggregate(M)
31: end if
32: return V′
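The pheromone update invoked at Line 25 of Algorithm 1 could be sketched as two small functions, one for the deposit term and one for the evaporate-then-deposit rule of (7). The function names, the argument layout, and the default evaporation rate of 0.1 are assumptions made for illustration; no value for the evaporation constant is given in this section.

```python
import math


def pheromone_deposit(path_pheromones, venues_found, n_required,
                      checkins_j_new, checkins_all_new):
    """Deposit for the edge towards node j, as a product of three factors.

    path_pheromones: pheromone values at time t-1 on the edges from s to j
        (its length is the hop distance between s and j).
    venues_found: number of required venues contributed by node j (Z_j).
    n_required: number of venues to recommend (N).
    checkins_j_new: check-ins of node j at venues new to the active user.
    checkins_all_new: the same count summed over all nodes in the tabu list.
    """
    hops = len(path_pheromones)
    if hops == 0 or n_required == 0 or checkins_all_new == 0:
        return 0.0
    return (math.prod(path_pheromones) / hops
            * venues_found / n_required
            * checkins_j_new / checkins_all_new)


def update_pheromone(tau_prev, deposit, rho=0.1):
    """Evaporate, then deposit: (1 - rho) * tau(t-1) + deposit.

    rho = 0.1 is an assumed default, not a value taken from the text.
    """
    return (1.0 - rho) * tau_prev + deposit
```

Edges that lead to nodes contributing no required venues receive a zero deposit and therefore only evaporate, which is how unproductive paths lose pheromone over the iterations.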

3.2.1 An Illustrative Example

Suppose that, by using Algorithm 1, we have to recommend ten venues to an active user $s$ under a specific category. The venues to be recommended are the ones not previously visited by the active user. As a first step, the graph of experienced users under the given category is retrieved from the database, as depicted in Fig. 4a. The similarity of the active user is computed with all of the nodes in the graph, and new links are created between the active user and only those graph nodes for which the similarity is greater than zero. With the active user $s$ set as the root node, the Breadth First Search (BFS) procedure places the immediate neighbors of the active user (those for which the similarity is greater than zero) at a distance of one (�sj = 1), as depicted in Fig. 4b. The neighbors of the friends of the active user are placed at a distance of two (�sj = 2), and the process continues until the whole graph is traversed. As indicated in Fig. 4b, each edge has a weight; the edges connecting nodes at the same distance level are intentionally left unlabeled because they are not traversed during the execution of Algorithm 1. Suppose that the ten venues that we want to recommend to the active user are shown as the column labels in Table 1. The entries in each column reflect the number of check-ins performed by the hub users at the visited venues. The last column indicates how many of the required ten venues each hub user has visited. On the execution of Line 8-Line 13 in Algorithm 1, the ant collects the venues from the neighbors of the active user. The numbers of venues collected from the neighbors are $e_5$ (1), $e_4$ (3), and $e_1$ (2). (Although $e_5$ is the most similar to the active user, it was able to produce only a single venue.) The execution jumps to Line 17, where the best of the visited neighbor nodes is selected based on the pheromone value and the number of venues contributed by the neighbor node. Consequently, we get the

Fig. 4. (a) Hubs similarity graph retrieved from database, and (b) connectivity of active user with hub similarity graph.

TABLE 1. Number of Times Required Venues Are Visited by Each Hub User and Total Check-Ins at the Venues


following values for each neighbor: $e_5$ $[0.9 \times 1 \times (1/10) = 0.09]$, $e_4$ $[0.7 \times 1 \times (3/10) = 0.21]$, and $e_1$ $[0.3 \times 1 \times (2/10) = 0.06]$. Here, $e_4$ is selected as the new root node (a) because it has the highest value. Line 8-Line 13 are executed again, and the venues collected from the neighbors are $e_6$ (2) and $e_3$ (3). On the execution of Line 17, the following values are obtained for each neighbor: $e_6$ $[0.8 \times (1/2) \times (2/10) = 0.08]$ and $e_3$ $[0.5 \times (1/2) \times (3/10) = 0.075]$. Therefore, $e_6$ is selected as the new root node (a). After Line 8-Line 13 are executed, the venues collected from the neighbors are $e_7$ (3). The node $e_7$ does not have any further neighbors (Line 17). Therefore, the condition of Line 18 becomes true and control jumps to Line 25, where the pheromone values on the edges are updated. The process is then restarted from the actual root node ($s$) and repeated for $t_{max}$ iterations. Table 2 shows the update of the pheromone values after the first iteration. It can be observed that, after the first iteration is completed, the pheromone on the edge $e_4$-$e_6$ has decreased, whereas the pheromone on the edge $e_4$-$e_3$ has increased. Therefore, after several iterations ($< t_{max}$), the current path $s$-$e_4$-$e_6$-$e_7$ will be replaced by the new paths $s$-$e_4$-$e_3$-$e_7$ and $s$-$e_4$-$e_3$-$e_8$, which will also increase the number of venues collected for the active user.
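The selection arithmetic in the example above can be reproduced with a short sketch. This is only an illustration under stated assumptions: the function names `neighbor_score` and `pick_next_root` are hypothetical, and the interpretation of the second factor (1 at the first hop, 1/2 at the second) as a level-dependent factor is our reading of the numbers, not something the text states explicitly.

```python
from typing import Dict, Tuple

REQUIRED_VENUES = 10  # the example asks for ten venues


def neighbor_score(edge_value: float, level_factor: float, venues_found: int,
                   required: int = REQUIRED_VENUES) -> float:
    """Product used in the example: edge value x level factor x share of venues found.

    `level_factor` is an assumed name for the second factor in the example
    (1 at the first hop, 1/2 at the second hop).
    """
    return edge_value * level_factor * (venues_found / required)


def pick_next_root(candidates: Dict[str, Tuple[float, float, int]]) -> str:
    """Return the neighbor with the highest score (the next root node)."""
    return max(candidates, key=lambda node: neighbor_score(*candidates[node]))


# Values taken from the illustrative example: (edge value, level factor, venues found).
first_hop = {"e5": (0.9, 1.0, 1), "e4": (0.7, 1.0, 3), "e1": (0.3, 1.0, 2)}
second_hop = {"e6": (0.8, 0.5, 2), "e3": (0.5, 0.5, 3)}

print(pick_next_root(first_hop))   # neighbor chosen from the active user's neighbors
print(pick_next_root(second_hop))  # neighbor chosen from the next level
```

The sketch covers only the greedy choice made at Line 17; pheromone evaporation and deposit (Line 25) are not modeled.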

3.3 Group Recommendation

The existing work on venue recommendation systems generally focuses on recommending venues to individual users based on personal preferences [1], [2]. However, the process becomes quite challenging when the system must provide recommendations to a group of friends [11], [17]. To achieve an optimal level of satisfaction for the whole group, the recommendation system must be able to reconcile the preferences of all of the group members [23]. However, it is not a simple task to produce recommendations that satisfy every group member, as an individual's context (e.g., speed, distance, and road traffic conditions) may vary with time [11].

To address the above-mentioned issues, we propose a more efficient and effective approach for group recommendations. The proposed cloud-based OmniSuggest framework also takes into consideration the effect of various real-world time-varying factors, such as speed, distance, and traffic conditions, on the group recommendations. In the following subsection, we present a motivational example that highlights the problems faced by a “traditional” venue recommendation system that generates recommendations without considering these real-world factors. Later, we present a technique to circumvent the anomalies associated with the traditional venue recommendation systems.

3.3.1 A Motivational Scenario

Suppose a group of five friends are in different parts of a city and they decide to get together for dinner. They plan to meet at a Chinese restaurant. One member of the group, known as the group leader, initiates a group query that consists of: 1) the deadline by which they must arrive at the venue, and 2) the identifications of the group members. The recommendation system recommends a venue for the category “Chinese Food” based on the popularity ranking calculated for the venue (Section 3.1.1) within the geographical region [11], [23]. Out of the five, two group members are located far away from the recommended venue and are unwilling to undertake a long journey. One member is stuck in a traffic jam on the road leading to the venue and is unable to arrive before the deadline. Only the remaining two members will be able to reach the venue before the deadline. However, they also drop their plan on finding out that the other members are not coming. Although the recommended venue is the most popular one, the system still could not satisfy the whole group, as each member had a time-varying context that subsequently developed into constraints.

The above scenario depicts our motivation behind considering real-world parameters (e.g., speed, distance, and road traffic conditions) in the venue recommendation process. When generating recommendations, our proposed framework not only considers the highly ranked venue (using the HITS method) for a specific category, but also takes into account the current context of each group member. In this fashion, a venue is reported at the top of the list by considering both its popularity and the mutual consent of the group members. In the following text, we present our group recommendation approach.

TABLE 2. Pheromone Update on Edges


3.3.2 Proposed Group Recommendation Approach

Formally, we state the group recommendation problem as: “Given a list of venues $V$ in a geographic region $R$, and a given category $C$, recommend a venue to a group, such that it maximizes the group's satisfaction, under each member's individual context.” Algorithm 2 presents our proposed approach, which is further illustrated using Fig. 5.

1. Initializations (Line 1 and Line 2). The algorithm takes as input the group query sent by the group leader, which includes the list of group members, the deadline by which members must reach the venue of a given category, and the geographical region (Fig. 5, Step 1). Line 2 of the algorithm retrieves a set of top venues based on their pre-calculated HITS score from the database. The top venues retrieved are for the given category and geographic region (Fig. 5, Step 2 and Step 3).

2. Real-time processing for each member (Line 3-Line 8). In this step, the recommendation framework collects each group member's context, such as his/her current location, distance, and the road condition leading to the top venues. In the next step, for each member, the approximate time to reach each of the top venues is calculated (Line 3-Line 5). This approximate time is subtracted from the deadline (J), and the result is multiplied by the number of check-ins of a member $m$ at a venue $v$ and by the authority score $A_v$ of the venue $v$ (Line 5 and Line 6). The result of the multiplication is stored in the matrix $[P]_{m \times v}$. If the parameter $T_{mv}$ at Line 5 turns out to be less than 0, then the user will not be able to reach the venue within the deadline period, and that venue is dropped.

3. Venue recommendation based on group satisfaction (Line 9-Line 10). In this step, venues are ranked by aggregating the values of the matrix $[P]_{m \times v}$. The following aggregation functions are utilized [11]: Average, Least Misery, Most Pleasure, and Approval Voting (see Step 6, Fig. 5). At time interval $T$, a venue gets a top ranking if it satisfies every group member. For example, if most of the group members have high ratings for a venue in the matrix $[P]_{m \times v}$, and we utilize the Average aggregation function, then the venue would be recommended as the best possible venue.

Algorithm 2. Group Recommendation

Input: Group query GR, threshold time J, category C, region R

Output: Recommended venue for the group.

1: Q ← Retrieve group members from Ga
2: V' ← getTopVenues(C, R)
3: for each member m ∈ Q do
4:     for each venue v ∈ V' do
5:         $T_{mv} = J - \dfrac{|Loc_m - Loc_v|}{1 + Speed_m \times RoadCond_{mv}}$
6:         $P[m][v] = \begin{cases} T_{mv} \times v_{mv} \times A_v, & \text{if } T_{mv} > 0 \\ 0, & \text{otherwise} \end{cases}$
7:     end for
8: end for
9: Rank[v] ← aggregate(P)
10: return max(Rank)
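The steps of Algorithm 2 can be sketched in a few lines of Python. The following is a minimal illustration, not the authors' implementation: the function and variable names are hypothetical, locations are assumed to be 2-D points with Euclidean distance, the deadline and travel time are assumed to share the same unit, and the Average function is assumed as the aggregation strategy.

```python
import math
from statistics import mean
from typing import Dict, Tuple

Point = Tuple[float, float]


def time_slack(deadline: float, member_loc: Point, venue_loc: Point,
               speed: float, road_cond: float) -> float:
    """Line 5: deadline minus the approximate travel time to the venue."""
    distance = math.hypot(member_loc[0] - venue_loc[0], member_loc[1] - venue_loc[1])
    return deadline - distance / (1.0 + speed * road_cond)


def recommend_group_venue(members: Dict[str, Tuple[Point, float]],
                          venues: Dict[str, Point],
                          checkins: Dict[Tuple[str, str], int],
                          authority: Dict[str, float],
                          road_cond: Dict[Tuple[str, str], float],
                          deadline: float) -> str:
    """Lines 3-10: fill P[m][v], aggregate per venue (Average), return the best venue."""
    rank = {}
    for venue, venue_loc in venues.items():
        column = []
        for member, (member_loc, speed) in members.items():
            slack = time_slack(deadline, member_loc, venue_loc, speed,
                               road_cond.get((member, venue), 1.0))
            visits = checkins.get((member, venue), 0)
            # Line 6: the score is zero when the member cannot arrive in time.
            column.append(slack * visits * authority[venue] if slack > 0 else 0.0)
        rank[venue] = mean(column)
    return max(rank, key=rank.get)


# Hypothetical data: two members, one nearby venue, one popular but distant venue.
members = {"ann": ((0.0, 0.0), 1.0), "bob": ((10.0, 0.0), 1.0)}
venues = {"near": (5.0, 0.0), "far": (50.0, 0.0)}
checkins = {("ann", "near"): 2, ("bob", "near"): 1,
            ("ann", "far"): 5, ("bob", "far"): 5}
authority = {"near": 0.5, "far": 0.9}
best = recommend_group_venue(members, venues, checkins, authority, {}, deadline=10.0)
print(best)
```

In this toy data the distant venue has the higher authority score and more check-ins, yet neither member can reach it before the deadline, so its entries in the matrix are zero and the nearby venue is returned.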

A venue that is placed at the top of the recommended list at time $T$ may not stay at the same position at time ($T + \Delta t$), where $\Delta t$ is the increment in the initial time ($T$). Therefore, to ensure that the SLA is maintained, the OmniSuggest framework generates a second recommendation only in adverse conditions, when a few group members are unable to reach the recommended venue before the deadline. Such adverse conditions may arise due to road blocks and/or severe weather. In this way, the framework combines the venues' popularity, the group members' individual preferences for a venue, real-time conditions, and the mutual consent of all of the group members to generate the venue recommendations.

3.4 Time Complexity

In this section, we present the time complexity analysis of the OmniSuggest framework. We compute the time complexity of the offline pre-processing tasks, as well as the online recommendation modules of Algorithm 1 and Algorithm 2.

3.4.1 Offline Time Complexity of HITS Method

The OmniSuggest framework utilizes the HITS approach to rank the popular venues and experienced users for each category in a geographical region. The time complexity of the HITS method is $O(m \times (h'^2 + v^2))$, where the parameter $m$ is the number of iterations required by the HITS method to converge, $h'$ represents the total number of users in a region, and $v$ is the number of venues in a category. For a total of $l$ categories, the time complexity is $O(l \times m \times (h'^2 + v^2))$. Moreover, the computation of the similarity graphs among the experienced users $h$, for each category, takes $O(l \times h^2)$. Therefore, the total time complexity for offline pre-processing is $O(l \times (m \times (h'^2 + v^2) + h^2))$. Here, as we have $h' \gg h$, we compute the complexity as $O(l \times m \times (h'^2 + v^2))$.

Fig. 5. A step-wise procedure for group recommendations.


3.4.2 Time Complexity Analysis of Algorithm 1

Line 3 of Algorithm 1 computes the active user's similarity with a set of experienced users. The complexity of the similarity function for $v$ venues is $O(v)$. Therefore, the total time complexity of Line 3 is $O(h \times v)$. The worst-case scenario is that the active user's similarity evaluates to be greater than zero for all $h$ experts. The creation of $k$ ants (Line 4) takes $O(k)$. The number of ants generated at Line 4 equals the number of neighbors; therefore, we use $k = h$ in calculating the time complexity. Line 7 takes $O(h \times \log h)$ to sort the $h$ experts (neighbors) with respect to their pheromone trails. In the worst case, Line 8-Line 13 iterate $h$ times, and Line 7-Line 24 are also iterated $h$ times, unless the stopping criterion is met (Line 26). The time complexity of both the pheromone update phase (Line 25) and the aggregate function (Line 30) is $O(h)$. The complexity of the sequential execution of Algorithm 1, without considering the parallel execution of ants, is therefore:

$$O(h \times v + 3h^2 + h^2 \times \log h + 2h) = O(h \times v + h^2 \times \log h) = O(h \times (v + h \times \log h)).$$

By executing the ants in parallel, the time complexity of Algorithm 1 is further reduced to $O(h \times (v + \log h))$.

3.4.3 Time Complexity Analysis of Algorithm 2

The time complexity of Line 3-Line 8 is $O(g \times v)$, where the parameter $g$ is the number of group members and $v$ is the number of venues. The complexity of Line 9-Line 10 is $O(v)$. The overall complexity of Algorithm 2 in the sequential case is $O(g \times v)$. In the parallel case, the complexity of Algorithm 2 is reduced to $O(v)$.

From the above analysis, we can deduce that a significant speedup is achieved by the underlying cloud infrastructure, which facilitates elastic parallel execution. Therefore, when the user volume is high, more cloud nodes can be deployed to scale up the performance, and conversely, nodes can be released to scale down.

4 PERFORMANCE EVALUATION

In this section, we perform the experimental evaluation of the proposed cloud-based OmniSuggest framework. For comparison purposes, we selected the following existing recommendation techniques (defined in the next subsection): (a) Popularity based ranking [9], (b) Social-based ranking [9], (c) User-based collaborative filtering (UBCF) [1], [2], and (d) SVD matrix factorization [2].

4.1 Related Recommendation Techniques

• Popularity based ranking approach assigns popularity to venues depending on the number of check-ins. The rank $r_x$ for a venue $x$ is computed as $r_x = \sum_{i \in U} v_{ix}$, where $v_{ix}$ is the number of visits of a user $i$ to a venue $x$.

• Social-based Ranking computes the venue popularity ranking by utilizing the social network profiles of users. For a given user, the popularity of a particular venue depends on the number of check-ins performed by friends [9].

• User-based Collaborative Filtering (UBCF) methods, such as k-Nearest Neighbor (k-NN) [2], measure the similarity within the users' profiles to find the extent to which users visit the same venues. Based on the similarities, the k-NN set of a given user is computed. The nearest neighbor set is utilized to generate the rating for a venue by using the following relationship:

$$r_{a,j} = \bar{r}_a + \frac{\sum_{x \in U} sim(a, x)\,(v_{xj} - \bar{r}_x)}{\sum_{x \in U} sim(a, x)} \qquad (9)$$

In (9), $\bar{r}_a$ is the mean number of check-ins by a user $a$.

• Singular Value Decomposition (SVD) matrix factorization method [2] maps users and venues to a joint latent factor space of dimensionality $f$. A user $u$ is associated with a row vector $p_u \in \mathbb{R}^f$, and a venue $v$ is associated with a column vector $q_v \in \mathbb{R}^f$. A user's estimated rank for a venue $v$ is represented as $\hat{r}_{u,v} = q_v^T p_u$. To estimate the values of $q_v$ and $p_u$, the regularized squared error is minimized in the system given by $E = \min \sum_{(u,v) \in K} (r_{u,v} - q_v^T p_u)^2 + \lambda (|q_v|^2 + |p_u|^2)$, where $r_{u,v}$ is the rating of user $u$ for venue $v$, and $\lambda$ controls the extent of regularization and is determined by cross validation.
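The UBCF prediction in (9) can be illustrated with a small sketch. Several details are assumptions made only for this example: cosine similarity is used as the `sim` function, every user who has checked in at the target venue is treated as a neighbor (no top-k cut-off), each neighbor's own mean is subtracted from its check-in count, and all function names are hypothetical.

```python
import math
from typing import Dict

Profile = Dict[str, float]        # venue -> number of check-ins
Checkins = Dict[str, Profile]     # user -> profile


def mean_checkins(profile: Profile) -> float:
    """Mean number of check-ins over the venues a user has visited."""
    return sum(profile.values()) / len(profile) if profile else 0.0


def cosine_sim(a: Profile, b: Profile) -> float:
    """Cosine similarity over check-in vectors (assumed similarity measure)."""
    common = set(a) & set(b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a[v] * b[v] for v in common) / (norm_a * norm_b)


def predict(data: Checkins, active: str, venue: str) -> float:
    """Similarity-weighted, mean-centered prediction in the spirit of (9)."""
    base = mean_checkins(data[active])
    numerator = denominator = 0.0
    for other, profile in data.items():
        if other == active or venue not in profile:
            continue
        sim = cosine_sim(data[active], profile)
        numerator += sim * (profile[venue] - mean_checkins(profile))
        denominator += sim
    # Fall back to the user's own mean when no similar neighbor visited the venue.
    return base if denominator == 0 else base + numerator / denominator


data = {
    "a": {"v1": 2, "v2": 4},
    "b": {"v1": 2, "v2": 4, "v3": 6},
}
print(predict(data, "a", "v3"))
```

The fallback branch also hints at why UBCF struggles on sparse check-in data: when few users share venues, most similarities are zero and the prediction collapses to the user's own mean.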

4.2 Aggregation Strategies

We selected the following aggregation strategies presented in [11] for the group recommendations: (a) Least Misery, (b) Most Pleasure, (c) Average Satisfaction, and (d) Approval Voting. The least misery strategy selects the rating of the user who has the minimum rating for venue $v$ within the group. The group rating $GR_v$ of venue $v$ is calculated as $GR_v = \min(r_{u,v})$. The most pleasure strategy selects the rating of the user who has the maximum rating for venue $v$ within the group. The group rating $GR_v$ of venue $v$ under most pleasure is represented as $GR_v = \max(r_{u,v})$. The average satisfaction strategy computes the group rating $GR_v$ of venue $v$ by taking the average of the ratings ($r_{u,v}$) of the group members. The formula for group average satisfaction is given as $GR_v = \frac{1}{n} \sum_{u=1}^{n} r_{u,v}$. The approval voting strategy rates a venue by counting the group members who have ratings above a certain threshold, for example, by counting the number of group members that can reach a venue before the deadline.

The aggregation strategies are applied with selected schemes, such as SVD, POPULAR, and the OmniSuggest framework, to aggregate the rankings of venues for the group (Figs. 6d, 6e, and 6f). Under any of the above-mentioned aggregation strategies, the venue selected for recommendation to a group is the one with the maximum value of $GR_v$ among all the available venues, given as $GR = \max(GR_v)$.
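The four aggregation strategies and the final selection $GR = \max(GR_v)$ can be sketched as follows. The function names, the toy ratings, and the approval threshold are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List


def least_misery(ratings: List[float]) -> float:
    """GR_v = min over the group members' ratings."""
    return min(ratings)


def most_pleasure(ratings: List[float]) -> float:
    """GR_v = max over the group members' ratings."""
    return max(ratings)


def average_satisfaction(ratings: List[float]) -> float:
    """GR_v = (1/n) * sum of the group members' ratings."""
    return mean(ratings)


def approval_voting(ratings: List[float], threshold: float) -> float:
    """GR_v = number of members whose rating exceeds the threshold."""
    return float(sum(1 for r in ratings if r > threshold))


def recommend(ratings_by_venue: Dict[str, List[float]],
              strategy: Callable[[List[float]], float]) -> str:
    """Return the venue with the maximum group rating GR_v."""
    group_rating = {v: strategy(rs) for v, rs in ratings_by_venue.items()}
    return max(group_rating, key=group_rating.get)


# Toy ratings of three group members for three venues.
ratings_by_venue = {"A": [5, 1, 4], "B": [3, 3, 3], "C": [0, 6, 2]}

print(recommend(ratings_by_venue, least_misery))
print(recommend(ratings_by_venue, most_pleasure))
print(recommend(ratings_by_venue, average_satisfaction))
print(recommend(ratings_by_venue, lambda rs: approval_voting(rs, threshold=2.5)))
```

On this toy input the strategies disagree with one another, which illustrates why the evaluation reports results per aggregation strategy rather than a single group score.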

4.3 Results

In this section, we present the evaluation results of the proposed cloud-based OmniSuggest framework. As the OmniSuggest framework takes into account real-world time-varying parameters, the traditional evaluation mechanisms, such as Min-Cut and Map-Reduce, cannot be applied to measure the performance. Therefore, instead of finding out the convergence times of the proposed algorithms, we emphasize that the suitability of the SLA is more critical than other intrinsic issues that are important in supercomputing, high performance computing, and data intensive computing environments [18]. In the OmniSuggest framework, we have based the SLA on the satisfaction of users with the recommended venues, instead of on the response time as in traditional frameworks. It is noteworthy that, because OmniSuggest must respond to real-time traffic information, its response time would be superior to that of other systems, due to the parallelization of the processes. However, quantifying such a measure is not meaningful for the problem at hand.

We have carried out experiments on our internal Ubuntu cloud setup running on 96-core Supermicro SuperServer SYS-7047GR-TRF systems. The data flow process between the end users and the cloud is depicted in Fig. 5. To perform the evaluations, we used the Foursquare dataset [4] that consists of 425,680 tips provided by 49,027 users for 206,416 venues in New York and 327,431 tips given by 38,134 users in Los Angeles. The users' check-in history is split into two portions: 1) a training set (80 percent of the records) and 2) a testing set (20 percent of the records). In the following subsections, we discuss the evaluation results for the single user case and for group recommendations.

4.3.1 Evaluation of Single User Recommendation

Algorithm 1 reports the best performance for � = 0.01 and $t_{max} = 100$, which are determined empirically through numerous runs on varying datasets. The value $t_{max} = 100$ is large enough, as the data is already refined in the preprocessing phase. To evaluate the single user recommendation effectiveness, we use the following performance metrics [1]: 1) Precision, 2) Recall, and 3) F-measure. Precision is defined as the ratio of correct recommendations (true positives, $tp$) to the total number of recommendations ($tp$ + false positives, $fp$). The count of correct recommendations is computed as follows. For a given user, the ratings of randomly selected venues are set blank. Thereafter, the recommendation framework generates the top-N venues for the user. The correct recommendations are the venues that appear in the intersection of the aforementioned top-N venues and the venues that were set blank for evaluation. Precision gives the average quality of the individual recommendations, and can be represented as:

$$Precision = \frac{tp}{tp + fp}. \qquad (10)$$

Recall is defined as the ratio of the hit set size to the total size of the test set, where $fn$ denotes the false negatives. It measures the recommendation coverage of a recommendation system, and is given as:

$$Recall = \frac{tp}{tp + fn}. \qquad (11)$$

F-measure is the harmonic mean of precision and recall:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}. \qquad (12)$$
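The metrics in (10)-(12) can be computed for one user with a short sketch. The function name `evaluate_top_n` and the venue identifiers are hypothetical; the held-out list stands for the venues whose ratings were set blank for evaluation.

```python
from typing import Iterable, Tuple


def evaluate_top_n(recommended: Iterable[str],
                   held_out: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one user's top-N list."""
    rec, rel = set(recommended), set(held_out)
    tp = len(rec & rel)   # recommended venues that were held out
    fp = len(rec - rel)   # recommended venues that were not held out
    fn = len(rel - rec)   # held-out venues that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


top_n = ["v1", "v2", "v3", "v4"]
hidden = ["v2", "v4", "v7", "v8", "v9"]
precision, recall, f_measure = evaluate_top_n(top_n, hidden)
print(precision, recall, f_measure)
```

Increasing N can only add hits, so recall never decreases, while precision typically falls as more non-relevant venues enter the list.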

As reflected in Figs. 6a and 6b, the OmniSuggest framework achieves the best performance in terms of precision and recall, compared to the rest of the schemes (each plot shows the average of 100 random runs). This is because the OmniSuggest framework provides a more effective solution to the data sparsity problem by augmenting similarity computations with conditional probabilities and bifurcating the check-in data into sub-categories. The reduction in data sparseness results in an increased recommendation precision. The well-known collaborative filtering techniques, such as SVD and UBCF [1], [2], indicated low performance in terms of precision and recall due to higher data sparseness. Moreover, UBCF is not shown in the plots, as it failed to produce any results on the highly sparse Foursquare dataset considered in our experiments. The popularity-based approaches, such as SOCIAL and POPULAR, performed better than the collaborative filtering techniques. The reason is that popularity-based approaches do not utilize similarity computations in their models. Therefore, these approaches are not significantly affected by data sparsity problems. The recall of the OmniSuggest framework is the highest for $N = 20$. This indicates that the framework provides a greater coverage in terms of recommendations. However, the increase in coverage comes at the cost of lower precision values. The tradeoff between precision and recall is depicted in Fig. 6c. Compared to other schemes, the cloud-based OmniSuggest framework indicates better performance in terms of the F-measure. The improved F-measure performance is due to the higher values of precision and recall at $N = 10$. The performance of RANDOM remains low for all the aforementioned metrics. This is because RANDOM simply shuffles the candidate set of unvisited locations for each user, without performing similarity computations.

Fig. 6. Performance evaluation results: (a) Precision, (b) Recall, (c) F-measure, (d) Group consensus effects, (e) Group size effects, and (f) Effect of recurrent recommendations on global satisfaction.

4.3.2 Evaluation of Group Recommendation

The group recommendation is evaluated by employing the following aggregation strategies, as elaborated in [11]: (a) Average, (b) Least Misery, (c) Most Pleasure, and (d) Approval Voting. To imitate the real-world physical factors, we generated a random set of parameters for speed, distance, and road conditions. To ensure fairness in the results, all of the recommendation models are evaluated with the same set of random parameters. (Each plot shows the average of 100 runs.) The traditional performance metrics, such as precision and recall, cannot be utilized for group-based recommendations. This is because the groups are created on the fly and may have different numbers of users, which makes it impossible to store group-specific history in a database. Therefore, we utilized a performance metric, global satisfaction ($gs$) [17], to evaluate the group recommendations:

$$gs(G) = \bar{S} - \sigma_S, \qquad (13)$$

where $G$ represents the group, $0 \le gs(G) \le 1$ is the global satisfaction for all of the group members, $\bar{S}$ is the mean, and $\sigma_S$ represents the standard deviation of satisfaction. The global satisfaction, $gs$, provides a measure of similarity within the satisfaction levels of all of the group members. An individual's satisfaction level $S$ in the group is defined by the following formula [17]:

$$S(u, G) = \begin{cases}
1.0, & \text{if } esl(GR, list_u) \le 3, \\
0.9, & \text{if } esl(GR, list_u) \le 4, \\
0.8, & \text{if } esl(GR, list_u) \le 6, \\
0.6, & \text{if } esl(GR, list_u) \le 8, \\
0.4, & \text{if } esl(GR, list_u) \le 10, \\
0.2, & \text{if } esl(GR, list_u) \le 12, \\
0.0, & \text{if } esl(GR, list_u) > 12.
\end{cases} \qquad (14)$$

In the above equation, the parameter $GR$ represents the recommendation generated for the whole group, and $list_u$ is the recommendation list for each individual user. The Expected Search Length ($esl$) function relates the satisfaction level of an individual to the recommendation generated for the whole group. The function in (14) assigns a value within the range [0,1] to the index of the group recommended venue ($GR$) in an individual member's venue list ($list_u$). A smaller $esl$ value, and hence a greater satisfaction value in (14), means that the group recommended venue appears among the most preferred venues in the list of a given member.
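Equations (13) and (14) can be combined in a short sketch. Three choices here are assumptions made only for this example: $esl$ is taken to be the 1-based position of the group venue in a member's list, a venue that is missing from a member's list receives zero satisfaction, and the population standard deviation is used in (13). All names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

# (upper bound on esl, satisfaction) pairs taken from (14), checked in order.
_THRESHOLDS = [(3, 1.0), (4, 0.9), (6, 0.8), (8, 0.6), (10, 0.4), (12, 0.2)]


def satisfaction(esl: float) -> float:
    """Map an expected search length to a satisfaction level as in (14)."""
    for limit, score in _THRESHOLDS:
        if esl <= limit:
            return score
    return 0.0


def expected_search_length(group_venue: str, member_list: List[str]) -> float:
    """Assumed esl: 1-based rank of the group venue in the member's own list."""
    if group_venue in member_list:
        return member_list.index(group_venue) + 1
    return float("inf")  # assumption: an absent venue yields zero satisfaction


def global_satisfaction(group_venue: str, member_lists: Dict[str, List[str]]) -> float:
    """gs(G) = mean satisfaction minus its standard deviation, as in (13)."""
    scores = [satisfaction(expected_search_length(group_venue, lst))
              for lst in member_lists.values()]
    return mean(scores) - pstdev(scores)


agreeing = {"u1": ["x", "y", "z"], "u2": ["y", "x"], "u3": ["x"]}
split = {"u1": ["x"], "u2": ["y"]}
print(global_satisfaction("x", agreeing))
print(global_satisfaction("x", split))
```

Subtracting the standard deviation penalizes recommendations that please some members while leaving others dissatisfied, which is why a split group scores far lower than an agreeing one.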

Fig. 6d depicts the global satisfaction results for a group of five members. The improved performance of the proposed framework for all aggregation strategies is due to the measures taken by the OmniSuggest framework to handle data sparseness. The SVD scheme, being sensitive to data sparseness, does not exhibit uniformity in the personal opinions of the group members. Therefore, the increased standard deviation value decreases the overall satisfaction score for SVD in all of the aggregation strategies. The performance of OmniSuggest and POPULAR is almost similar for the Average, Least Misery, and Most Pleasure aggregation strategies. This is due to the fact that these schemes are less sensitive to data sparseness, and indicate uniformity in the satisfaction level of all group members.

Fig. 6e reflects the global satisfaction obtained by varying the group size. For all of the recommendation approaches, the Average function is selected as the aggregation strategy. The increase in group size resulted in a decrease in global satisfaction for the three recommendation approaches. Moreover, an increase in the number of group members also increases the deviation in the satisfaction of individual members, which results in an overall decrease of the global satisfaction in (13).

The system generates new recommendations only when a significant change in the group members' context occurs (such as road blocks). It can be observed from Fig. 6f that a recurrent recommendation has an insignificant effect on the global satisfaction. The reason is that, on all occasions, the recommended venue is the one that encapsulates the users' mutual consent based on their current context. However, it is noteworthy that a change in the recommendation may have a negative effect on the mood of the group members, as they have already travelled some distance towards the previously recommended venue. Therefore, every new recommendation generated by the system may decrease the overall satisfaction by a factor, as given below:

S_j(u, G) = �j × S_c(u, G), (15)

where S_c(u, G) represents a user's satisfaction for a new venue recommended by the system, and �j is a scaling factor that depicts the decay in satisfaction over a period of time. The parameter j indicates the number of times recurrent recommendations are made. As depicted in Fig. 7, the greater the value of j, the lower the satisfaction level for the group in all the recommendation approaches.

To summarize the results, our cloud-based OmniSuggest framework demonstrated better overall performance, as the proposed framework has a more efficient mechanism for handling the data sparsity problem. Moreover, in the case of group recommendation, the cloud-based OmniSuggest framework provides a higher level of satisfaction to the group members in terms of the recommended venues.

5 RELATED WORK

In this section, we discuss some of the recently proposed (2009-2013) techniques for venue recommendation systems. The existing approaches can be categorized as [1], [2], [4], [19], [20]: 1) trajectory based, 2) explicit rating based, and 3) check-in based approaches. Trajectory based approaches utilize information about a user's visit sequence to various locations, the paths selected, and the duration of stays. Doytsher et al. [7] proposed a trajectory-based graphical model that keeps track of routes frequently traveled by users and recommends the best route to a new user. A similar approach to perform personalized route recommendations is presented in [8]. The authors in [21] mine GPS trajectory data to extract the most popular locations based on users' travel sequences. Although the aforementioned approaches suggest locations based on users' past trajectories, they are unable to distinguish the places in terms of their categories, which we do in our proposed OmniSuggest framework. Moreover, such approaches suffer from data sparseness problems, as a person usually does not frequently visit multiple places.

Many online social services, such as Yelp (yelp.com) and Yellow pages (yellowpages.com), allow users to rate the visited locations. Rating-based recommendation systems utilize the existing ratings data to recommend the most popular venues or travel routes in a city. The authors in [12] and [6] proposed models based on collaborative filtering that take into account users' existing ratings to generate personalized venue recommendations. A similar technique was presented in [3] that utilizes an item-based collaborative filtering method with ratings of venues in a user's vicinity. The aforementioned approaches may closely capture users' preferences, but are not scalable enough to simultaneously process huge volumes of real-time data. Moreover, they also suffer from data sparseness issues due to the limited number of entries within the user-rating matrix.

Apart from explicit ratings, a few existing techniques have based their models on implicit ratings. The implicit ratings represent the number of check-ins performed by users at different venues [5], [6]. For example, the authors in [9] applied random-walk-with-restart on a user-venue check-in matrix to generate personalized recommendations for a given user. Bao et al. [4] proposed a recommendation scheme that generates region-wise expert users and venues from check-in data under various types.

Most of the above-mentioned approaches have designs built on (memory-based) CF models, which enable these approaches to depict a user's future preferences based on his/her past entries. However, these approaches suffer from scalability issues due to the large number of similarity computations on the user-venue matrix during the online recommendation process. Moreover, such approaches also suffer from data sparseness and cold start problems, as there are very few users who have visited a large number of venues. Furthermore, these approaches neither provide a solution to the group recommendation problem nor take into account the effect of real-world time-varying conditions on recommendations. To address these limitations, our proposed cloud-based recommendation framework, OmniSuggest, presents a solution for the scalability, data sparseness, and group recommendation challenges. The proposed approach also takes into account the real-world conditions while generating recommendations, which results in a set of venues that satisfies all of the group members.

6 CONCLUSION

We presented a multifold contribution by devising cloud-based solutions for the venue recommendation problem in social networks for a single user and/or a group of friends. The novelty and significance of this work is the integration of knowledge engineering techniques, such as the HITS method, Ant colony optimization, and collaborative filtering, on a cloud infrastructure to generate an optimal set of recommendations. Different from the previous works, the proposed OmniSuggest framework not only takes into account the collective opinions of the experienced users, but also considers the effect of dynamic real-world physical factors, such as a person's distance from venues, speed, weather conditions, and travel conditions. The scalability issues were addressed by proposing a cloud-based architecture that allocates data and computational load to geographically distributed cloud nodes. Data sparseness issues were resolved by augmenting similarity computations with conditional probabilities and further refining the data storage by bifurcating data into multiple levels of predefined categories. In this way, the OmniSuggest framework always had a precompiled set of experienced users for any category and was able to recommend the best venues for a new user at a finer granularity. The evaluation results with the real-world Foursquare dataset indicated that the proposed OmniSuggest framework performs better than many of the existing schemes. Our study revealed that real-world physical conditions have a significant effect on the final recommendations, when combined with users' context.

Fig. 7. Global satisfaction with the mood impact on recurrent recommendation.

In the future, we plan to further combine approaches from multiple disciplines, such as artificial neural networks, Bayesian networks, and machine learning techniques, to devise solutions that efficiently handle the data sparseness, cold start, and scalability issues. Moreover, we intend to integrate the recommendation module with early disaster warning systems that provide information about, for example, tornadoes, landslides, tsunamis, and floods, which would help in generating recommendations that closely depict real-world conditions.

REFERENCES

[1] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutierrez, ‘‘Recom-mender Systems Survey,’’ Knowl.-Based Syst., vol. 46, pp. 109-132,July 2013.

[2] L. Lu, M. Medo, C.H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou,‘‘Recommender Systems,’’ Phys. Rep., vol. 519, no. 1, pp. 1-49,Oct. 2012.

[3] J. Levandoski, M. Sarwat, A. Eldawy, and M. Mokbel, ‘‘LARS:A Location-Aware Recommender System,’’ in Proc. IEEE 28thICDE, 2012, pp. 450-461.

[4] J. Bao, Y. Zheng, and M.F. Mokbel, ‘‘Location-Based andPreference-Aware Recommendation Using Sparse Geo-SocialNetworking Data,’’ in Proc. 20th Int’l Conf. Adv. Geogr. Inf. Syst.,2012, pp. 199-208.

[5] M. Ye, P. Yin, and W. Lee, ‘‘Location Recommendation forLocation-Based Social Networks,’’ in Proc. 18th SIGSPATIAL Int’lConf. Adv. Geogr. Inf. Syst., 2010, pp. 458-461.

[6] C. Chow, J. Bao, and M. Mokbel, ‘‘Towards Location-BasedSocial Networking Services,’’ in Proc. 2nd ACM SIGSPA-TIAL Int’l Workshop Location Based Social Netw., 2010,pp. 31-38.

[7] Y. Doytsher, B. Galon, and Y. Kanza, ‘‘Storing Routes in Socio-Spatial Networks and Supporting Social-Based Route Recom-mendation,’’ in Proc. 3rd ACM SIGSPATIAL Int’l WorkshopLocation-Based Social Netw., 2011, pp. 49-56.

[8] K. Chang, L. Wei, M. Yeh, and W. Peng, ‘‘DiscoveringPersonalized Routes From Trajectories,’’ in Proc. 3rd ACMSIGSPATIAL Int’l Workshop Location-Based Social Netw., 2011,pp. 33-40.

[9] A. Noulas, S. Scellato, N. Lathia, and C. Mascolo, ‘‘A RandomWalk Around the City: New Venue Recommendation inLocation-Based Social Networks,’’ in Proc. Int’l Conf. SocialCom,2012, pp. 144-153.

[10] D. Easley and J. Kleinberg, Networks, Crowds, and Markets:Reasoning About a Highly Connected World. Cambridge, U.K.:Cambridge Univ. Press, 2010.

[11] J. Masthoff, ‘‘Group Recommender Systems: Combining Indi-vidual Models,’’ in Recommender Systems Handbook. New York,NY, USA: Springer-Verlag, 2011, pp. 677-702.

[12] L. Wei, Y. Zheng, and W. Peng, ‘‘Constructing Popular RoutesFrom Uncertain Trajectories,’’ in Proc. 18th ACM SIGKDD Int’lConf. Mining, 2012, pp. 195-203.

[13] N. Tziritas, C.-Z. Xu, J. Hong, and S.U. Khan, ‘‘An Optimal FullyDistributed Algorithm to Minimize the Resource Consumptionof Cloud Applications,’’ in Proc. 18th IEEE ICPADS, Dec. 2012,pp. 61-68.

[14] D. Kliazovich, P. Bouvry, and S.U. Khan, ‘‘Simulation andPerformance Analysis of Data Intensive and WorkloadIntensive Cloud Computing Data Centers,’’ in Optical Inter-connects for Future Data Center Networks, C. Kachris, K. Bergman,and I. Tomkos, Eds. New York, NY, USA: Springer-Verlag,2012.

[15] J. Li, P. Roy, S.U. Khan, L. Wang, and Y. Bai, ‘‘Data Mining UsingClouds: An Experimental Implementation Of A Priori Over MapReduce,’’ in Proc. 12th Int’l Conf. ScalCom, Changzhou, China,Dec. 2012, pp. 1-8.

[16] P. Bedi and R. Sharma, ‘‘Trust Based Recommender SystemUsing Ant Colony for Trust Computation,’’ Exp. Syst. Appl.,vol. 39, no. 1, pp. 1183-1190, Jan. 2012.

[17] L. Quijano-Sanchez, J.A. Recio-Garcia, B. Diaz-Agudo,and G. Jimenez-Diaz, ‘‘Social Factors in Group RecommenderSystems,’’ ACM Trans. Intell. Syst. Technol., vol. 4, no. 1, p. 8,Jan. 2013.

[18] L. Wang and S.U. Khan, ‘‘Review of Performance Metricsfor Green Data Centers: A Taxonomy Study,’’ J. Supercomput.,vol. 63, no. 3, pp. 639-656, Mar. 2013.

[19] H.S. Chiang and T.C. Huang, ‘‘User-Adapted Travel PlanningSystem for Personalized Schedule Recommendation,’’ Inf. Fusion.[Online]. Available: http://dx.doi.org/10.1016/j.inffus.2013.05.011, June 11, 2013.

[20] J. Oh, O. Jeong, and E. Lee, ‘‘Collective Intelligence BasedPlace Recommendation System,’’ in Advanced InfocommTechnology . Berlin, Germany: Springer-Verlag, 2013,pp. 169-176.

[21] Y. Zheng, L. Zhang, X. Xie, and W.Y. Ma, ‘‘Mining InterestingLocations and Travel Sequences From GPS Trajectories,’’ in Proc.18th Int’l Conf. World Wide Web, 2009, pp. 791-800.

Osman Khalid received the MS degree in computer engineering from the Center for Advanced Studies in Engineering (CASE), Pakistan. Currently, he is pursuing the PhD degree at the North Dakota State University, Fargo, USA. His PhD research areas include opportunistic networks, recommendation systems, and trust and reputation systems. He is a Student Member of the IEEE.

Muhammad Usman Shahid Khan received the master's degree in information security from the National University of Science and Technology (NUST), Pakistan. Currently, he is pursuing the PhD at the North Dakota State University, Fargo, USA. His areas of interest are recommendation systems, data mining, cognitive radio networks, and network security. He is a Student Member of the IEEE.

Samee U. Khan received the BS degree from Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan, and the PhD from the University of Texas, Arlington, TX, USA. Currently, he is Assistant Professor of Electrical and Computer Engineering at the North Dakota State University, Fargo, ND, USA. His research interests include optimization, robustness, and security of: cloud, grid, cluster and big data computing, social networks, wired and wireless networks, power systems, smart grids, and optical networks. His work has appeared in over 200 publications. Dr. Khan is a Fellow of the Institution of Engineering and Technology (IET, formerly IEE), and a Fellow of the British Computer Society (BCS). He is a Senior Member of the IEEE.

Albert Y. Zomaya is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, Sydney University, Australia. Professor Zomaya is the author/co-author of seven books, more than 400 papers, and the editor of 12 books and 15 conference proceedings. He is the Editor in Chief of the IEEE Transactions on Computers and serves as an associate editor for 19 leading journals. Professor Zomaya is the recipient of the IEEE TCPP Outstanding Service Award and the IEEE TCSC Medal for Excellence in Scalable Computing, both in 2011. He is a Fellow of AAAS and IET (UK). He is a Fellow of the IEEE.
