Leveraging Social Networks for P2P Content-Based File Sharing in Disconnected MANETs

Embed Size (px)

DESCRIPTION

base paper

Citation preview

  • 1Leveraging Social Networks for P2PContent-based File Sharing in Disconnected

    MANETsKang Chen, Haiying Shen*, Member, IEEE , and Haibo Zhang

    AbstractCurrent peer-to-peer (P2P) le sharing methods in mobile ad hoc networks (MANETs) can be classied into three groups:ooding-based, advertisement-based and social contact-based. The rst two groups of methods can easily have high overhead andlow scalability. They are mainly developed for connected MANETs, in which end-to-end connectivity among nodes is ensured. Thethird group of methods adapt to the opportunistic nature of disconnected MANETs but fail to consider the social interests (i.e.,contents) of mobile nodes, which can be exploited to improve the le searching efciency. In this paper, we propose a P2P content-based le sharing system, namely SPOON, for disconnected MANETs. The system uses an interest extraction algorithm to derive anodes interests from its les for content-based le searching. For efcient le searching, SPOON groups common-interest nodes thatfrequently meet with each other as communities. It takes advantage of node mobility by designating stable nodes, which has the mostfrequent contact with community members, as community coordinators for intra-community searching, and highly-mobile nodes that visitother communities frequently as community ambassadors for inter-community searching. An interest-oriented le searching scheme isproposed for high le searching efciency. Additional strategies for le prefetching, querying-completion and loop-prevention, and nodechurn consideration are discussed to further enhance the le searching efciency. We rst tested our system on the GENI Orbit testbedwith a real trace and then conducted event-driven experiment with two real traces and NS2 simulation with simulated disconnected andconnected MANET scenario. The test results show that our system signicantly lowers transmission cost and improves le searchingsuccess rate compared to current methods.

    Index TermsMANETs, Content-based le sharing, Social networks

    1 INTRODUCTIONIn the past few years, personal mobile devices such aslaptops, PDAs and smart phones have been more andmore popular. Indeed, the number of smart-phone usersincreased by 118 million across the world in 2007 [1],and is expected to reach around 300 million by 2013 [2].The incredibly rapid growth of mobile users is leadingto a promising future, in which they can freely share lesbetween each other whenever and wherever. The num-ber of mobile searching users (through smart phones,feature phones, tablets, etc.) is estimated to reach 901.1million in 2013 [3]. Currently, mobile users interact witheach other and share les via an infrastructure formedby geographically distributed base stations. However,users may nd themselves in an area without wirelessservice (e.g., mountain areas and rural areas). Moreover,users may hope to reduce the cost on the expensiveinfrastructure network data.

    The P2P le sharing model makes large-scale networksa blessing instead of a curse, in which nodes share lesdirectly with each other without a centralized server.Wired P2P le sharing systems (e.g., BitTorrent [4] andKazaa [5]) have already become a popular and success-ful paradigm for le sharing among millions of users.The successful deployment of P2P le sharing systemsand the aforementioned impediments to le sharing in

    * Corresponding Author. Email: [email protected]. K. Chen and H. Shen are with the Department of Electrical and Computer

    Engineering, Clemson University, Clemson, South Carolina, 29634. Haibo Zhang is with the Cerner Cooperation.

    MANETs make the P2P le sharing over MANETs (P2PMANETs in short) a promising complement to currentinfrastructure model to realize pervasive le sharing formobile users. As the mobile digital devices are carried bypeople that usually belong to certain social relationships,in this paper, we focus on the P2P le sharing in adisconnected MANET community consisting of mobileusers with social network properties. In such a lesharing system, nodes meet and exchange requests andles in the format of text, short videos and voice clips indifferent interest categories. A typical scenario is a coursematerial (e.g., course slides, review sheets, assignments)sharing system in a school campus. Such a scenario en-sures for the most that nodes sharing the same interests(i.e., math) carry corresponding les (i.e., math les) andmeet regularly (i.e., attending math classes).

    In MANETs consisting of digital devices, nodes areconstantly moving, forming disconnected MANETs withopportunistic node encountering. Such transient net-work connections have posed a challenge for the de-velopment of P2P MANETs. Traditional methods sup-porting P2P MANETs are either ooding-based [6][9]or advertisement-based [10][12]. The former methodsrely on ooding for le searching. However, they lead tohigh overhead in broadcast. In the latter methods, nodesadvertise their available les, build content tables, andforward les according to these tables. But they havelow search efciency because of expired routes in thecontent tables caused by transient network connections.Also, advertising can lead to high overhead.

    Some researchers [13][17] further proposed to utilizecache/replication to enhance data dissemination/accessefciency in disconnected MANETs. However, nodes in

    Digital Object Indentifier 10.1109/TMC.2012.239 1536-1233/12/$26.00 2012 IEEE

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 2these methods passively wait for contents that they areinterested in rather than actively search les, which maylead to a high search delay.

    Recently, social networks are exploited to facili-tate content dissemination/publishing in disconnectedMANETs [18][21]. These methods exploit below prop-erty to improve the efciency of message forwarding.

    (P1) nodes (i.e., people) usually exhibit certainmovement patterns (e.g., local gathering, diversecentralities and skewed visiting preferences).

    However, these methods are only for the disseminationof information to subscribers. They are not specicallydesigned for le searching. Also, they fail to take intoaccount other properties of social networks revealed byrecent studies to facilitate content sharing:

    (P2) Users usually have a few le interests that theyvisit frequently [22] and a users le visit patternfollows a power-law distribution [23].

    (P3) Users with common interests tend to meet witheach other more often than with others [24].

    By leveraging these properties of social networks,we propose Social network based P2P cOntent-basedle sharing in disconnected mObile ad-hoc Networks(SPOON) with four components as shown in Figure 1.(1) Based on P2, we propose an interest extraction al-

    gorithm to derive a nodes interests from its les.The interest facilitates queries in content-based lesharing and other components of SPOON.

    (2) We refer to a collective of nodes that share commoninterests and meet frequently as a community. Ac-cording to P3, a node has high probability to ndinterested les in its community. If this fails, basedon P1, the node can rely on nodes that frequentlytravel to other communities for le searching. Thus,we propose the community construction algorithm tobuild communities to enable efcient le retrieval.

    (3) According to P1, we propose a node role assignmentalgorithm that takes advantage of node mobility forefcient le searching. The algorithm designates astable node that has the tightest connections withothers in its community as the community coordi-nator to guide intra-community searching. For eachknown foreign community, a node that frequentlytravels to it is designated as the community ambas-sador for inter-community searching.

    (4) We propose an interest-oriented le searching and re-trieval scheme that utilizes an interest-oriented routingalgorithm (IRA) and above three components. Basedon P3, IRA selects forwarding node by consideringthe probability of meeting keywords of interestsrather than nodes. The le searching scheme hastwo phases: intra- and inter-community searching.In the former, a node rst queries nearby nodes,then relies on coordinator to search the entire homecommunity. If it fails, the inter-community searchinguses an ambassador to send the query to a matchedforeign community. A discovered le is sent backthrough the search path and IRA if the path breaks.

    SPOON is novel in that it leverages social networkproperties of both node interest and movement pat-

    Interest Extraction

    Exploiting Node Stability/Mobility

    Community Construction

    Interest Oriented Routing

    Social network based P2P cOntent-based file sharing in mobile ad hOc Networks (SPOON)

    Fig. 1. Components of SPOON.

    tern. First, it classies common-interest and frequently-encountered nodes into social communities. Second, itconsiders the frequency at which a node meets differentinterests rather than different nodes in le searching.Third, it chooses stable nodes in a community as coor-dinators and highly mobile nodes that travel frequentlyto foreign communities as ambassadors. Such a structureensures that a query can be forwarded to the commu-nity of the queried le quickly. SPOON also incorpo-rates additional strategies for le prefetching, querying-completion and loop-prevention, and node churn con-sideration to further enhance le searching efciency.

    The rest of the paper is arranged as follows. Sec-tion 2 provides an overview of related works. Section 3presents the design of the components of SPOON. InSection 4, the performance of SPOON is evaluated incomparison with other systems. The last section presentsconcluding remarks and future work.

    2 RELATED WORK

    2.1 P2P File Sharing in MANETsWe rst introduce the P2P le sharing algorithms de-signed in MANETs.

    2.1.1 Flooding-based MethodsIn ooding-based methods, 7DS [6] is one of the rst ap-proaches to port P2P technology to mobile environments.It exploits the mobility of nodes within a geographic areato disseminate web content among neighbors. PassiveDistributed Indexing (PDI) [8] is a general-purpose dis-tributed le searching algorithm. It uses local broadcast-ing for content searching and sets up content indexes onnodes along the reply path to guide subsequent search-ing. Klemm et al. [7] proposed a special-purpose on-demand le searching and transferring algorithm basedon an application layer overlay network. The algorithmtransparently aggregates query results from other peersto eliminate redundant routing paths. Anna Hayes etal. [9] extended the Gnutella system to mobile environ-ments and proposed the use of a set of keywords torepresent user interests. However, these ooding-basedmethods produce high overhead due to broadcasting.

    2.1.2 Advertisement-based MethodsTchakarov and Vaidya [10] proposed GCLP for efcientcontent discovery in location-aware ad hoc networks. Itdisseminates contents and requests in crossed directionsto ensure their encountering. P2PSI [11] combines bothadvertisement (push) and discovery (pull) processes.It adopts the idea of swarm intelligence by regard-ing shared les as food sources and routing tables aspheromone. Each le holder regularly broadcasts an

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 3advertisement message to inform surrounding nodesabout its les. The discovery process locates the desiredle and also leaves pheromone to help subsequent searchrequests. Repantis and Kalogeraki [12] proposed a lesharing mechanism in which nodes use the Bloom l-ter to build content synopses of their data and adap-tively disseminate them to other nodes to guide queries.Though the advertisement-based methods reduce theoverhead of ooding-based methods, they still generatehigh overhead for advertising and cannot guarantee thesuccess of le searching due to node mobility.

    2.2 P2P File Sharing in Disconnected MANETsThe disconnected MANETs are featured by sparse nodedensity and intermittent node connection, which makespreviously introduced methods infeasible in such net-works. We then further introduce two categories of P2Ple sharing methods for disconnected MANETs.2.2.1 Cache/Replication-based MethodsHuang et al. [13] proposed a method that considersmultiple factors (e.g., node mobility, le popularity, andle server topology) in creating le replicas in le serversto realize optimal le availability in content distribu-tion community. Gao et al. [14] proposed cooperativecaching in disruption tolerant networks. It replicas eachle to network central locations, which are frequentlyvisited by nodes in the system, to ensure efcient dataaccess. QCR [15] uses le caching to realize effectivemultimedia content dissemination in opportunistic net-works. In addition to node mobility and le popularity,it also considers the impatience of users when creatingreplicas. Lenders et al. [16] investigated wireless ad hocpodcasting, in which nodes store contents from theirneighbors that are interested by themselves or the nodesthey have met. Chen et al. [17] deduced the optimal lereplication strategy in MANETs by further consideringnodes ability to meet nodes as a resource since replicason these nodes can meet more requesters and thus havehigher availability. Though these methods improve leavailability, nodes in these methods passively wait forcontents they are interested in rather than actively searchles, which may lead to search delay.2.2.2 Social Network-based MethodsRecently, social networks have been utilized in contentpublishing/dissemination algorithms [18][21] in oppor-tunistic networks. MOPS [18] provides content-basedsub/pub service by utilizing the long-term neighbor-ing relationship between nodes. It groups nodes withfrequent contacts and selects nodes that connect differ-ent groups as brokers, which are responsible for inter-community communication. Then contents and subscrip-tions are relayed through brokers to reach differentcommunities. MOPS only considers node mobility whileSPOON is more advantageous by considering both nodeinterest and mobility as described previously. Moreover,unlike MOPS that only depends on the meeting of bro-kers for inter-community search, SPOON enhances theefciency of inter-community search by (1) assigning oneambassador for each known foreign community, whichhelps to forward a query directly to the destination

    community, and (2) utilizing stable nodes (coordinator)to receive messages from ambassadors.

    The work in [25] is a similar to MOPS. It selectscentrality nodes as brokers and builds them into anoverlay, in which brokers use unicast or direct proto-cols (e.g., WiFi access points) for communication. Thennode publications are rst transferred to the brokernode responsible for the nodes community and thenpropagated to all brokers to nd matched subscribers.SocialCast [19] calculates a nodes utility value on aninterest based on the nodes mobility and co-locationwith the nodes subscribed to the interest. It publishescontents on an interest to subscribers by forwardingthe contents to nodes with the highest utilities on theinterest. ContentPlace [21] denes social relationshipbased communities and a set of content caching policies.Specically, each node calculates a utility value of pub-lished data it has met based on the datas destinationand its connected communities, and caches the datawith the top highest utilities. However, above methodsmainly focus on disseminating publications to matchedsubscribers. Therefore, these methods cannot be appliedto le searching directly.

    3 THE DESIGN OF SPOONIn this section, we st present trace data analysis to

    verify the social network properties in a real MANET. AP2P le sharing system usually consists of 1) a methodto represent contents, 2) a node management structureand 3) a le searching method based on 1) and 2).Accordingly, SPOON has three main components: 1)interest extraction, 2) structure construction includingcommunity structure and node role assignment, and 3)interest-oriented le searching and retrieval based on 1)and 2). We then present each component of SPOON.

    3.1 Trace Data AnalysisIn order to validate the correlation between node in-terests and their contact frequencies, we analyzed thetrace from the Haggle project [26], which contains theencountering records among 98 mobile devices carriedby scholars attending the Infocom06 conference. Someparticipants completed questionnaires, indicating theconference tracks that they are interested in.

    We use Tt to denote the time length of the trace, anddene the total meeting time of two nodes as the sumof the time length of each encountering. By regarding acommunity as a group of nodes in which each node hastotal meeting time larger than Tt/4 with at least half ofall nodes in the community, we detected 8 communitiesfrom the trace. We then calculated each nodes averagenumber of shared interested tracks with other membersin its own community Ci (0 i < 8), and with nodes inall other communities, respectively. Finally, the averagevalues of all nodes in each community are calculated andshown in Table 1.

    From the table, we see that for each community, nodeshave higher average number of shared interested trackswith same community nodes than with nodes fromother communities. Note that we used a relatively loosecommunity creation requirement that each node only

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 4needs to have a high contact frequency with half ofnodes in a community. With a stricter requirement and amore sophisticated clustering method, nodes in the samecommunity would share more interested tracks. Abovetraces verify the previously observed social propertiesand support the basis for SPOON that nodes with com-mon interests tend to meet frequently.

    TABLE 1Average number of shared interested tracks.

    Community Ci Ave. # of shared interests Ave. # of shared interestswith nodes in Ci with nodes not in Ci

    1 1.50 0.992 0.83 0.693 1.17 0.794 1 0.395 1.93 0.946 0.33 0.217 1.1 0.718 1 0.33

    3.2 Interest ExtractionWithout loss of generality, we assume that node contentscan be classied to different interest categories. It wasfound that users usually have a few le categories thatthey query for les frequently in a le sharing system.Specically, for the majority of users, 80% of their sharedles fall into only 20% of total le categories [22]. Likeother le sharing systems [27], [28], we consider thata nodes stored les can reect its le interests. Thus,SPOON derives the interests of a node from its les.Table 2 lists the notations used in this section.

    TABLE 2Notations in interest extraction

    Notation Meaningfi and Gu the i-th le and u-th interest group in a node

    witk and wutk the weight of keyword tk in fi and in Gufui the i-th le in Guvi the le vector of fivu the group vector of GuvN the node vector of node N

    To derive its interests, a node infers keywords fromeach of its les using the document clustering technique[29]. Specically, a node derives a le vector for eachof its les from its metadata. For le fi, we denote itsle vector by vi = (t1, wit1 ; t2, wit2 ; t3, wit3 ; ...; tm, witm),in which tk and wik (1 k m) denote a keywordand its weight that represents the importance of thekeyword in describing the le. We adopt the method inthe text retrieval literature [30] to calculate the weight ofa keyword, say tk, in a le, say fi, with below formula.

    witk = 1 + log(ntk), (1)where ntk refers to the number of occurrences of key-word tk. Suppose there are m keywords in the le, wefurther normalize the weights by:

    witk = witk/

    mq=1witq . (2)

    Then, in order to calculate the similarity of twole vectors, say v1=(t1, w1t1 ; t2, w1t2 , t4, w1t4) andv2=(t1, w2t1 ; t3, w2t3 ; t5, w2t5), we rst generate theircommon vector, which consists of their commonkeywords and corresponding weights in their ownvectors. For example, the common vector of v1 and v2 is

    (t1, w1t1) from v1 and (t1, w2t1) from v2. We then use thefollowing formula to calculate the similarity between v1and v2:

    sim(v1, v2) =

    mk=1 w1k w2km

    k=1 w21k

    m

    k=1 w22k

    , (3)

    where m is the total number of common keywords andw1k and w2k represent the weights of the k

    th commonkeyword of the two vectors, respectively.

    After retrieving the le vector of each of its les, anode classies its les to derive its interest groups. It cre-ates a le similarity matrix A = sim(vr, vs) (1r & sm),where m is the number of les the node has. Since thesimilarities among les are known, we use the AGNESmethod [31] to cluster the les into interest groups in ahierarchical manner. Each le form an individual groupinitially. Then, two most similar le groups are mergedin each step. This process repeats until the similaritybetween any two groups is below a threshold. Thesimilarity between two groups is calculated based ontheir interest vectors introduced below. Consequently, ale is classied to only one interest group and there isno overlap among groups.

    Each group has a number of les. Suppose there are gles in interest group Gu, denoted by (fu1, fu2, . . . , fug).The average weight of a keyword, say tk, in the groupis calculated by wutk =

    gi=1 w

    fuitk

    /g, where wfuitk denotesthe weight of tk in fui. We also pre-dene a threshold forthe average weight, denoted by Tw. We form an interestvector with keywords having weights larger than Tw anduse it to represent interest group Gu :

    vu = (t1, wut1 ; t2, wut2 ; t3, wut3 ; ...; tn , wtun ). (4)

    where n is the total number of keywords in vu. Thus,each node has a number of interest vectors to representits interests. The weight of Gu, denoted W (Gu), is theportion of the groups les in all les of the node. Wethen generate a node vector (vN ) to describe a nodesinterests. The keywords of vN is the keyword union of allinterest group vectors, and the weight of each keywordis the sum of its weights in different interest groups itbelongs to normalized by the weights of these groups.

    3.3 Community ConstructionSocial network theory reveals that people with the sameinterest tend to meet frequently [24]. By exploiting thisproperty, SPOON classies nodes with common interestsand frequent contacts into a community to facilitatesinterest based le searching, as introduced latter in Sec-tion 3.5. Nodes with multiple interests belong to multiplecommunities. The community construction can easily beconducted in a centralized manner by collecting node in-terests and contact frequencies from all nodes to a centralnode. However, considering that the proposed system isfor distributed disconnected MANETs, in which timelyinformation collection and distribution is non-trivial, wefurther propose a decentralized method to ensure theadaptivity of SPOON in real environment.

    When two nodes, say N1 and N2, meet, they considertwo cases for community creation: (1) they do not belong

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 5to any communities, and (2) at least one of them isalready a member of a community. In the rst case, theycalculate the similarity between each pair of their interestvectors using Formula (3). A pair of interest groups,say Gi and Gj with interest vectors vi and vj , is calledmatched interest group when W (Gi)W (Gj)sim(vi, vj) TG, where TG is a predened threshold. The purpose oftaking into account the weight of each interest group isto eliminate the noise of interest groups with a smallnumber of les and achieve better interest clustering. IfN1 and N2 have at least one pair of matched interestgroup, and their contact frequency, F (N1, N2), is higherthan the top h1% highest encountering frequencies ineither node, the two nodes form a new community.The keywords in their matched interest groups andcorresponding weights constitute the community vector(vC) of the community.

    In the second case, suppose N2 is already a memberof community C, N1 calculates W (Gi)sim(vi, vC) foreach of its interest groups, say Gi, to decide whether itshould join in community C. If the similarity value forone interest group is larger than TG, and N1s contactfrequency with community C is higher than the toph2% of N1s contact frequencies with all nodes it hasmet, N1 is granted the membership to community C.The contact frequency with community C refers to theaccumulated contact frequency with nodes in C. It isupdated upon each encountering with a node in C. Thismeans that Ni contacts members in community C fre-quently enough to guarantee connections.N1 then copiesthe community vector and other community informationfrom N2. Also, when a node meets the community co-ordinator, it reports its les to the coordinator to updateits le index and community vector. The coordinatorthen forwards the updated community information tocommunity members when meets them.

    With above community construction method, nodeswith common interests and frequent contacts graduallyform a community. However, nodes that appear laterhave more stringent community acceptance requirement.Its contact frequency to the community needs to behigher than that of more nodes, and its interest vector iscompared with a longer community vector. Also, nodesin a group admit new members distributively. As aresult, nodes in a group may not have very similarinterests or high contact frequencies. We propose twosolutions to alleviate this problem. First, we set an initialperiod for newly joined nodes in which they accumulatecontact frequencies with others. Then, when a node startsto join in communities, its meeting frequencies with oth-ers are relatively stable, which provides more accuratemeasurement for determining the communities to joinin. Second, we use group member pruning. Existingcommunity members can have a second round votingto conrm the eligibility of new community members.Specically, if N2 in community C nds a node, say N1,satises the requirements of C, it awards N1 a potentialmembership for C. Then, other community members inC further checks N1s eligibility to join in C. That is,every time when N1 meets an existing member of C,

    say N3, N3 checks whether they have at least one pairof matched interest group and whether their contactfrequency is higher than the top h1% of N1s highestcontact frequencies. If yes, N3 approves the membershipofN1. When an existing community member of C noticesthat N1 receives the grants from half of the communitymembers, it grants N1 the group membership.

    Another issue is that node contact frequencies andinterests may change over time. Since the communityconstruction algorithm is continuous running, a nodecan detect that it fails to satisfy the requirement ofcurrent community. It then withdraws from the currentcommunity, noties connected nodes in it, and search fora new community to join in.

    The values of the thresholds used in the commu-nity construction process (TG, h1, h2) are determinedby many factors such as number of nodes, number ofinterests and types of applications. Generally, TG decidesthe interest tightness among nodes in each community.A larger TG leads to higher similarity between interestsof nodes in one community, but also generates morecommunities. Therefore, TG should be congured basedon application scenario. If the application has clear lecategories (i.e., course le sharing), we can set a large TGto gather les in the same category. Otherwise, a mediumTG should be set to balance the interest closeness andthe number of communities. h1 and h2 determines thetightness of a community. The smaller h1 and h2 are, thetighter the community is. Therefore, we set them to 30 bydefault in experiment to ensure frequent contact amongcommunity members. These values were set based onempirical experiences. We leave further investigation onappropriate values as future work.

    3.4 Node Role Assignment

    A previous study has shown that in a social networkconsisting of mobile users, only a part of nodes havehigh degrees [20]. We can often nd an important orpopular person who coordinates members in a com-munity in our daily life. For example, the college deancoordinates different departments in the college, and thedepartment head connects to faculty members in thedepartment. Thus, we take advantage of different typesof node mobility for le sharing.

    We dene community coordinator and ambassadornodes in the view of a social network. A communitycoordinator is an important and popular node in thecommunity. It keeps indexes of all les in its community.Each community has one ambassador for each knownforeign community, which serves as the bridge to thecommunity. The coordinator in a community maintainsthe vC of foreign communities and corresponding am-bassadors in order to map queries to ambassadors forinter-community searching. The number of ambassadorsand coordinators can be adjusted based on the networksize and workload in order to avoid overloading thesenodes. Since ambassadors and coordinators take moreresponsibility, we can also adopt role rotation and extraincentives for fairness consideration. Due to page limit,we leave this as our future work.

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 63.4.1 Community Coordinator Node SelectionWe dene a stable node that has tight contact frequencywith other community members as the community co-ordinator. In network analysis, centrality is often usedto determine the relative importance of a vertex withinthe network. We then adopt the improved degree cen-trality [32], which assigns weight to each link based onthe contact frequency, for coordinator selection since itreects the tightness of a node with other communitymembers. In the initial phase of coordinator discovery,each node, say node Ni, in a community collects contactinformation from its neighbors in the same communityand then calculates its degree centrality by

    D(pi) =N

    j=1,j =iwij , (5)

    where wij is the link weight between Ni and Nj andN is the number of neighbors in the same community.In order to reect the property that the coordinator hasthe most connections with all community members, wijequals 1 if the contact frequency between Ni and Nj islarger than a threshold and 0 otherwise. Though sucha method cannot ensure its connection to every com-munity member, it ensures that the coordinator has thetightest overall connection to all community members.

    Each node periodically checks its degree centrality andbroadcasts such information to all community members.If a node receives no larger centrality score than its owncentrality for three consecutive periods, it claims itselfas the potential coordinator. The potential coordinatorwould conrm its status as the coordinator when meetsthe previous one. If it is conrmed, it then requests thecommunity information from the old coordinator. Also,when the new coordinator meets community members,they exchange information for group vector update andambassador selection, as well as request routing.

    3.4.2 Community Ambassador Node SelectionAn ambassador is used to bridge the coordinator in itshome community and a foreign community. We use theproduct of a nodes contact frequency with its coordina-tor and that with the foreign community for ambassadorselection. Each node i calculates its utility value forforeign community k by

    Uik = F (Ni, Ck) F (Ni, Nc) (6)where Ck represents foreign community k, Nc is thecoordinator in its home community, and F () denotesthe meeting frequency. Each node reports its utilityvalues for foreign communities it has met to the coor-dinator in its home community. Then, the communitycoordinator chooses one ambassador for each knownforeign community. Also, the node that has the highestoverall contact frequency with all foreign communitiesis selected as the default ambassador. In case that arequest fails to nd a matched ambassador, the defaultambassador can carry the request and seek for potentialforwarders in foreign community. If an ambassador losesthe connection with the coordinator for a certain period

    DR

    Community C1

    Data HolderRequester

    Community C2

    A1A1

    A2

    A2

    C1

    C2

    Fig. 2. File searching in SPOON.

    of time, a new ambassador that satises above require-ments is selected. This arrangement facilitates interest-oriented le searching by enabling a coordinator to sendle requests to matched foreign communities quickly.

    In above design, ambassadors are the key to connectdifferent communities efciently. Coordinators achievesbalance between the centralized and distributed search-ing by checking whether a community can satisfya query quickly, which is important in disconnectedMANETs. Also, though broadcast is used in coordinatorselection, the cost is limited because 1) it is only amongcommunity members, and 2) we can set a long inter-broadcast period since nodes usually have stable degreecentrality. To select ambassadors, each node just reportsits utility values to the coordinator, which can be pig-gybacked on the beacon messages. Therefore, this stepdoes not incur signicant extra costs.

    3.5 Interest-oriented File Searching and RetrievalIn social networks, people usually have a few le in-terests [22] and their le visit pattern generally followsa certain distribution [23]. Also, people with the sameinterest tend to contact each other frequently [24]. Thus,interests can be a good guidance for le searching. Con-sidering the relation among node movement pattern [33],individuals common interests, and their contact fre-quencies, we can route le requests to le holders basedon nodes frequencies of meeting different interests.

    Then, the interest-oriented le searching schemehas two steps: intra-community and inter-communitysearching. A node rst searches les in its home commu-nity. If the coordinator nds that the home communitycannot satisfy a request, it launches the inter-communitysearching and forwards the request to an ambassadorthat will travel to the foreign community that matchesthe requests interest. A request is deleted when its TTL(Time To Live) expires. During the search, a node sendsa message to another node using the interest-orientedrouting algorithm (IRA), in which a message is alwaysforwarded to the node that is likely to hold or to meetthe queried keywords. The retrieved le is routed alongthe search path or through IRA if the route expires.

    3.5.1 Interest-oriented Routing AlgorithmIn SPOON, every node maintains a history vector

    that records its frequency of encountering interest key-words. The history vector is in the form of vH =(t0, wh0; t1, wh1; t2, wh2; ...; tn, whn), where whi is the ag-gregated times of encountering keyword ti. whi decays

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 7periodically as time passes by whi = whi( < 1). Whentwo nodes meet, they exchange their node vectors andupdate history vectors. The history vector is used toevaluate the probability of meeting the queried content.

    The destination of a request is represented by a vectorvdest = (t0, w0; t1, w1; t2, w2; ...; tn, wn). In IRA, a nodeuses the tness score F to evaluate its neighbors prob-abilities to be or to meet the le holder. The tness Fof neighbor i is measured by F = sim(vdest, vi) + (1)sim(vdest, vHi), where vi and vHi are the node vectorand history vector of node i, respectively. The factor ofsim(vdest, vi) aims to nd the node sharing the mostsimilar interests with the destination, and the factor ofsim(vdest, vHi) aims to nd a node that is very likelyto meet the destination in its movement. is used tocontrol the weight of these two factors. In IRA, when anode receives a message, if its neighbor with the highestF has higher F than itself, it forwards the message to theneighbor. This process repeats until the message arrivesat the destination. Coordinators do not use IRA butsend messages to its community members when meetingthem because they usually have tight connections withall community members.

    3.5.2 Intra-Community File Searching and Retrieval

    The query message is represented by a query vector rep-resented as: vQ = (t0, w0; t1, w1; t2, w2; ...; tn, wn). Eachquery is associated with a counter (count) indicating thenumber of hops it can travel. The count is decrementedby 1 after each forwarding. Since the query is initiatedby users, term weights in vQ are constant values. Inthe intra-community searching, the destination that aquery is sent to is represented by a combination of thevQ and the node vector of the requesters communitycoordinator (vNC ), represented by:

    vdest = vQ + (1 )vNC , (7)In the rst step, the requester calculates the similaritybetween the query vector and the community vector of thecommunity it belongs to. If Sim(vQ, vC) < Ts, the queryis sent to the coordinator of the community directly (i.e., equals 0). Otherwise, equals 1 when the counter(count) is larger than 0 and 0 otherwise. This meansthat a requester rst searches nearby nodes within counthops, and then resorts to its community coordinator.Specically, the requester sends out a query to top Fneighbors with the highest . Having > 1 copies ofa request can enhance the efciency of le searching.We call this strategy multi-copy forwarding. In order tolimit the number of copies for each request, we set = min{k|ki=0 Fi > }, where Fi is the tness ofneighbor i and is the minimum delivery guaranteefactor. The hop counter of a query is decreased by oneafter each forwarding. If the le is not found whencount = 0, it is forwarded to the community coordinator( = 0). When a node receives multiple copies of query,it only processes the rst one.

    When node Nj receives a request, if vdest = vQ andsim(vdest, vNj ) reaches the similarity threshold speciedby the requester, it rst tries to send the satised les

    back to the requester along the original path. If a for-warder on the path is not available due to node mobility,IRA is used to forward the le. Otherwise, Nj uses IRAto further forward the query. If vdest = vNc and Nj is notthe coordinator Nc, Nj uses IRA to forward the requestto Nc. After Nc receives the query, it checks its leindexes. If the indexes have les satisfying the request,the coordinator sends the request to the le holderwhen meeting it, which then sends the le back to therequester. Otherwise, Nc initiates the inter-communityle searching. Algorithm 1 shows the pseudocode of theintra-community searching algorithm.

    Algorithm 1:Pseudo-code of intra-community le searching forquery Q conducted by node Ni.Procedure intraSearchForQ ()

    if a neighbor nb of Ni matches query Q thenNi.sendQeuryTo(Q, nb)

    else if Q.src = Ni thenif Sim(vQ, vC) < Ts then

    Q.vdest vNCNi.sendThroughIRATo(Q, NC )

    elseQ.vdest vQNi.rankNbByFitness()overallF 0for each neighbor nb of node Ni do

    overallF gets overallF + F (Q,nb)Ni.sendQeuryTo(Q, nb)if overallF > then

    breakelse

    if Q.hops < MaxHopthenQ.vdest vQNi.rankNbByFitNess()nb the neighbor with maximal tnessNi.sendQeuryTo(Q, nb)

    elseQ.vdest vNCNi.sendThroughIRATo(Q, NC )

    3.5.3 Inter-Community File Searching and RetrievalIn the inter-community searching algorithm, a coordina-tor maps a request to the foreign community that is mostlikely to contain the queried le. Similar to the intra-community search step, the coordinator also uses themulti-copy forwarding strategy, i.e., it sends out a queryto ambassadors having the highest similarity with thequery in order to enhance the efciency of the forward-ing. We limit the number of copies for each request byletting = min{k|ki=0 sim(vQ, vC) > }, where isthe minimum delivery guarantee factor. Ambassadorsthen forward the request to the foreign community.

    Upon receiving the request, the coordinator in theforeign community checks its le index to see if itscommunity has the le. If not, the coordinator repeatsthe inter-community le searching by looking up its am-bassadors to check for further forwarding opportunities.If the le exists, the coordinator asks for the le from thele holder when meeting it and sends the le back to therequesters community through the corresponding am-bassador. The coordinator of the requesters communitywill further forward the le to the requester.

    Figure 2 depicts the process of le searching, in whicha requester (node R) in community C1 generates a le

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 8request. Since its neighbors within count hops dont havethe le, the request is then forwarded to the communitycoordinator NC1. NC1 checks the community le indexesbut still cant nd the le. It then asks the communityambassador NA1 to forward the request to the foreigncommunity matching the queried le. Using the sameway as NC1, the community coordinator NC2 nds thele and sends it back to the requesters communityvia ambassador NA2. The le is rst sent to NC1, andthen forwarded to the requester. Algorithm 2 shows thepseudocode of the inter-community searching algorithm.

    Algorithm 2:Pseudocode of inter-community le searching forquery Q conducted by node Ni.Procedure interSearchForQ ()

    if Ni is a coordinator thenbContain Ni.checkContainFile(Q)if bContain

    Ni.sendQeuryToDes(Q)else

    Ni.rankAmByMatch()overallS 0for each ambassador NA of Nis community do

    n.sendQeuryTo(q, NA)overallS overallS + Sim(q.VQ, NA.VC)if overallS >

    breakif Ni is an ambassador then

    when Ni meets another node Njif Nj .homeCommunity = Ni.foreighCommunity then

    Ni.sendQeuryTo(Q, Nj )Nj .sendThroughIRATo(Q, NC )

    3.6 Information Exchange among NodesWe summarize the information exchanged among nodesin SPOON. In the community construction phase, twoencountered nodes exchange their interest vectors andcommunity vectors, if any, for community construction.In the role assignment phase, nodes broadcast their de-gree centrality within their communities for coordinatorselection. When the coordinator is selected, the coordina-tor ID is also broadcasted to all nodes in the community.Then, each node reports its contact frequencies withforeign communities to the coordinator for ambassadorselection. Besides, when a node meets a coordinator of itscommunity, the node also sends its updated node vectorto the coordinator to update the community vector andretrieves the updated community vector from the coordi-nator. When an ambassador meets the coordinator of itscommunity, it reports the community vectors of foreigncommunities to the coordinator. After above informationexchange, two encountered nodes exchange their nodevectors and history vectors for packet routing. Eachnode checks packets in it sequentially to decide whichpackets should be forwarded to the other node basedon the le searching algorithm introduced in Section 3.5.Further, when network turns to be stable, the frequencyof information exchange for community constructionand node role assignment can be reduced to save costs.

    3.7 Intelligent File PrefetchingAmbassadors in SPOON can meet nodes holding dif-ferent les since they usually travel between different

    communities frequently. Taking advantage of this fea-ture, an ambassador can intelligently prefetch popularles outside of its home community. Recall that a queryin a local community for a le residing in a remotecommunity are forwarded through the coordinator ofthe local community. Thus, each coordinator keeps trackof the frequency of local queries for remote les andprovides the information of popular remote les to eachambassador in its community upon encountering it.When a community ambassador nds that its foreigncommunity neighbors have popular remote les that arefrequently requested by its home community members,it stores the les on its memory. The prefetched lescan directly serve potential requests in the ambassadorshome community, thus reducing the le searching delay.

    3.8 Querying-Completion and Loop-PreventionGiven a le query, there may exist a number of matchingles in the system. A node can associate a parameterSmax with its query to specify the number of les thatit wishes to nd. A challenge we need to handle isto ensure that the querying process stops when Smaxmatching les are discovered when multi-copy forward-ing is used. To solve this problem, we let a query carrySmax when it is generated. When a query nds a lethat matches the query and is not discovered before, itdecreases its S by 1. Also, if this query is replicated toanother node, S is evenly split to the two nodes. A querystops searching les when its S equals 0.

    When a query needs to nd more than one le, itis likely that IRA would forward a query to the samenode repeatedly. To avoid this phenomenon, SPOONincorporates two strategies. First, the query holder in-serts its ID to the query before forwarding the queryto the next node. Second, a node records the queries ithas received within a certain period of time. The formermethod avoids sending a packet to nodes it has visitedbefore while the latter method prevents sending differentreplicas of the same query to the same node. Specically,when a node, say Ni, needs to forward a query to anewly met node Nj based on IRA, Ni checks whetherthe querys record of traversed nodes contains Nj . Ifyes, Ni does not forward the query to Nj . Also, when anode receives a query, if the query exists in its record ofreceived queries, the node sends the query back to thesender. These two strategies effectively avoid searchingloops by simply preventing a node from forwarding thesame query to nodes that have received the query before.

    3.9 Node Churn ConsiderationIn SPOON, when a node joins in the system, it rst

    nds the communities it belongs to and learns the IDsof community coordinators, and then reports its lesand utility values to the community coordinator whenencountering it. This enables the coordinator to maintainupdated information of the community members.

    A node may leave the system voluntarily when usersmanually stop the SPOON application on their devices.In this case, a leaving node informs its community coor-dinator about its departure through IRA. If the leaving

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 9node is an ambassador, the coordinator then chooses anew ambassador. If the leaving node is a coordinator, ituses broadcast to notify other community members toselect a new coordinator.

    A node may also leave the system abruptly due tovarious reasons. Simply relying on the periodical beaconmessage, a node cannot tell whether a neighbor is leftor is just isolated from itself, which is a usual case inMANETs. To handle this problem, each node records thetimestamps when it meets other nodes, and sends it tothe coordinator through IRA. The coordinator receivesthis information and updates the most recent timestampof each node seen by other nodes. If the coordinatornds that a nodes timestamp is more than Tx secondsago, it considers this node as a departed node. Similarly,normal nodes in a community also maintain and updatethe timestamp of the coordinator to determine whetherit is still alive. A node piggybacks the coordinator de-parture information on the beacon messages. Then, itsnearby nodes can know whether the coordinator has left.Note that a node can know the number of communitymembers from the coordinator. When a node nds thatmore than half of community members have found thatthe coordinator has left, it broadcasts a coordinator re-election message to select a new coordinator using thesame method explained in Section 3.4.1.

    4 PERFORMANCE EVALUATIONWe evaluated the performance of SPOON in comparisonwith MOPS [18], PDI+DIS [8], [12], CacheDTN [14],PodNet [16], and Epidemic [34]. MOPS is a social net-work based content service system. It forms nodes withfrequent contacts into a community and selects nodeswith frequent contacts with other communities as bro-kers for inter-community communication. PDI+DIS is acombination of PDI [8] and an advertisement-based DIS-semination method (DIS) [12]. PDI provides distributedsearch service through local broadcasting (3 hops), andbuilds content tables in nodes along the response path,while DIS let each node disseminate its contents to itsneighbors to create content tables. CacheDTN replicateles to network centers in decreasing order of theiroverall popularity. In PodNet, nodes cache les interetedby them and nodes they have met. We adopted theMost Solicited le solicitation strategy in PodNet. Wedoubled the memory on each node in CacheDTN andPodNet for replicas. In Epidemic, when two nodes meeteach other, they exchange the messages the other has notseen. We have conducted the following experiments.(1) Evaluation of Community Construction. We rst eval-uated the proposed community construction algorithmintroduced in Section 3.3.(2)GENI experiments. We deployed the systems on thereal-world GENI ORBIT testbed [35], [36] and testedthe performance using the MIT Reality trace. The GENIORBIT testbed contains 400 nodes with 802.11 wirelesscards. Nodes can communicate with each other throughthe wireless interface. We used real trace to simulatenode mobility in ORBIT: two nodes can communicatewith each other only during the period of time whenthey meet in the real trace.

    (3) Event-driven experiments with real trace. We also con-ducted event-driven experiments with two real traces.(4) Evaluation of enhancement strategies. We evaluated theeffect of the enhancement strategies introduced in Sec-tion 3.7, 3.8, and 3.9 through event-driven experiments.(5)NS2 experiments with synthetic mobility. We conductedexperiments on NS-2 [37] using a community based mo-bility model [38] to evaluate the applicability of SPOONin different types of networks. Due to page limit, theresults are shown in Appendix A.

    We disabled the intelligent prefetching and the multi-copy forwarding (i.e., = 1 and = 1) in SPOON tomake the method comparable. Also, since the compari-son methods can return only one le for a query, we setSmax = 1 in SPOON. In each community, we used thenode having the most contacts with other communitiesas the ambassador in SPOON and as the broker inMOPS. We also set the node with the most contacts withits community members as the coordinator in SPOON.

    Besides the Haggle trace, we further tested with theMIT Reality trace [39], in which 94 smart phones weredeployed among students and staffs at MIT to recordtheir encountering. The two traces last 0.34 million sec-onds (Ms) and 2.56 Ms, respectively. As in MOPS, weused 40% of the two traces to detect groups in whichnodes share frequent contacts. Here, we use group torepresent a group of nodes with frequent contacts, anduse community to represent a group of nodes withcommon-interests and frequent contacts. We got 7 and 8groups for the MIT Reality trace and the Haggle trace,respectively. Then, since there is no real trace for P2Pover MANETs, we collected articles from different newscategories (e.g., sports, entertainment and technology)from CNN.com and mapped them to the identiedcommunities. Each node contains 50 articles from thenews category for its community. Each node extracts itsinterests from its stored les. The similarity thresholdwas set to 70% in AGNES for le classication.

    In experiments with the Haggle trace and the MITReality trace, we set the initialization period to 0.09Msand 0.3Ms, the query generation period to 0.1Ms and1Ms, and the TTL of a query to 0.15Ms and 1.2Ms,respectively. Considering that people usually generatequeries according to their interests, we set 70% of totalqueries searching for les located in the local community.Each query is for an article randomly selected from thearticle pool. We measured following metrics:(1)Hit rate: the percentage of requests that are success-fully delivered to the le holders.(2)Average delay: the average delay of the successfullydelivered requests.(3)Maintenance cost: the total number of all messagesexcept the requests, which are for routing informationestablishment and update, or replication creation.(4)Total cost: the total number of messages, includingmaintenance messages and requests, generated in a test.

    4.1 Evaluation of Community ConstructionWe rst tested the effectiveness of the community con-struction method in SPOON, denoted by SPOON-CC,

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 10

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    20 30 40 50

    Sim

    ilarit

    y

    h1

    MITReality Haggle

    (a) h2 = 30 and h1 = 20 50.

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    20 30 40 50

    Sim

    ilarit

    y

    h2

    MITReality Haggle

    (b) h1 = 30 and h2 = 20 50.Fig. 3. Average similarity values with different h1 and h2.

    in comparison with Active-CC and Centralized-CC. TheActive-CC selects three most active nodes to collectnode contacts and interests when they meet nodes. InCentralized-CC, we purposely let a super node collectall node contacts and interests timely. Both Active-CCand Centralized-CC use AGNES to build communitieswith the collected information.

    Since there is no real trace about content sharing inP2P MANETs, we tested in an indirect way. We rst con-ducted the group construction and content distributionas previously described, and then removed the groupidentity of each node. Then, we run the three methodsto create communities. After this, we matched each newcommunity to the most similar old community andcalculated the similarity value by N2sm/(Np Nn), whereNsm is the number of common nodes, Np and Nn denotethe size of the old community and the new community,respectively. In SPOON-CC, we set TG to 1 to ensureinterest closeness, and set h1 and h2 to 30. Active-CC andCentralized-CC used the same threshold for grantingcommunity membership as SPOON-CC. The averagesimilarities in SPOON-CC, Active-CC, and Centralized-CC are 0.95, 0.87, and 1 with the Haggle trace and0.91, 0.85, and 1 with the MIT trace, respectively. TheCentralized-CC has inferior performance since activenodes can only collect information from nodes they havemet, leading to less accurate community construction.Also, the performance of SPOON-CC is close to thatof Centralized-CC, which has the best performance intheory. Such a result shows the effectiveness of SPOON-CC in this test.

    We further varied h1 and h2 from 20 to 50 to verify theeffectiveness of SPOON-CC. Figure 3(a) and 3(b) showthe average similarity values with the two traces. We seethat the similarity values are above 90% with various h1and h2. Such results further conrm the effectiveness ofSPOONs community construction algorithm to clusternodes with frequent contacts and similar interests in theexperiments with the two real traces.

    4.2 GENI ExperimentsTable 3 shows the results of the GENI experiments ofthe six methods. From the table, we nd that Epidemicgenerates the highest hit rate with the highest totalcost and a low average delay. This is resulted from thedissemination nature of broadcasting. SPOON producesthe second highest hit rate at the second lowest totalcost and relatively high average delay. This is becauseSPOON utilizes both contact and content properties ofsocial networks to guide le querying. Therefore, itcan successfully locate queried les without the need

    of many information exchanges and request messages,though at a relatively slow speed. SPOON outperformsMOPS in terms of hit rate, delay and cost. This is becauseSPOON utilizes IRA for intra-communication and dedi-cated ambassadors for inter-communication while MOPSrelies heavily on brokers. Also, MOPS only considersnode contact in routing, while SPOON considers bothcontent and contact. We will elaborate the reasons indescribing the trace driven simulation results later on.

    TABLE 3Efciency and cost in the experiments on GENI

    Method Hit Rate Ave. Delay (s) Maintenance Cost Total CostSPOON 0.671 152731.3 258764 275312MOPS 0.629 163282.5 310131 320412CacheDTN 0.5712 219021.4 283210 298123PodNet 0.5932 183621 223218 240238PDI+DIS 0.524 7418.9 298641 359841Epidemic 0.8813 15621.2 669193 860475

    CacheDTN has low hit rate, median cost and highaverage delay. This is because though replicas are cre-ated, queries wait for les passively on their originators,leading to a long delay. Also, the replication of les tonetwork centers incurs a high cost. PodNet has low hitrate for the same reason as CacheDTN. However, sincethe replicas on each node are more catered for the inter-ests of itself and nodes it has met, PodNet has slightlyhigher hit rate than CacheDTN. Moreover, PodNet hasthe lowest cost because nodes in it only replica lesthey are interested in. PDI+DIS generates the lowesthit rate at relatively high total cost and low averagedelay. The low hit rate is caused by the poor mobilityresilience of route tables. As a result, only partial queriesare resolved quickly in the local broadcasting. Otherspassively wait for le holders or updated routes andusually cannot be resolved timely. Then, since we onlycount the average delay of successful queries, PDI+DIShas the lowest average delay.

    TABLE 4Memory usage in the experiments on GENI

    Metric SPOON MOPS CacheDTN PodNet PDI+DIS Epid.Query 36.8 44.1 48.4 45.3 13.1 1998Table 10.4 16.9 50 50 15.6 0

    We also evaluated the performance of each methodin memory utilization in terms of the average num-ber of queries in the buffer (Query) and the aver-age size of a content/neighbor table (Table). The re-sults are shown in Table 4. For the average numberof buffered queries, we nd that PDI+DIS

  • 11

    tries in a table to represent memory usage. The re-sults in Table 4 show that Epidemic

  • 12

    0.55

    0.65

    0.75

    0.85

    0.95

    5 10 15 20 25

    HitR

    ate

    NumberofQueries(x103)

    SPOON MOPS EpidemicPDI+DIS CacheDTN PodNet

    (a) Hit rate.

    0

    10

    20

    30

    40

    50

    5 10 15 20 25

    Ave

    rageD

    elay(x

    103

    S)

    NumberofQueries(x103)

    SPOON MOPS EpidemicPDI+DIS CacheDTN PodNet

    (b) Average delay.

    2

    4

    6

    8

    10

    12

    14

    5 10 15 20 25

    Mai

    nten

    anceC

    ost

    NumberofQueries(x103)

    SPOON MOPSEpidemic PDI+DISCacheDTN PodNet(

    x105

    )

    (c) Maintenance cost.

    2

    7

    12

    17

    22

    5 10 15 20 25

    Tota

    lCos

    t(x1

    05)

    NumberofQueries(x103)

    SPOON MOPSEpidemic PDI+DISCacheDTN PodNet

    (d) Total cost.

    Fig. 4. Performance in the event-driven experiments with Haggle trace.

    0.55

    0.70

    0.85

    1.00

    5 10 15 20 25

    HitR

    ate

    NumberofQueries(x103)

    SPOON MOPS EpidemicPDI+DIS CacheDTN PodNet

    (a) Hit rate.

    0

    10

    20

    30

    40

    50

    5 10 15 20 25

    Ave

    rageD

    elay(x

    104

    S)

    NumberofQueries(x103)

    SPOON MOPS EpidemicPDI+DIS CacheDTN PodNet

    (b) Average delay.

    5

    10

    15

    20

    25

    30

    35

    5 10 15 20 25

    Mai

    nten

    anceC

    ost

    NumberofQueries(x103)

    SPOON MOPSEpidemic PDI+DISCacheDTN PodNet

    (x10

    5 )

    (c) Maintenance cost.

    5

    15

    25

    35

    45

    5 10 15 20 25

    Tota

    lCos

    t(x1

    05)

    NumberofQueries(x103)

    SPOON MOPSEpidemic PDI+DISCacheDTN PodNet

    (d) Total cost.

    Fig. 5. Performance in the event-driven experiments with MIT Reality trace.

    2

    3

    4

    5

    6

    7

    5 10 15 20 25

    Tota

    lCos

    t(x1

    05)

    NumberofQueries(x103)

    SPOON MOPSCacheDTN PodNet

    (a) Haggle trace.

    5

    6

    7

    8

    9

    10

    11

    12

    13

    5 10 15 20 25

    Tota

    lCos

    t(x1

    05)

    NumberofQueries(x103)

    SPOON MOPSCacheDTN PodNet

    (b) MIT Reality trace.

    Fig. 6. Total costs with condence intervals.

    difference and the page limit. We nd that the totalcosts follow Epidemic>PDI+DIS>MOPS>CacheDTN >SPOON>PodNet, which is the same as Figures 4(c)and 5(c). Such a result means that the maintenance costis the majority part of the total cost. With above results,we conclude that SPOON has the highest overall lesearching efciency in terms of hit rate, delay and cost.

    4.4 Evaluation of the Enhancement Strategies4.4.1 Multi-copy forwarding and PrefetchingWe rst evaluated the effect of the multi-copy forward-ing and the intelligent le prefetching. We let Multi-copy forwarding and Prefetching denote the SPOONwith the corresponding improvement strategy, respec-tively, and compare them with the Original SPOON.In Multi-copy forwarding, we let each query originatordistribute two copies of its query. In Prefetching, we leteach ambassador store top ten most popular les. Wevaried the number of queries from 5000 to 25000. Thetest results are shown in Tables 5 and 6.

    We nd that the multi-copy forwarding strategy withonly two copies enhances the hit rate greatly in theexperiments with both traces. This is because wheneach query has two copies in the system, its probabilityof encountering the node containing the queried leincreases. Such a result shows the effectiveness of themulti-forwarding strategy. We also see from the two

    tables that the le prefetching strategy slightly improvesthe hit rate with both the two traces. This is because1) we only congure two ambassadors per community,and 2) the prefetched les only satisfy a small amountof queries since most queries are for contents in lo-cal community. The improvement on the hit rate stilldemonstrates the effectiveness of the le prefetchingstrategy and the improvement would be greater withmore ambassadors and greatly varied le popularity.

    TABLE 5Hit rate improvement with the Haggle trace.

    # of packets Original Prefetching Multi-copy forwarding5000 0.75137 0.75313 0.77941210000 0.75215 0.75633 0.78013515000 0.73831 0.74274 0.77891220000 0.74928 0.75242 0.77432125000 0.74731 0.75201 0.779415

    TABLE 6Hit rate improvement with the MIT Reality trace.# of Packets Original Prefetching Multi-copy forwarding

    5000 0.761371 0.7671675 0.78041310000 0.759941 0.762841 0.77083815000 0.760135 0.763762 0.77284320000 0.756418 0.759957 0.77390125000 0.751835 0.754837 0.771963

    4.4.2 Querying-Completion and Loop-PreventionWe name SPOON without querying-completion asSPOON-OR, SPOON with the querying-completiononly as SPOON-QC, and SPOON with both querying-completion and loop-prevention as SPOON-QCLP. Weset the number of queries to 15000. In SPOON-OR,queries do not stop searching until TTL expiration. Wealso set the maximal number of les each query canretrieve, Smax, to 2. To enable a query to nd two les,we purposely let each le have two copies in the system.In order to alleviate the inuence of the TTL on theevaluation of the querying-completion, we enlarge theTTL to the entire length of the used traces. The testresults are shown in Table 7.

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 13

    We nd from the table that SPOON-QCLP has slightlyhigher hit rate than SPOON-QR and SPOON-QC. This isbecause the loop-prevention avoids forwarding a queryto the same le holder repeatedly, thereby utilizingforwarding opportunities more efciently. SPOON-QChas slightly lower hit rate than SPOON-OR because itstops querying when Smax les are fetched. We alsosee that the number of query forwarding operationsfollow SPOON-OR>SPOON-QC>SPOON-QCLP. This isbecause SPOON-OR does not stop querying until theTTL is expired. SPOON-QC reduces the cost as it stopsquerying after the specied number of les are located.In SPOON-QCLP, the loop-prevention avoids redundantforwarding to the same node, leading to less number offorwarding operations and more efcient le searching.

    TABLE 7Effect of query-completion strategy.

    Trace SPOON-OR SPOON-QC SPOON-QCLPHit Rate

    Haggle 0.8741 0.8701 0.8813MIT Reality 0.9091 0.9012 0.9406

    Number of query forwarding operationsHaggle 569841 530516 304953

    MIT Reality 721852 609863 249132

    4.4.3 Node Churn ConsiderationWe name SPOON with and without a strategy to handlenode churn as SPOON-CH and SPOON-NA, respec-tively. The total number of queries was set to 15000. Theperiod for beacon message was set to 100s and 1000swith the Haggle trace and the MIT Reality trace, respec-tively. In the test, NL nodes leave the system evenlyduring the rst 1/2 and 1/4 of the Haggle trace andthe MIT Reality trace, respectively. NL was varied from5 to 25. We name the node that contains a le matchinga query as the primary matching node for the query. Inorder to demonstrate the performance of SPOON in nodechurn, for each queried le, we purposely created a lethat has 70% similarity with it in a non-leaving node inthe same community with the primary matching node.We name this node as the secondary matching node.

    The test results of voluntary and abrupt normal nodesdeparture are shown in Figure 7. We see that in all cases,when node churn consideration is applied, the hit rateis increased and the average delay is decreased. Thisis because with the node churn consideration, queriesfailing to nd their primary matching nodes (i.e., haveleft the system) are further forwarded to their secondarymatching nodes while when there is no node churnconsideration, these queries just wait on coordinatorsfor the primary matching node, leading to a low hitrate and a high average delay. We also observe thatwhen the number of leaving nodes increases, the hitrate decreases and the average delay increases. Thisis because leaving nodes can no longer relay queriesor departure notication/detection messages, leading tolower hit rate and higher average delay.

    TABLE 8Effect of the detection of coordinator departures.Trace w/o churn w/ churn w/ churn

    consideration consideration-Ab consideration-VoHaggle 0.650643 0.684512 0.729942

    MIT Reality 0.641053 0.677841 0.746841

    We also tested the scenario in which only the coordi-nator nodes leave the system. Since the total number ofcoordinators is limited, we only randomly chose two co-ordinators to leave the system during the test. We namethe scenarios when coordinators depart abruptly andvoluntarily as w/ churn consideration-Ab and w/churn consideration-Vo, respectively. The test resultsare shown in Table 8. We nd that when coordinatorsleave the system, the hit rate of SPOON with node churnconsideration is much higher than that without nodechurn consideration. This is because the coordinator iscritical in both intra- and inter- le searching in SPOON.Without node churn consideration, queries just wait forcoordinators until their TTL expiration if they need to beforwarded to coordinators, leading to a low hit rate anda high average delay. Above results show that SPOONsstrategies for node churn consideration can improve thesystem performance at a low cost.

    5 CONCLUSIONIn this paper, we propose a Social network based P2PcOntent le sharing system in disconnected mObile ad-hoc Networks (SPOON). SPOON considers both nodeinterest and contact frequency for efcient le shar-ing. We introduce four main components of SPOON:Interest extraction identies nodes interests; Communityconstruction builds common-interest nodes with frequentcontacts into communities. The node role assignmentcomponent exploits nodes with tight connection withcommunity members for intra-community le searchingand highly mobile nodes that visit external communi-ties frequently for inter-community le searching; Theinterest-oriented le searching scheme selects forwardingnodes for queries based on interest similarities. SPOONalso incorporates additional strategies for le prefetch-ing, querying-completion and loop-prevention, and nodechurn consideration to further enhance le searching ef-ciency. The system deployment on the real-world GENIOrbit platform and the trace-driven experiments provethe efciency of SPOON. In future, we will explore howto determine appropriate thresholds in SPOON, howthey affects the le sharing efciency, and how to adaptSPOON to larger and more disconnected networks.

    ACKNOWLEDGMENTSThis research was supported in part by U.S. NSF grantsCNS-1249603, OCI-1064230, CNS-1049947, CNS-1156875,CNS-0917056 and CNS-1057530, CNS-1025652, CNS-0938189, CSR-2008826, CSR-2008827, Microsoft ResearchFaculty Fellowship 8300751, and U.S. Department ofEnergys Oak Ridge National Laboratory including theExtreme Scale Systems Center located at ORNL and DoD4000111689. An early version of this work was presentedin the Proceedings of MASS11 [40]

    REFERENCES[1] The state of the smartphone market, http://www.allaboutsymb

    ian.com/news/item/6671 The State of the Smartphone Ma.php.[2] Next Generation Smartphones Players, Opportunities & Fore-

    casts 2008-2013, Juniper Research, Tech. Rep., 2009.

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

  • 14

    0.500.550.600.650.700.75

    Voluntary:SPOONNAVoluntary:SPOONCH

    0.600.650.700.750.80

    36000 18000 12000 9000 7200Periodicaltimeforanodedeparture(s)

    Abrupt:SPOONNAAbrupt:SPOONCH

    HitR

    ate

    (a) Hit rate with Haggle trace.

    60657075808590 Voluntary:SPOONNA

    Voluntary:SPOONCH

    55606570758085

    36000 18000 12000 9000 7200Periodicaltimeforanodedeparture(s)

    Abrupt:SPOONNAAbrupt:SPOONCH

    Ave

    rageD

    elay(x

    103

    S)

    (b) Ave. delay with Haggle trace.

    0.5

    0.6

    0.7 Voluntary:SPOONNAVoluntary:SPOONCH

    0.5

    0.6

    0.7

    120000 60000 40000 30000 24000Periodicaltimeforanodedeparture(s)

    Abrupt:SPOONNAAbrupt:SPOONCHH

    itR

    ate

    (c) Hit rate with MIT trace.

    55

    60

    65

    70

    75

    Voluntary:SPOONNAVoluntary:SPOONCH

    5055606570

    120000 60000 40000 30000 24000

    Periodicaltimeforanodedeparture(s)

    Abrupt:SPOONNAAbrupt:SPOONCH

    Ave

    rageD

    elay(x

    103

    S)

    (d) Ave. delay with MIT trace.

    Fig. 7. Performance with voluntary and abrupt node departures.

    [3] A Market Overview And Introduction to GyPSii,http://corporate.gypsii.com/docs/MarketOverview.

    [4] Bittorrent, http://www.bittorrent.com/.[5] Kazaa, http://www.kazaa.com.[6] M. Papadopouli and H. Schulzrinne, A Performance Analysis of

    7DS: a Peer-to-Peer Data Dissemination and Prefetching Tool forMobile Users, Advances in wired and wireless communications, IEEESarnoff Symposium Digest, 2001.

    [7] A. Klemm, C. Lindemann, and O. Waldhorst, A Special-PurposePeer-to-Peer File Sharing System for Mobile Ad Hoc Networks,in Proc. of VTC, 2003.

    [8] C. Lindemann and O. P. Waldhort, A Distributed Search Servicefor Peer-to-Peer File Sharing, in Proc. of P2P, 2002.

    [9] D. W. A. Hayes, Peer-to-Peer Information Sharing in a MobileAd hoc Environment, in Proc. of WMCSA, 2004.

    [10] J. B. Tchakarov and N. H. Vaidya, Efcient Content Location inWireless Ad Hoc Networks, in Proc. of MDM, 2004.

    [11] C. Hoh and R. Hwang, P2P File Sharing System over MANETbased on Swarm Intelligence: A Cross-Layer Design, in Proc ofWCNC, 2007, pp. 26742679.

    [12] T. Repantis and V. Kalogeraki, Data Dissemination in MobilePeer-to-Peer Networks, in Proc. of MDM, 2005.

    [13] Y. Huang, Y. Gao, K. Nahrstedt, and W. He, Optimizing leretrieval in delay-tolerant content distribution community, inProc. of ICDCS, 2009.

    [14] W. Gao, G. Cao, A. Iyengar, and M. Srivatsa, Supporting cooper-ative caching in disruption tolerant networks. in Proc. of ICDCS,2011.

    [15] J. Reich and A. Chaintreau, The age of impatience: optimal repli-cation schemes for opportunistic networks. in Proc. of CoNEXT,2009.

    [16] V. Lenders, M. May, G. Karlsson, and C. Wacha, Wireless ad hocpodcasting, Mobile Computing and Communications Review, 2008.

    [17] K. Chen and H. Shen, Global optimization of le availabilitythrough replication for efcient le sharing in manets. in Proc.of ICNP, 2011.

    [18] F. Li and J. Wu, MOPS: Providing Content-Based Service inDisruption-Tolerant Networks, in Proc. of ICDCS, 2009.

    [19] P. Costa, C. Mascolo, M. Musolesi, and G. P. Picco, Socially-awareRouting for Publish-Subscribe in Delay-Tolerant Mobile Ad HocNetworks, IEEE JSAC, vol. 26, no. 5, pp. 748760, 2008.

    [20] E. Yoneki, P. Hui, S. Chan, and J. Crowcroft, A Socio-awareOverlay for Publish/subscribe Communication in Delay TolerantNetworks, in Proc. of MSWiM, 2007.

    [21] C. Boldrini, M. Conti, and A. Passarella, Contentplace: Social-aware data dissemination in opportunistic networks, in Proc. ofMSWIM, 2008.

    [22] A. Fast, D. Jensen, and B. N. Levine, Creating social networksto improve peer-to-peer networking, in Proc. of KDD, 2005.

    [23] A. Iamnitchi, M. Ripeanu, and I. T. Foster, Small-world le-sharing communities, in Proc. of INFOCOM, 2004.

    [24] M. Mcpherson, Birds of a feather: Homophily in social net-works, Annual Review of Sociology, vol. 27, no. 1, 2001.

    [25] E. Yoneki, P. Hui, S. Chan, and J. Crowcroft, A socio-awareoverlay for publish/subscribe communication in delay tolerantnetworks, in Proc. of MSWiM, 2007.

    [26] A. Chaintreau, P. Hui, J. Scott, R. Gass, J. Crowcroft, and C. Diot,Impact of human mobility on opportunistic forwarding algo-rithms, IEEE TMC, vol. 6, no. 6, pp. 606620, 2007.

    [27] V. Carchiolo, M. Malgeri, G. Mangioni, and V. Nicosia, Anadaptive overlay network inspired by social behavior, Journalof Parallel and Distributed Computing (JPDC), 2010.

    [28] A. Iamnitchi, M. Ripeanu, E. Santos-Neto, and I. Foster, Thesmall world of le sharing, IEEE TPDS, 2011.

    [29] H. Schtze and C. Silverstein, Projections for Efcient DocumentClustering, in Proc. of SIGIR, 1997, pp. 7481.

    [30] P. Bonacich, Factoring and Weighting Approaches to StatusScores and Clique Identication, J. of Math.Sociol., 1972.

    [31] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Intro-duction to Cluster Analysis. New York, USA: Wiley, 1990.

    [32] E. Daly and M. Haahr, Social network analysis for routing indisconnected delay-tolerant manets, in Proc. of MobiHoc, 2007.

    [33] W. Hsu, T. Spyropoulos, K. Psounis, and A. Helmy, Modelingtime-variant user mobility in wireless mobile networks, in Proc.of INFOCOM, 2007.

    [34] A. Vahdat and D. Becker, Epidemic routing for partially-connected ad hoc networks, Duke University, Tech. Rep., 2000.

    [35] GENI project, http://www.geni.net/.[36] Orbit, http://www.orbit-lab.org/.[37] The network simulator ns-2, http://www.isi.edu/nsnam/ns/.[38] M. Musolesi and C. Mascolo, Designing Mobility Models Based

    on Social Network Theory, ACM SIGMOBILE Comput. and Comm.Rev., 2007.

    [39] N. Eagle, A. Pentland, and D. Lazer, Inferring Social NetworkStructure using Mobile Phone Data, PNAS, vol. 106, no. 36, 2009.

    [40] K. Chen and H. Shen, Leveraging social networks for p2pcontent-based le sharing in mobile ad hoc networks, in Proc.of MASS, 2011.

    Kang Chen Kang Chen received the BS degreein Electronics and Information Engineering fromHuazhong University of Science and Technol-ogy, China in 2005, and the MS in Communica-tion and Information Systems from the Gradu-ate University of Chinese Academy of Sciences,China in 2008. He is currently a Ph.D studentin the Department of Electrical and ComputerEngineering at Clemson University. His researchinterests include mobile ad hoc networks anddelay tolerant networks.

    Haiying Shen Haiying Shen received the BSdegree in Computer Science and Engineeringfrom Tongji University, China in 2000, and theMS and Ph.D. degrees in Computer Engineeringfrom Wayne State University in 2004 and 2006,respectively. She is currently an Assistant Pro-fessor in the Department of Electrical and Com-puter Engineering at Clemson University. Herresearch interests include distributed computersystems and computer networks, with an em-phasis on P2P and content delivery networks,

    mobile computing, wireless sensor networks, and grid and cloud com-puting. She was the Program Co-Chair for a number of internationalconferences and member of the Program Committees of many leadingconferences. She is a Microsoft Faculty Fellow of 2010 and a memberof the IEEE and ACM.

    Haibo Zhang Haibo Zhang received the BS de-gree in Computer Science and Technology fromShanghai Jiao Tong University, China in 2008,and the MS in Computer Science from Universityof Arkansas, Fayetteville in 2010. Hes currentlya Software Engineer in Cerner Corporation thatmakes software for hospitals and medical insti-tutions to support meaningful use of informationsystem.

    IEEE TRANSACTIONS ON MOBILE COMPUTINGThis article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.