

Computer Networks 53 (2009) 515–527

Contents lists available at ScienceDirect

Computer Networks

journal homepage: www.elsevier.com/locate/comnet

Resource demand and supply in BitTorrent content-sharing communities

Nazareno Andrade a,b,*, Elizeu Santos-Neto b, Francisco Brasileiro a, Matei Ripeanu b

a Universidade Federal de Campina Grande, Laboratorio de Sistemas Distribuidos, Av. Aprigio Veloso, 882, Bloco-CO, 58109970 Campina Grande, PB, Brazil
b University of British Columbia, Vancouver, BC, Canada

Article info

Article history:
Received 21 April 2008
Received in revised form 9 September 2008
Accepted 24 September 2008
Available online 21 November 2008

Keywords:
Content distribution
BitTorrent
Workload characterization
Resource sharing

1389-1286/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.comnet.2008.09.029

* Corresponding author. Address: Laboratório de Sistemas Distribuídos, Av. Aprígio Veloso, 882, Bloco CO, CEP 58429-900, Campina-PB, Brazil. Tel.: +55 83 33101365.

E-mail addresses: [email protected], [email protected] (N. Andrade), [email protected] (F. Brasileiro).

Abstract

BitTorrent is a widely popular peer-to-peer content distribution protocol. Unveiling patterns of resource demand and supply in its usage is paramount to inform operators and designers of BitTorrent and of future content distribution systems. This study examines three BitTorrent content-sharing communities regarding resource demand and supply. The resulting characterization is significantly broader and deeper than previous BitTorrent investigations: it compares multiple BitTorrent communities and investigates aspects that have not been characterized before, such as aggregate user behavior and resource contention. The main findings are three-fold: (i) resource demand – a more accurate model for the peer arrival rate over time is introduced, contributing to workload synthesis and analysis; additionally, torrent popularity distributions are found to be non-heavy-tailed, which has implications for the design of BitTorrent caching mechanisms; (ii) resource supply – a small set of users contributes most of the resources in the communities, but the set of heavy contributors changes over time and is typically not responsible for most resources used in the distribution of an individual file; these results imply that some level of robustness can be expected in BitTorrent communities and can direct resource allocation efforts; (iii) relation between resource demand and supply – users that provide more resources are also those that demand more from the community; also, the distribution of a file usually experiences resource contention, although the communities achieve high rates of served requests.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Four aspects must be analyzed to understand a computational system and improve its performance: its design, its implementation, the resources on which it runs, and the workload it serves. The first two aspects directly determine the efficiency of the system, while the latter two bound system performance and the efficacy of resource allocation mechanisms. Moreover, while the design and implementation of a system can be analyzed in controlled conditions, a characterization of typical workload and resource availability normally requires monitoring in production settings.

This study focuses on advancing the characterization of workload and resource availability of commons-based content distribution based on BitTorrent, a widely popular peer-to-peer content distribution protocol. Our analysis of the data collected from three BitTorrent communities leads to the understanding of the resources on which peer-to-peer content distribution mechanisms typically run and the workload they serve.

We interpret the resources on which BitTorrent runs as its resource supply and the workload it serves as its resource demand. From this perspective, this study investigates the impact of resource supply and demand on BitTorrent's performance, on its resource allocation mechanisms, and on the overhead it imposes on the underlying network infrastructure. The resulting analysis is relevant: (i) for users, as it evaluates the quality of service currently available in commons-based content distribution communities; (ii) for community operators, since it characterizes the resources on which the community depends and, in particular, the effect of an increasingly popular incentive mechanism called sharing-ratio enforcement; (iii) for developers of content distribution technologies, as it documents usage patterns which affect resource allocation mechanisms; and (iv) for Internet infrastructure operators, as demand and supply patterns define the load content distribution poses on the network and how effective it is to cache content to reduce operational costs.

The traces collected and characterized in this work allow us to draw a clearer picture of BitTorrent communities than previous studies [19,25,5,16,31,3,27]. We are able, first, to compare system behavior across different communities, and second, to accurately model user behavior, as some of our traces allow precise user identification across all activities a user may engage in within a community. Accurately tracing user behavior enables us to characterize the system at a community level more precisely than previous studies that focused on individual files or used tentative user identification. Additionally, we discuss limitations of current practices used by previous BitTorrent measurement studies, and document solutions to limit the shortcomings of these methodologies.

In summary, this paper extends previous characterization studies in terms of breadth, as it compares system behavior in several large communities (the largest having more than 80,000 active users performing 1.7 million downloads during our trace); depth, as it investigates novel aspects of demand and supply, such as user behavior across torrents and resource contention; and accuracy, as it improves the methodology used by previous studies.

At a high level, our study pictures BitTorrent communities as content distribution systems which generally experience resource contention, but often operate on an abundance of resources, and which rely on contributions from a minority of users. This minority, however, is not composed of altruistic participants: users that contribute more to the communities are also those that request more. Furthermore, the set of major contributors is not static: heavy contributors typically have this status for a limited time. Finally, communities successfully serve the vast majority of the received requests.

In more detail, this study finds that, from the resource demand perspective, (i) file popularity in BitTorrent communities deviates significantly from the long-tailed item popularity distribution on the Web, with direct implications for the design of caching mechanisms for BitTorrent traffic; and (ii) the request arrival rate for a file is not comprehensively modeled by previous proposals, which leads us to provide a more accurate model for the arrival rate of requests.

From the resource supply perspective, our main findings are: (i) a few users contribute a majority of resources at the community level, yet, at the individual data item level, contributions are considerably less concentrated; (ii) peers which contribute more to the system are those that devote more bandwidth to the system, and not those that devote more time to distribute a file; and (iii) sharing-ratio enforcement, a popular incentive mechanism deployed in BitTorrent communities, leads to users investing more time contributing to the community, but not to higher bandwidth allocations.

Investigating the relationship between supply and demand shows that: (i) in the community where we can gauge the correlation between users' demand and contribution, users that contribute more to the community are also those that consume more from it, an observation that denotes a degree of equity in this community; (ii) in all communities studied, a high proportion of the requests is successfully served, which is evidence that commons-based content distribution provides a good level of quality of service and that traditional content providers could reduce data distribution costs if they are able to leverage users' contributions at similar levels; and (iii) resource contention varies significantly even within a single community: for three quarters of the files distributed, there is at least a mild contention for resources and the provision of more resources could improve quality of service; for the remaining quarter, there are enough resources available to meet the demand of all peers indistinctly, rendering prioritization-based incentive mechanisms irrelevant.

The rest of this article is organized as follows. The next section presents an overview of BitTorrent together with related work. Section 3 details the communities studied and our data collection and analysis method. The characterizations of resource demand, supply and the relation between them are presented in Sections 4–6, respectively. The last section presents our conclusions and final remarks.

2. Background

The main goal of BitTorrent is to enable scalable content distribution. To this end, the load of distributing a file is shared between the content publisher and those who download it: the peers downloading and those which have already downloaded the file supply bandwidth, the parts of the file they already have, and content availability. This scheme is currently widely popular: studies estimate that about 30% of Internet traffic was due to BitTorrent in 2005 [10].

In BitTorrent parlance, a torrent is the group formed by all peers taking part in the distribution of a file. To download a file using BitTorrent, a user must join the torrent that distributes it by contacting its tracker, the component of the system that enables peer discovery and data location. Peers that have an incomplete copy of the file are called leechers, while peers that have finished downloading and still participate in the torrent are called seeders. Leechers both upload and download pieces of the file, while seeders only upload them. Both leechers and seeders report their progress periodically to the tracker. BitTorrent has a built-in incentive mechanism through which each leecher prioritizes the leechers that provide it the best recent download speed. Seeders do not share the same built-in incentive mechanism, as they only upload [13].
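The leecher-side prioritization described above can be illustrated with a simplified ranking step. This is a minimal sketch and not the reference client's choking algorithm. The `RemotePeer` record, the `choose_unchoked` helper and the default of four upload slots are assumptions, and optimistic unchoking and periodic re-evaluation are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RemotePeer:
    peer_id: str
    recent_download_rate: float  # bytes/s recently received from this peer
    interested: bool = True      # whether the peer wants pieces we hold


def choose_unchoked(peers: list[RemotePeer], slots: int = 4) -> list[str]:
    """Return the ids of interested peers that gave us the best recent rate.

    Simplified illustration: no optimistic unchoke and no periodic rotation.
    """
    candidates = [p for p in peers if p.interested]
    # Reciprocate: rank by how fast each peer has recently uploaded to us.
    candidates.sort(key=lambda p: p.recent_download_rate, reverse=True)
    return [p.peer_id for p in candidates[:slots]]
```

For example, among peers A (50 B/s), B (120 B/s), an uninterested C and D (80 B/s), two slots go to B and D. The sort is stable, so peers with equal rates keep their input order.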

Note that the content discovery and access control mechanisms are external to the BitTorrent protocol, which is focused on data transfer. These functionalities are usually provided through a web site. This solution leads to segmented BitTorrent communities, centered around the different sites that enable content location.

Studies based on modeling [26,35,16], simulation [7,32] and experiments with BitTorrent software [20] established, under controlled conditions, that BitTorrent is an efficient and scalable content distribution protocol. Although these studies based on controlled scenarios help understand BitTorrent behavior, a comprehensive characterization of BitTorrent in production scenarios is necessary to complement them.

Several studies of real-world deployments provide valuable information from this perspective [19,25,5,16,31,3,27,23]. However, these studies suffer from four shortcomings which motivate our work: (i) they are unable to accurately analyze aggregate user behavior at a community level, due to inaccurate user identification in the collected data; (ii) their study of the relationship between resource demand and supply is limited; (iii) they are restricted in scope, as they analyze either a few torrents [19,31], a single community [5,16,2] or a single snapshot of multiple communities [3,27]; and (iv) they have methodological limitations related to the assessment of information loss or bias in the sampling methods used.

This work addresses these issues by (a) obtaining and analyzing a trace from a community which provides strong user identification, (b) broadening the scope of the characterization to three different communities with more than 10,000 torrents and one million downloads, (c) discussing in depth the implications of the sampling methods used, and (d) analyzing the relationship between resource demand and supply. The traces which address points (a) and (b) and our approach to address point (c) are presented in Section 3. Our subsequent BitTorrent characterization addresses point (d). Throughout the rest of the document, we compare in detail the results of our study with related work.

3. The data sets

This section presents the terminology used in the rest of the paper, the data collection method, and the three BitTorrent communities studied. Analyzing multiple communities is necessary as user behavior tends to vary across communities [3,27]. Although studying the three selected communities does not guarantee a definitive view of user behavior, we claim that analyzing multiple communities and a larger set of torrents does contribute towards a better characterization.

3.1. Terminology

For the rest of this document, we use the following terminology: we differentiate between users and peers. A user is a participant in a BitTorrent community, who is observed as a peer in each torrent she participates in. This distinction is relevant because, for some of the communities studied, only peer behavior can be observed accurately.

A peer joins a torrent the first time it participates in it. Each peer might have several sessions in the same torrent, as it may go offline and come back online, and it leaves a torrent when it departs from the torrent and does not come back. The time a peer spends online after it finishes the download and before it leaves the torrent is the peer's seeding time. The torrent start is determined by the first peer join event in that torrent. The torrent end is the time when the last peer leaves the torrent. The lifetime of the torrent is the period between its start and its end, and a torrent is complete if its start and end happen within our measurement period.
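These definitions translate directly into computations over reconstructed peer sessions. The sketch below assumes a hypothetical `Session` record with times in hours since the start of the trace. A torrent is treated as complete only if its first join and last departure fall strictly inside the trace window. That is our reading of the definition, because activity observed exactly at a trace boundary may extend beyond it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    peer: str
    torrent: str
    start: float  # hours since trace start
    end: float


def torrent_summary(sessions, trace_start, trace_end):
    """Start, end, lifetime and completeness of each torrent."""
    by_torrent = defaultdict(list)
    for s in sessions:
        by_torrent[s.torrent].append(s)
    summary = {}
    for torrent, ss in by_torrent.items():
        start = min(s.start for s in ss)  # first peer join
        end = max(s.end for s in ss)      # last peer leave
        summary[torrent] = {
            "start": start,
            "end": end,
            "lifetime": end - start,
            # Assumption: boundary-touching torrents may extend past the trace.
            "complete": trace_start < start and end < trace_end,
        }
    return summary


def seeding_time(sessions, peer, torrent, finished_at):
    """Online time of `peer` in `torrent` after its download finished."""
    total = 0.0
    for s in sessions:
        if s.peer == peer and s.torrent == torrent:
            # Only the part of each session after completion counts.
            total += max(0.0, s.end - max(s.start, finished_at))
    return total
```

Seeding time is summed over sessions, so offline gaps between sessions are not counted, as the definition requires.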

We consider a two-level view of a BitTorrent community: the community level view characterizes the behavior of users across all torrents they participate in and aggregates metrics over all torrents in the community. The torrent level is concerned with peer behavior in each torrent, without aggregating this behavior to observe users. The community level and torrent level views offer complementary views of the community: the former observes users across different torrents, while the latter observes primarily torrents. Also, this distinction is necessary in our data analysis, as for some of the communities studied it is not possible to accurately track user behavior. In these communities, we focus on analyzing resource demand and supply at the torrent level.
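Moving from the torrent level to the community level is essentially an aggregation step: torrent-level peer records are summed per user. The record layout and the `community_view` function below are hypothetical. The aggregation is only meaningful when the user identifier is reliable, as with the logins used by bitsoup.

```python
from collections import defaultdict


def community_view(peer_records):
    """Aggregate torrent-level peer records into per-user totals.

    peer_records: iterable of (user_id, torrent_id, uploaded, downloaded).
    """
    totals = defaultdict(lambda: {"uploaded": 0, "downloaded": 0, "torrents": set()})
    for user, torrent, up, down in peer_records:
        t = totals[user]
        t["uploaded"] += up
        t["downloaded"] += down
        t["torrents"].add(torrent)  # each torrent where the user appeared as a peer
    return dict(totals)
```

With heuristic identification, as in alluvion and etree, the same code would silently merge or split users, which is why those communities are analyzed at the torrent level.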

3.2. The communities

The three BitTorrent communities studied are etree, bitsoup and alluvion. etree (http://bt.etree.org) is a community devoted to sharing recordings of live performances for non-commercial purposes; alluvion (http://www.alluvion.org) is a community hosting user-generated media; finally, bitsoup (http://www.bitsoup.org) is a community of users that share all kinds of content.

Two features distinguish bitsoup from the other two communities. First, it requires users to register with the community website and tracks user behavior across torrents. Second, it uses sharing-ratio enforcement (SRE) in addition to BitTorrent's built-in incentive mechanism to boost resource contribution. SRE, also used by other communities (e.g. http://www.nhltorrents.co.uk), works by keeping a record of users' resource consumption and contribution across different torrents and penalizing the users which do not contribute a minimum proportion of their consumption across all torrents where they participate.
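The bookkeeping behind SRE can be sketched as a community-wide ledger. The minimum ratio, the grace allowance for new users and the `ShareRatioLedger` class are illustrative assumptions and not the policy of bitsoup or any other community. The penalty itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRatioLedger:
    """Per-user upload/download totals across all torrents of a community."""
    min_ratio: float = 0.5        # assumed minimum upload/download ratio
    grace_bytes: int = 1 << 30    # assumed allowance before checks apply
    uploaded: dict = field(default_factory=dict)
    downloaded: dict = field(default_factory=dict)

    def record(self, user, up, down):
        """Add traffic reported for `user` in any torrent."""
        self.uploaded[user] = self.uploaded.get(user, 0) + up
        self.downloaded[user] = self.downloaded.get(user, 0) + down

    def ratio(self, user):
        down = self.downloaded.get(user, 0)
        up = self.uploaded.get(user, 0)
        return float("inf") if down == 0 else up / down

    def is_penalized(self, user):
        """True if the user consumed beyond the allowance without contributing enough."""
        if self.downloaded.get(user, 0) <= self.grace_bytes:
            return False
        return self.ratio(user) < self.min_ratio
```

Because the ledger spans all torrents, a user can meet the ratio by seeding any content. This is consistent with the later finding that SRE leads users to invest more time rather than more bandwidth.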

3.3. Data collection

This study uses a passive method to collect BitTorrent traces. Data collection is done via crawling report pages provided by the trackers of each community, instead of deploying software on client machines. These reports contain detailed information, per torrent, about all peers currently active in the system, such as the peer's downloaded and uploaded amounts, for how long the peer has been online and whether the peer is a seeder.

Each crawl consists of thousands of HTTP requests to the community's web server. Thus, the frequency of data collection must be moderate to minimize the crawling overhead. The crawler executed every hour (Section 3.4 discusses the limitations of this sampling frequency in this study). We experimented with snapshots every 30 min, but the communities' administrators considered the resulting load too high.

Table 2
Characteristics of torrent samples considered.

Sample     Torrents                  Peers
           All      τ8    τ30        All         τ8      τ30
alluvion   1247     271   355        187,916     12,291  43,930
bitsoup    10,463   416   1123       1,351,806   8400    54,889
etree      284      124   –          11,788      1764    –

To collect data from bitsoup and etree, we implemented our own crawlers, while the alluvion data is available at the UMass Trace Repository (http://traces.cs.umass.edu/). It is worth noting that although the data from alluvion has been analyzed before [5,16], we perform our analysis with an improved methodology and analyze new dimensions of it.

Table 1 presents, for each community studied, the duration of the trace, the total number of torrents observed in each trace, the average number of torrents alive at any point in time, the total number of peers seen in the trace and the average number of peers seen at any point in time. The bitsoup community is considerably larger than the other two studied. Nevertheless, as we show throughout the rest of the paper, this does not cause major differences in the properties of user behavior we consider.
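The four per-trace statistics of Table 1 can be computed from the hourly snapshots roughly as follows. This is a sketch under assumed inputs. Each snapshot is taken to be a mapping from a torrent identifier to the set of peer identifiers reported in that crawl, and a peer is counted once per torrent, following the user/peer distinction of Section 3.1.

```python
def trace_summary(snapshots):
    """Table 1-style metrics from a list of hourly snapshots.

    Each snapshot: dict torrent_id -> set of peer ids seen in that crawl.
    """
    all_torrents = set()
    all_peers = set()           # (torrent, peer) pairs: a peer is per-torrent
    alive_counts, peer_counts = [], []
    for snap in snapshots:
        all_torrents.update(snap)
        alive_counts.append(len(snap))
        online = 0
        for torrent, peers in snap.items():
            all_peers.update((torrent, p) for p in peers)
            online += len(peers)
        peer_counts.append(online)
    k = len(snapshots)
    return {
        "torrents_total": len(all_torrents),
        "torrents_avg": sum(alive_counts) / k if k else 0.0,
        "peers_total": len(all_peers),
        "peers_avg": sum(peer_counts) / k if k else 0.0,
    }
```

The averages are taken over snapshots. This matches "at any point in time" only to the resolution of the hourly crawl.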

3.4. Reconstructing torrent dynamics

Our traces consist of hourly snapshots of the state of peers and torrents in a community. Before analysis, it is necessary to reconstruct peer and torrent behavior over time from the snapshots. This reconstruction process poses three challenges, which we discuss next. Please refer to our technical report [4] for further details about the methods presented here.
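The basic reconstruction step compares consecutive snapshots. The `diff_snapshots` helper below is hypothetical and assumes the same torrent-to-peer-set mapping for each snapshot. An appearance may be either a first join or a returning session. Events that begin and end between two crawls are invisible to this comparison, which is the information loss discussed under the second challenge.

```python
def diff_snapshots(prev, curr):
    """Peer appearances and disappearances between two consecutive snapshots.

    prev, curr: dict torrent_id -> set of peer ids.
    Returns two sorted lists of (torrent_id, peer_id) pairs.
    """
    appeared, disappeared = [], []
    for torrent in prev.keys() | curr.keys():
        before = prev.get(torrent, set())
        after = curr.get(torrent, set())
        appeared += [(torrent, p) for p in after - before]
        disappeared += [(torrent, p) for p in before - after]
    return sorted(appeared), sorted(disappeared)
```

Applying this over the whole snapshot sequence yields per-peer session boundaries, accurate only to within one crawl interval.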

The first challenge in analyzing the traces is the use of imprecise identification by some trackers, an issue discussed by previous studies [5,16,31,19]. This study has the advantage that one of the communities studied, bitsoup, uses unique logins to track user behavior. This allows us to estimate precisely the distribution of resource contribution and consumption by users at the community level. User identification is imprecise in alluvion and etree, allowing only heuristic-based identification of peers at a torrent level. For these two communities, we used heuristics similar to those reported in previous studies [5,16,19] to track peer behavior given the imprecise identification found in the traces. For the sake of reproducibility, these heuristics are detailed in our technical report.

A second difficulty arises from the crawling frequency. As snapshots are taken periodically, information is lost if relevant events occur more frequently than snapshots are taken. To address this issue, we estimate the information loss and bound it by studying only torrents in which enough observations of the relevant events are available. For that, we determine the likelihood that a peer joins a torrent, downloads a file and leaves the torrent between two snapshots. This likelihood is a function of the time peers must stay in a torrent to download the file. The time necessary to download the file is derived from the size of the file distributed in the torrent and the distribution of peers' download bandwidth. Using this result, we have estimated the amount of information loss for torrents distributing files of different sizes and found that by analyzing only torrents that distribute files larger than 100 MB, it is possible to observe at least 90% of the peers in the communities studied. For the remainder of this study, therefore, we consider only these torrents in our traces.

Table 1
Characteristics of the traces used.

Trace      Duration                               Torrents           Peers
                                                  Total    Average   Total       Average
etree      10 days during March 2005              1589     835       66,588      4905
alluvion   50 days during October–December 2003   1528     278       227,096     7312
bitsoup    68 days during April–July 2007         13,741   6633      1,694,243   145,462

The third complication in analyzing the data sets results from the limited trace duration. For some analyses, such as the characterization of the request arrival process in torrents, it is necessary to examine a sample of complete torrents. Moreover, it is desirable that this sample reflects the overall population of torrents in the community. However, because data from each community is collected for a limited period (up to 68 days for bitsoup), one must take care when sampling the complete torrents so as to produce an unbiased sample. The reason is that considering only complete torrents will admit proportionally more short torrents than exist in the torrent population, as short torrents have a higher probability than longer ones of both starting and ending within the trace.

To avoid bias when studying complete torrents, we apply the create-based method proposed by Roselli et al. [28], which allows obtaining an unbiased sample of torrents with a maximum duration τ. We obtained samples of torrents for τ = 8 days from the three communities and samples of torrents for τ = 30 days for alluvion and bitsoup, the two communities for which we have longer traces. In the rest of this paper these samples are referred to as τ8 and τ30, respectively.

Table 2 details the samples we consider given our method.

4. Characterizing resource demand

The first part of this characterization focuses on the demand generated by members of a BitTorrent community. Understanding usage patterns is paramount to produce optimized designs of content distribution mechanisms. We consider torrent joins (i.e., file requests) as indirectly expressing user demand and focus on the following two questions: (i) what is the distribution of torrent popularity, as expressed by the total number of torrent joins each torrent receives; and (ii) how does the rate of torrent joins evolve over time.

Fig. 1. Popularity rank of all torrents in the traces (left) and of torrents in the τ30 sample (right). Both panels plot popularity against torrent popularity rank on log-log axes; the left panel shows alluvion, bitsoup and etree, the right panel alluvion and bitsoup.
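Both demand metrics can be derived from a list of reconstructed join events. In this sketch, joins are assumed to be `(torrent_id, time_in_days)` pairs, and day 0 of a torrent's age starts at its first join. The helper names are illustrative.

```python
from collections import Counter


def popularity(joins):
    """Total joins per torrent, from most to least popular (rank order)."""
    counts = Counter(torrent for torrent, _ in joins)
    return counts.most_common()


def daily_join_rate(joins, torrent, torrent_start):
    """Joins per day of torrent age for one torrent, day 0 first."""
    ages = [int(t - torrent_start) for tid, t in joins if tid == torrent]
    if not ages:
        return []
    per_day = Counter(ages)
    # Days with no joins are reported as 0 so the series is contiguous.
    return [per_day.get(day, 0) for day in range(max(ages) + 1)]
```

Plotting the counts returned by `popularity` against their rank gives curves of the kind shown in Fig. 1. `daily_join_rate` gives the series to which the arrival-rate models of Section 4.2 are fitted.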

4.1. What is the content popularity distribution?

The popularity of a torrent (and implicitly of a content item) is the number of torrent joins received during a time interval. From our data, we can see both the popularity of all torrents during the duration of the traces and the popularity of the complete torrents sampled. The former shows how the interest of users is distributed over available content in a period, while the latter is concerned with the total number of users which will join a torrent. Regardless of perspective, our main finding is the same: content popularity in BitTorrent communities is not heavy-tailed.

Fig. 1 shows the popularity of all content during the entire period of our measurements (left) and restricted to complete torrents in τ30 (right). The popularity distribution of torrents in the τ8 sample has the same characteristics. For all samples, the curves have similar shapes and clearly deviate from a Zipf distribution, which is commonly used to model popularity.

For all our samples, a Lognormal or a Weibull distribution fits the empirical data well.¹ These distributions are distinct both from those found in peer-to-peer file-sharing [15,29] and video streaming [11], but are similar to the observed distribution of user activity across topics in four online peer production systems observed by Wilkinson [34] and to the popularity of films as measured by their box office revenues [30].
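The QQ-plot test mentioned in footnote 1 can be sketched with the standard library alone. The sketch fits a lognormal by maximum likelihood, using the mean and population standard deviation of the log counts, and returns theoretical/empirical quantile pairs to plot. A Weibull fit would need a numerical optimizer and is omitted. This is an illustration, not the authors' fitting code, and it assumes all popularity counts are positive.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(samples):
    """Maximum-likelihood lognormal parameters (mu, sigma) of positive samples."""
    logs = [math.log(x) for x in samples]
    return fmean(logs), pstdev(logs)  # MLE uses the population std (divide by n)


def qq_points(samples, mu, sigma):
    """(theoretical, empirical) quantile pairs for a lognormal QQ-plot."""
    xs = sorted(samples)
    n = len(xs)
    dist = NormalDist(mu, sigma)
    # Plotting positions (i - 0.5) / n avoid the undefined 0 and 1 quantiles.
    return [(math.exp(dist.inv_cdf((i - 0.5) / n)), x) for i, x in enumerate(xs, 1)]
```

Points that fall close to the diagonal indicate a good fit. A heavy-tailed sample would instead bend away from it at the high-popularity end.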

A distinguishing feature in the distributions we observe is that they are not heavy-tailed. The absence of a heavy tail has major implications for the design of caching mechanisms. On the one hand, for small cache sizes, these popularity distributions lead to caches that are less effective (i.e., lower hit ratio) than for heavy-tailed distributions. On the other hand, for large cache sizes, the cache effectiveness can be much higher than for heavy-tailed distributions, since the percentage of unpopular items is much lower in the traces we observe.

¹ We used QQ-plots to compare empirical and theoretical distributions as visual tests of goodness of fit.
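The cache-size argument can be checked numerically with an idealized static cache that always holds the k most popular items. This perfect-knowledge model is an assumption chosen for simplicity and not a caching policy proposed in the text. For a fixed popularity vector, it gives the best hit ratio achievable by any cache that keeps a fixed set of k items.

```python
def ideal_hit_ratio(counts, cache_items):
    """Fraction of requests served by a static cache of the top `cache_items` items.

    counts: request (join) count per item, in any order.
    """
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:cache_items]) / total
```

For example, evaluating `[ideal_hit_ratio(counts, k) for k in (10, 100, 1000)]` for a popularity vector shows how quickly the hit ratio grows with cache size. That growth curve is where heavy-tailed and non-heavy-tailed popularity distributions differ.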

This stands in contrast with the design of caches for Web pages, whose popularity distribution is heavy-tailed [14,8], and suggests that caching mechanisms designed for the heavy-tailed distributions observed in peer-to-peer file-sharing (e.g. [33,29]) should be revisited before being applied to BitTorrent traffic. We note that Bellissimo et al. documented that the popularity distribution of files deviates from a Zipf distribution in the alluvion data we consider [5]. Our results expand this observation through a wider sample that includes three different content-sharing communities and suggest distributions which fit the data.

4.2. How are torrent joins distributed over time?

A second dimension of user demand is revealed by the distribution of joins over the torrents' lifetimes. Our characterization (i) reproduces previous results which show that the join rate for a torrent drops rapidly after its start and (ii) proposes a model for the evolution of join rate over time that is more accurate than the current state of the art.

Previous studies report that the request arrival rate for a torrent decreases rapidly over time after its start [5,25,16]. Guo et al. [16] report, based on the same alluvion trace we use, that the request arrival rate for a torrent decreases exponentially over time. We revisit their findings using all three traces we have collected and benefit from the increased accuracy offered by the bitsoup trace, which contains accurate peer identification.

Examining the curve of peer joins per day, we observe that although an exponential function $\lambda(t) = a\,e^{-t/b}$ is able to accurately model a number of torrents, it fails to account for a longer tail of joins that appears in a large number of torrents. This effect is particularly noticeable in bitsoup torrents.

We find that a function of the form $\lambda'(t) = a'/(1 + bt)$, for $t \in \mathbb{N}$, better models a larger proportion of all torrents in bitsoup while modeling torrents in alluvion and etree similarly to the exponential function. As in the exponential model, $a'$ represents the initial peer join rate and $b$ is a factor that influences how fast this rate drops with time. Unlike in $\lambda(t)$, however, in $\lambda'(t)$ the arrival rate decreases more slowly and at different rates during the lifetime of the torrent. The difference between the two models is further illustrated in Fig. 2.
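Both models can be fitted to a daily join-rate series with ordinary least squares after a linearizing transform. For the exponential model, $\ln\lambda = \ln a - t/b$. For the hyperbolic model, $1/\lambda' = 1/a' + (b/a')\,t$. The text does not state how the fits were obtained, so this linearization is a swapped-in technique for illustration only. It weights errors differently from nonlinear least squares, and it requires days with zero joins to be dropped.

```python
import math


def _linear_fit(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return my - c1 * mx, c1


def fit_exponential(days, rates):
    """lambda(t) = a * exp(-t / b), fitted via log(rate) = log(a) - t / b."""
    c0, c1 = _linear_fit(days, [math.log(r) for r in rates])
    return math.exp(c0), -1.0 / c1


def fit_hyperbolic(days, rates):
    """lambda'(t) = a' / (1 + b t), fitted via 1/rate = 1/a' + (b/a') t."""
    c0, c1 = _linear_fit(days, [1.0 / r for r in rates])
    a_prime = 1.0 / c0
    return a_prime, c1 * a_prime
```

On noise-free data generated from either model, the corresponding fit recovers its parameters. On real torrents, the resulting fits are the inputs to the AIC comparison.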

Page 6: Resource demand and supply in BitTorrent content-sharing

Fig. 2. Fitting of $\lambda(t)$ and $\lambda'(t)$ for three example torrents. The two torrents on the left are from bitsoup, while the rightmost one is from alluvion.
[Figure: three panels plotting the daily arrival rate against torrent age (days), each showing the measured data and the fitted $\lambda(t)$ and $\lambda'(t)$.]

520 N. Andrade et al. / Computer Networks 53 (2009) 515–527

The $\lambda(t)$ and $\lambda'(t)$ models can be compared through the difference in their Akaike Information Criterion (AIC) values. This criterion quantifies the fit of an estimated statistical model and can be used to compare how well two models fit a dataset. Depending on the difference between the AICs of the two models, it is possible to assess their relative merits [9]. This comparison can result in considering both models adequate, or in evidence in favor of one of them. We take a conservative approach and consider that a model can be used unless there is essentially no support for it in comparison with the competing model. We then count how often each model can be used for the torrents in our traces, that is, how often it is equivalent to or better than the competing model.
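The AIC comparison can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure behind Table 3: the text does not say how the models were fitted, so the sketch uses linearized least-squares fits (regressing $\ln y$ on $t$ for $\lambda(t)$, and $1/y$ on $t$ for $\lambda'(t)$, since $1/\lambda'(t) = 1/a' + (b/a')t$), computes a least-squares AIC of $n \ln(\mathrm{RSS}/n) + 2k$, and treats a model as having essentially no support when its AIC exceeds the other's by more than 10, a common rule of thumb assumed here. All function names are hypothetical.

```python
import math


def _ols(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return my - c1 * mx, c1


def predict_exponential(ts, ys):
    """Fit lambda(t) = a * exp(-t / b) via ln y = ln a - t / b; return fitted values."""
    c0, c1 = _ols(ts, [math.log(y) for y in ys])
    return [math.exp(c0 + c1 * t) for t in ts]  # a = exp(c0), b = -1 / c1


def predict_hyperbolic(ts, ys):
    """Fit lambda'(t) = a' / (1 + b t) via 1/y = 1/a' + (b/a') t; return fitted values."""
    c0, c1 = _ols(ts, [1.0 / y for y in ys])
    return [1.0 / (c0 + c1 * t) for t in ts]  # a' = 1 / c0, b = c1 / c0


def aic_least_squares(ys, preds, k=2):
    """Least-squares AIC, n ln(RSS / n) + 2k. The shared error-variance
    parameter is left out of k because it cancels when comparing the models."""
    n = len(ys)
    rss = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return n * math.log(rss / n) + 2 * k


def usable_models(ts, ys, threshold=10.0):
    """A model stays usable unless its AIC exceeds the best AIC by more than
    `threshold` (assumed cut-off for 'essentially no support')."""
    aics = {
        "exponential": aic_least_squares(ys, predict_exponential(ts, ys)),
        "hyperbolic": aic_least_squares(ys, predict_hyperbolic(ts, ys)),
    }
    best = min(aics.values())
    return {name: aic - best <= threshold for name, aic in aics.items()}, aics


if __name__ == "__main__":
    days = list(range(20))
    # Synthetic long-tailed daily joins with alternating +/-5% noise.
    joins = [100.0 / (1 + 0.5 * t) * (1 + 0.05 * (-1) ** t) for t in days]
    usable, aics = usable_models(days, joins)
    print(usable)  # {'exponential': False, 'hyperbolic': True}
```

Because both models have the same number of parameters, the comparison reduces to which one leaves the smaller residual sum of squares; the threshold only decides when the worse model is dropped altogether.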

Table 3 summarizes the comparison for the torrents in our traces that lasted for at least 5 days and had a minimum of 10 peers. A small fraction (5–10%) of the torrents in each trace fits neither of our two models and was not included in the table. For the remainder, no model is the most adequate for all torrents, and the two models have similar coverage for torrents in etree and alluvion. Nevertheless, the $\lambda'(t)$ model fits the bitsoup trace considerably better, particularly for the most popular torrents.

One possible explanation for the difference in model adequacy is the scale of the bitsoup community. Bitsoup is significantly larger than the other two communities, which might result in peers joining torrents over longer periods. It is also possible that the heuristic peer identification employed in the analysis of etree and alluvion influences the peer joins observed in these communities, but our data do not allow an evaluation of this potential influence.

Table 3
Comparison of the percentage of torrents in which each model was equivalent to or better than the competing model. The percentage in the $\lambda(t)$ column states the fraction of all torrents in which the $\lambda(t)$ model was as good as or better than the $\lambda'(t)$ model.

Sample                   # torrents   $\lambda(t)$ (%)   $\lambda'(t)$ (%)
etree $\tau_8$                   27                100                 100
alluvion $\tau_{30}$            194                 67                  65
bitsoup $\tau_{30}$             858                 40                  79
  joins < 50                    406                 45                  82
  50 ≤ joins ≤ 150              430                 37                  75
  joins ≥ 150                    22                  9                  91

Whatever the cause of the difference, our results support the $\lambda'(t)$ model as a valuable tool both for reasoning about torrents and for synthesizing workloads. For reasoning, it complements the exponential model, better explaining a number of torrents and accounting for the longer tails in peer join rates. For workload synthesis, it offers an accurate representation of how the popularity of a torrent decreases over time. Moreover, the $\lambda'(t)$ model is particularly suitable for modeling highly popular torrents, a type of torrent that is often of interest in performance evaluation.

Finally, the observed model has an impact on simulation studies. The decrease in peer join rates over time implies that a pure Poisson process, whose arrival rate is constant, does not model the peer join process well. Several studies (e.g. [21,26,24,12]) have relied on this model, and our result strengthens the need to reconsider them with more accurate models.
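For workload synthesis, peer joins can instead be drawn from a non-homogeneous Poisson process whose intensity follows $\lambda'(t)$. The sketch below uses Lewis-Shedler thinning: candidates are generated at the peak rate $a'$ and each is kept with probability $\lambda'(t)/a'$. It assumes that $\lambda'(t)$, defined above for integer days, may be evaluated at fractional days, and that $b \geq 0$ so the rate peaks at $t = 0$; the function names and parameter values are hypothetical.

```python
import math
import random


def hyp_rate(t, a_prime, b):
    """Join rate lambda'(t) = a' / (1 + b t), in joins per day."""
    return a_prime / (1.0 + b * t)


def synthesize_joins(a_prime, b, horizon_days, seed=0):
    """Join times (days) from a non-homogeneous Poisson process with
    intensity lambda'(t), generated by Lewis-Shedler thinning."""
    rng = random.Random(seed)
    lam_max = a_prime  # lambda'(t) is non-increasing for b >= 0, so its peak is at t = 0
    t, joins = 0.0, []
    while True:
        t += rng.expovariate(lam_max)  # next candidate of a homogeneous process at rate lam_max
        if t >= horizon_days:
            return joins
        if rng.random() < hyp_rate(t, a_prime, b) / lam_max:  # accept w.p. lambda'(t) / lam_max
            joins.append(t)


if __name__ == "__main__":
    joins = synthesize_joins(a_prime=200.0, b=0.5, horizon_days=20.0, seed=42)
    expected = 200.0 / 0.5 * math.log(1 + 0.5 * 20.0)  # integral of lambda'(t) over [0, 20]
    per_day = [sum(1 for j in joins if d <= j < d + 1) for d in range(20)]
    print(len(joins), round(expected, 1))
    print(per_day)
```

The expected number of joins over a horizon $T$ is $\int_0^T \lambda'(t)\,dt = (a'/b)\ln(1 + bT)$, which grows only logarithmically with $T$, whereas a homogeneous process at rate $a'$ would produce $a'T$ joins.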

5. Characterizing resource supply

In a BitTorrent community, resources are contributed by users. Their actions determine resource supply, as users (i) configure the maximum amount of bandwidth their client can use for upload, (ii) determine the seeding time of their clients by controlling how long the client stays online after it has finished a download, and (iii) decide to permanently quit torrents they are seeding, thus ceasing to contribute to them.

This user behavior influences three aspects of the system: (a) throughput, (b) content durability, and (c) upload volume. Users contribute to system throughput by providing upload bandwidth. Content durability is influenced by users' seeding time. Finally, the upload volume is the result of both providing upload bandwidth and spending time online when there is demand for service.

The remainder of this section evaluates the contribution users make to each of the aforementioned aspects of the system (Sections 5.1 and 5.2); investigates whether upload bandwidth or seeding time better determines the upload volume at both the community and torrent levels (Section 5.3); and finally investigates the dynamics of the population of contributors in the system (Section 5.4).


Fig. 3. Community level concentration of contributions in bitsoup ($\tau_{30}$).
[Figure: cumulative proportion of contribution against cumulative proportion of users, for upload bandwidth, upload volume and seeding time.]


5.1. How are user contributions distributed?

Our analysis of user contribution shows that, at the community level, contributions of all types are concentrated on a small portion of contributors; yet, at the torrent level, contributions are typically less concentrated.

The first part of this analysis investigates the distribution of user contribution at the community level. Such an analysis is only possible with the bitsoup trace, since it is the only trace with accurate user identification. Fig. 3 shows that contributions are considerably concentrated in this trace: the top 20% of contributors provide around 80% of the contribution for all the types of contribution considered. We note that although a minority of all users provides a significant share of resources, this share is much lower than that reported in previous studies of Gnutella [1,18] and eDonkey [17], two other peer-to-peer file-sharing systems.

However, a high concentration at the community level does not imply that contribution is also concentrated at the torrent level. The investigation of this aspect uses the samples of complete torrents from all communities. We focus on the top 20% of contributors in each individual torrent and measure the amount they contribute. Fig. 4 shows the CDF of the proportion of contribution generated by peers in the top 20% set: a proportion y of all torrents have x or less of their resources contributed by the peers in their top 20% set.

Fig. 4. Torrent level concentration of contributions.
[Figure: three panels (bitsoup $\tau_{30}$, alluvion $\tau_{30}$, etree $\tau_8$) plotting P[X < x] against the proportion of contribution by the top 20% contributors, for upload bandwidth, upload volume and seeding time.]

The concentration of contribution is less pronounced at the torrent level than at the community level in all samples of complete torrents. While at the community level the users who are the top 20% of contributors are responsible for 80% or more of the total seeding time, upload bandwidth and upload volume, at the torrent level the peers that represent the top 20% of contributors are responsible for a similar proportion of all contribution in only a small number of torrents. In particular, when considering the upload volume, the top 20% of contributors only rarely contribute the majority of the uploaded volume. Upload bandwidth and seeding time are usually more concentrated, yet not as concentrated as at the community level.

Focusing the analysis on the upload volume metric, it is possible to gain further insight into how resources are provided: the considerable concentration of upload volume contributions at the community level and the milder concentration at the torrent level together suggest that top contributing users do not achieve this status through massive contributions in a small number of torrents. Instead, these users contribute in a large number of torrents over time.

On the one hand, understanding that a small proportion of users is responsible for most of the data transferred motivates communities to highly value these users, rewarding them so as to maintain their participation. On the other hand, the relative balance in resource contribution at the torrent level can be useful for resource allocation in the community. For example, this information is useful when deciding, either centrally or by collective action, in which torrents each user should seed. The lack of concentration in contributions at the torrent level implies that the number of peers contributing in a torrent is a reasonable indicator of the level of contribution to be expected in that torrent. Our results suggest that resource allocation at the community level can use this simple and cheap-to-obtain information to decide where to direct resources.

Furthermore, the lack of a pronounced concentration in resource contribution at the torrent level improves the overall robustness of torrents. The more concentrated the contribution in a torrent, the more the service level in that torrent depends on the individual behavior of a few peers. A more equitable distribution of contributions leads to torrents that are more robust to individual peer failure or departure.
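The community- and torrent-level concentration measures used in this subsection can be computed with a few lines of Python. In this sketch the size of the top set is 20% of the contributors rounded to the nearest integer (at least one), an assumption since the text does not specify a rounding rule; the function names and input data are hypothetical.

```python
def top_share(contributions, fraction=0.2):
    """Share of the total contributed by the top `fraction` of contributors."""
    values = sorted(contributions, reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    k = max(1, round(fraction * len(values)))  # top-set size; rounding rule is an assumption
    return sum(values[:k]) / total


def torrent_level_cdf(per_torrent, fraction=0.2):
    """Top-`fraction` share for every torrent, plus the empirical CDF of those
    shares: cdf(x) is the proportion of torrents whose share is at most x."""
    shares = sorted(top_share(c, fraction) for c in per_torrent.values())

    def cdf(x):
        return sum(1 for s in shares if s <= x) / len(shares)

    return shares, cdf


if __name__ == "__main__":
    # Hypothetical upload volumes (MB) of ten users across the whole community.
    community = [500, 300, 40, 30, 30, 25, 25, 20, 20, 10]
    print(top_share(community))  # 0.8
    # Hypothetical per-peer upload volumes in two torrents.
    torrents = {"t1": [10, 9, 8, 8, 7], "t2": [50, 5, 5, 5, 5]}
    shares, cdf = torrent_level_cdf(torrents)
    print([round(s, 2) for s in shares], cdf(0.5))  # [0.24, 0.71] 0.5
```

Applying `top_share` once to all users gives the community-level figure; applying it separately to the peers of each torrent and taking the CDF of the results gives the torrent-level view.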

5.2. Do contribution levels vary across communities?

To examine regularities and peculiarities in user behavior, it is necessary to compare the three types of peer contributions across the communities. Recall that bitsoup operates a sharing-ratio enforcement (SRE) mechanism seeking to boost resource contributions. Our investigation reveals that users seed for longer in the community that uses this enforcement mechanism, although bandwidth contribution is similar across all communities.

The comparison of the three communities focuses on contributions at the torrent level (a community-level comparison is not possible, since the upload volume is directly related to the size of the community). Upload bandwidth and seeding time are comparable across communities if they do not correlate with torrent characteristics. Otherwise, observed differences in contributions might result from torrents' peculiarities.

A correlation analysis shows that the upload bandwidth allocated by a peer is unrelated to the characteristics of the file being downloaded. Additionally, seeding time is not correlated with the size of the file or the time the peer takes to download it. This indicates that it is reasonable to analyze peers' upload bandwidth and seeding time regardless of the torrent they participate in, and to compare them across communities.

Fig. 5 shows the distribution of seeding time and upload bandwidth contributed by peers. Regarding upload bandwidth, in all communities only about 5% of peers contribute large amounts of bandwidth (over 100 KB/s), 20–30% of peers do not contribute, and 40% contribute between 5 and 50 KB/s. Considering seeding time, however, it is clear that a considerably larger proportion of users seed for longer in bitsoup than in etree and alluvion. We attribute this difference to the SRE mechanism employed in bitsoup, which encourages users to contribute more. This conjecture is further supported by observations made on a different set of communities by Andrade et al. [3] and Ripeanu et al. [27].

Fig. 5. CDFs of seeding time and upload bandwidth provided by peers. Omitted samples behave similarly.
[Figure: left, cumulative proportion of seeders against seeding time (h), for alluvion, bitsoup and etree; right, P[bandwidth <= x] against peer bandwidth (KB/s), for bitsoup and alluvion.]

Taken together, the observations that bandwidth levels are similar across all communities while seeding is higher in the presence of SRE bring a new perspective on the effects of this mechanism. Our analysis reveals that users typically try to increase their contribution levels by seeding for longer rather than by providing more bandwidth to the system. This observation is of particular importance for designers and operators of incentive mechanisms for BitTorrent, as it provides evidence of which resource users will invest in response to incentives to increase their upload volumes.

Finally, regardless of the difference in seeding behavior across communities, the seeding time distribution is considerably skewed in all of them. This regularity supports the conjecture of Pouwelse et al. [25] that the behavior of individual users is more relevant in determining the longevity of a torrent than the number of seeders online at any instant: a file is likely to remain available for longer if there is one long-standing seeder than if there are several short-lived ones at a given moment.

5.3. What determines the upload volume: seeding time or bandwidth?

This section investigates which factor best explains the upload volume of a user at the community level and of a peer at the torrent level. The goal is to determine whether upload bandwidth or seeding time better explains the upload volume. The results suggest that bandwidth, rather than seeding time, is the better predictor of upload volume at both levels.

The study at the community level can only be done with the bitsoup trace. At the torrent level, we investigate the $\tau_{30}$ samples of bitsoup and alluvion. We analyze the correlations between three variables: (i) the volume uploaded, normalized by the amount downloaded; (ii) the estimated upload bandwidth; and (iii) the time spent online in each torrent. The normalization avoids the effect of file size on the measurements: peers are likely to leech for longer on larger files and, because of that, to upload more data. Also, the p-values for all results we report in this section are lower than $2.2 \times 10^{-16}$.

Fig. 6. Churn in bitsoup.

A regression analysis shows that, at the torrent level, the amount contributed by each peer in a torrent has a higher correlation with its bandwidth ($R^2 = 0.50$ in bitsoup and $R^2 = 0.39$ in alluvion) than with its time online ($R^2 = 0.03$ in bitsoup and $R^2 = 0.008$ in alluvion), after log transformations of the variables.
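The log-transformed regression can be sketched as follows. For a simple linear regression of $\ln y$ on $\ln x$, $R^2$ equals the squared Pearson correlation of the logged variables, which is what the function computes. It assumes strictly positive inputs (peers with zero upload would have to be filtered out or offset first, a choice the text does not describe); the function name and the peer data in the usage example are hypothetical.

```python
import math


def r_squared_loglog(xs, ys):
    """R^2 of an ordinary least-squares fit of ln(y) on ln(x).
    All values must be strictly positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy * sxy / (sxx * syy)  # for simple regression, R^2 equals Pearson r squared


if __name__ == "__main__":
    # Hypothetical peers: upload bandwidth (KB/s), hours online, upload/download ratio.
    bandwidth = [5, 10, 20, 40, 80, 160]
    hours_online = [30, 2, 12, 5, 40, 8]
    upload_ratio = [0.1, 0.2, 0.45, 0.8, 1.7, 3.1]
    print(round(r_squared_loglog(bandwidth, upload_ratio), 2))
    print(round(r_squared_loglog(hours_online, upload_ratio), 2))
```

The log transformation matters because bandwidth and upload volume span several orders of magnitude; on raw values a handful of very fast peers would dominate the fit.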

At the community level, a similar analysis indicates an even sharper contrast: the user's upload volume correlates with the user's estimated bandwidth ($R^2 = 0.50$) far more than with the time she spent online ($R^2 = 0.01$).

Therefore, the heavy contributors at the community and torrent levels are those that own and make available more bandwidth, as opposed to those that spend longer periods seeding. Together with our analysis of seeding time in Section 5.2, this result shows that although peers' main response to the sharing-ratio enforcement incentives is to try to contribute more by seeding for longer, the most effective strategy for contributing more is actually lifting bandwidth limitations, if these are used.

5.4. How stable is the population of heavy contributors?

The results in Section 5.1 show that, at the community level, a few users provide most resources (20% of the users provide 80% of the uploaded volume). In this section, we refer to these users as heavy contributors and analyze whether the set of heavy contributors in the community is stable or changes over time. As it focuses on the community level, this analysis uses only the bitsoup trace.

The analysis is performed as follows: the entire trace is divided into non-overlapping successive time windows $W_1, W_2, \ldots$ of length $t$. For each window $W_i$, users are ranked according to the total amount they uploaded within that window. A user is a heavy contributor in that window if it is among the top 20% of contributors. To evaluate the changes in the set of heavy contributors, we use a window length of one week, which accounts for daily and weekly seasonality.
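The windowing procedure can be sketched in Python as below, assuming upload records of the form (timestamp in seconds, user, bytes uploaded). Only users that uploaded something in a window are ranked within it, and the top set is 20% of those users rounded to the nearest integer; both choices are assumptions, as are the function names and the toy records.

```python
from collections import defaultdict


def heavy_contributors_per_window(uploads, window_seconds=7 * 24 * 3600, fraction=0.2):
    """Group (timestamp, user, nbytes) records into consecutive, non-overlapping
    windows starting at the first record, and return {window index: set of users
    in the top `fraction` by bytes uploaded within that window}."""
    start = min(ts for ts, _, _ in uploads)
    per_window = defaultdict(lambda: defaultdict(int))
    for ts, user, nbytes in uploads:
        per_window[int((ts - start) // window_seconds)][user] += nbytes
    heavy = {}
    for window, totals in per_window.items():
        ranked = sorted(totals, key=totals.get, reverse=True)
        k = max(1, round(fraction * len(ranked)))  # top-set size; rounding rule is an assumption
        heavy[window] = set(ranked[:k])
    return heavy


def churn_summary(heavy, all_users):
    """Fractions of users that are heavy in at least one window and in every window."""
    ever = set().union(*heavy.values())
    always = set.intersection(*heavy.values())
    return len(ever) / len(all_users), len(always) / len(all_users)


if __name__ == "__main__":
    week = 7 * 24 * 3600
    # Hypothetical records: (seconds since trace start, user, bytes uploaded).
    records = [
        (0, "u1", 900), (10, "u2", 50), (20, "u3", 40), (30, "u4", 30), (40, "u5", 20),
        (week, "u1", 800), (week + 5, "u2", 900), (week + 9, "u3", 10),
        (week + 9, "u4", 10), (week + 9, "u5", 5),
    ]
    heavy = heavy_contributors_per_window(records)
    print(heavy)  # {0: {'u1'}, 1: {'u2'}}
    print(churn_summary(heavy, {user for _, user, _ in records}))  # (0.4, 0.0)
```

The two numbers returned by `churn_summary` correspond to the two statistics reported next: the share of users that are ever heavy contributors and the share that remain heavy contributors throughout the trace.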

The results show that 30% of the user population belongs to the set of heavy contributors at least once during the trace. Furthermore, only approximately 1.8% of users maintain heavy-contributor status over the entire trace duration. This shows that there is some degree of churn in the set of heavy contributors. However, although our traces allow this assertion, their limited duration does not allow a precise evaluation of this churn.

Nevertheless, the observation that the set of heavy contributors is not static over time suggests that BitTorrent communities are more robust than communities that depend on a static set of users to provide most of their resources. Although the performance of the system does rely on a small number of users during any given period, the results indicate that some renewal occurs, as heavy contributors are replaced over extended periods. This means that the failure or departure of a heavy contributor has less impact on the system than it would with a static set of heavy contributors.

6. Relating resource demand and supply

Besides studying resource demand and supply separately, understanding their relation reveals important information about how the system functions. In particular, this section answers the following questions: (i) are peers' demand and supply correlated? (Section 6.1); (ii) does resource supply meet the observed demand? (Section 6.2); and (iii) is there resource contention at the torrent level? (Section 6.3).

6.1. Are heavy contributors heavy consumers too?

Our analysis of resource supply shows that, at the community level, a minority of users is responsible for most of the resources contributed. This picture naturally raises the question of whether this set of users acts as servers for a majority that behaves mostly as consumers. The analysis in this section shows that this is not the case in bitsoup, the community where we can evaluate user behavior. Instead, users who are heavy contributors are also heavy consumers.

Fig. 6 shows a scatter plot of the uploaded and downloaded volumes of users in bitsoup. The color of each point represents the number of torrents in which a user participates: fewer torrents yield darker points. The logarithms of upload and download volumes are linearly correlated, with a Pearson correlation coefficient of 0.77, revealing that users who contribute more are also those who consume more from the community. Moreover, the color gradient shows that the users who are heavy contributors and heavy consumers are those that participate in more torrents. Finally, the correlation between upload and download volumes holds for peers irrespective of their activity level.

These observations portray bitsoup as an equitable sharing system. Our traces do not allow us to determine whether this is a result of BitTorrent's built-in incentive mechanism or of the SRE employed in this community, but our results underpin the scalability of communities adopting bitsoup's model. If contributions are proportional to consumption, then resource contention levels, and thus service provision levels, are not affected by the scale of the community. In the absence of such a correlation, growth in the user population could lead to increasing contention for the available resources.


Fig. 7. Proportion of requests served in a torrent (left) and in the community as a whole (right). Arrows indicate the 95% confidence intervals.
[Figure: left, CDF P[X < x] of $s_t$ (log-scale y axis) for alluvion, bitsoup and etree; right, $s_c$ (axis from 0.85 to 1) for alluvion-all, bitsoup-all, etree-all, alluvion-$\tau_{30}$, bitsoup-$\tau_{30}$, alluvion-$\tau_8$, bitsoup-$\tau_8$ and etree-$\tau_8$.]

² It is worth noting that, if we remove the first month of the alluvion trace, we observe a proportion of served requests comparable to bitsoup and etree. However, further investigation of the causes of this behavior is left as future work.


From a different perspective, our results show that there is virtually no free-riding in bitsoup: the norm is that users do not consume significantly more than their fair share of the community's resources. This differs from the user behavior reported in studies of Gnutella and eDonkey, where high levels of free-riding were documented [1,18,17]. Nevertheless, this comparison should be taken with caution, as the analyses of Gnutella and eDonkey assume that all users consume from the system while observing that only a few contribute. Our analysis provides reference results for a future, fair comparison that tracks user consumption in file-sharing networks and considers as free-riders only the users who consume more than their fair share of the system's resources.

6.2. Does the resource supply meet the observed demand?

This section focuses on whether resource supply is adequate to ensure the system's liveness, by investigating the proportion of requests that are successfully served. Overall, our evaluation shows that in all three communities the great majority of requests is successfully served.

In the analyzed traces, failed download requests are observed when a peer is left as a leecher in a torrent and no seeder joins this torrent after that. We use the notation $s_t$ and $s_c$ for the fraction of requests that succeed in a torrent and in a community, respectively.
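The distinction between the two metrics matters when they are compared: $s_c$ weights each torrent by its number of requests, whereas a plain average of $s_t$ weights every torrent equally. The following sketch, over hypothetical per-torrent counts of (served, total) requests, shows how a few small torrents with mostly failed requests pull the mean of $s_t$ down while $s_c$ stays close to 1. The function name and data layout are assumptions made for illustration.

```python
def success_rates(requests):
    """requests maps torrent id -> (served, total) download requests.
    Returns ({torrent: s_t}, s_c), where s_c pools requests across torrents."""
    s_t = {tid: served / total for tid, (served, total) in requests.items() if total > 0}
    served_all = sum(served for served, _ in requests.values())
    total_all = sum(total for _, total in requests.values())
    return s_t, served_all / total_all


if __name__ == "__main__":
    # Hypothetical community: three popular torrents and two tiny ones that mostly fail.
    requests = {"a": (1000, 1000), "b": (499, 500), "c": (300, 300), "d": (1, 5), "e": (0, 3)}
    s_t, s_c = success_rates(requests)
    mean_s_t = sum(s_t.values()) / len(s_t)
    print(round(mean_s_t, 3), round(s_c, 3))  # 0.64 0.996
```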

Fig. 7 (left) shows the CDF of $s_t$ for all torrents observed in our traces. Considering only complete torrents, we observe similar results across all communities: most torrents serve virtually all the requests they receive ($s_t$ is larger than 0.99 for 97% of the torrents in bitsoup and etree, and for 60% of the torrents in alluvion). Nevertheless, in a small fraction of the torrents most requests fail.

However, measuring the proportion of served requests at the torrent level does not give an accurate picture of the proportion of requests served in a BitTorrent community over a period of time. In fact, a more in-depth analysis of the failed requests reveals that the majority of the torrents with a high proportion of failed requests are those that receive fewer than 20 requests. These torrents are a minority and serve a small proportion of users.

A complementary perspective is then to analyze $s_c$. Fig. 7 (right) shows the 95% confidence intervals of $s_c$ for the three communities. We observe that the overall proportion of served requests in the three BitTorrent communities studied is high: considering all requests seen, $s_c$ is not statistically different from 1 for bitsoup and etree, and is larger than 0.98 for alluvion. Considering samples of complete torrents, however, $s_c$ is significantly lower in alluvion than in bitsoup and etree.
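The text does not state how the confidence intervals for $s_c$ were obtained. One standard choice, assumed here purely for illustration, is to treat each request as an independent Bernoulli trial and use the Wilson score interval, which behaves well for proportions close to 1: when every request is served, its upper bound is 1 (up to floating-point error). The function name and counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    print(wilson_interval(9_800, 10_000))  # hypothetical counts, s_c = 0.98
    print(wilson_interval(5_000, 5_000))   # every request served: upper bound reaches 1
```

Under this reading, "$s_c$ is not statistically different from 1" corresponds to an interval whose upper end reaches 1.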

Interestingly, our results do not allow a statistical distinction between $s_c$ for bitsoup and for etree. This indicates that sharing-ratio enforcement (SRE) is not necessary for etree to achieve a rate of served requests equivalent to that of bitsoup. On the other hand, it is not possible to infer that the SRE is ineffective at improving $s_c$, as bitsoup serves significantly more requests than alluvion. However, the similar levels of contribution in etree and alluvion observed in Section 5, together with the high $s_c$ of etree, suggest that some peculiarity of alluvion is the cause of its lower service rates.² If this conjecture holds, it would imply that although the SRE might lead to more seeding, it is not necessary for communities similar to etree or alluvion to achieve a high quality of service in the metric we consider.

The results in this section also provide insight into the quality of service provided by commons-based content distribution communities like those supported by BitTorrent. These communities are loosely organized as decentralized peer-production systems [6], coordinated through loose social relationships, and still manage to serve all or nearly all requests they receive. Our results provide quantitative evidence of the effectiveness of the commons-based approach as a viable alternative to its market counterparts in content distribution.

Our interpretation of the service provided by BitTorrent communities stands in contrast with those of Guo et al. [16] and Piatek et al. [23]. Guo et al. calculated the average value of $s_t$ in alluvion and interpreted its value (0.9) as a sign of an overall unsatisfactory quality of service in torrents. We observed that the distribution of $s_t$ is considerably skewed, which makes its mean a limited assessment of typical torrent behavior. Indeed, 60% of torrents in alluvion serve more than 99% of their requests. Furthermore, our results show that the service quality in alluvion cannot be taken as the general quality of service in BitTorrent communities.

Piatek et al. reported that BitTorrent provides poor service to its users because 25% of the 55,000 torrents observed during their measurements were unavailable. The difference between Piatek et al.'s conclusion and ours is due to different definitions of availability: we count service as unavailable only when a peer tries to download a file and fails, while Piatek et al. do not relate availability to demand.

6.3. Is there resource contention at the torrent level?

We now turn to resource contention at the torrent level, examining the typical regime of operation in BitTorrent communities. Resource contention exposes mismatches between resource demand and supply that affect the functioning of the system. When resource demand is much larger than supply in a torrent, download performance falls short of what consumers' download bandwidth allows. If supply is much larger than demand, providers' resources are underutilized. Demand and supply also play a role in BitTorrent incentives: prioritization is only relevant when resources are scarce and cannot serve the demand of all consumers.

In summary, our analysis finds that resource contention is similar across communities: most torrents operate under some resource contention, while one quarter of torrents have no contention.

For this investigation, we assume that BitTorrent's tit-for-tat mechanism works efficiently and that leechers that provide more bandwidth are prioritized in the torrent (see Legout et al. [20] for experimental evidence). This implies that, when there is resource contention in a torrent, there is a positive correlation between the upload and download speeds of leechers. To test for resource contention, we therefore use the Kendall correlation coefficient to measure the degree of correlation between the rankings of upload and download speed of leechers. The strength of this correlation is directly related to the level of resource contention in the torrent.
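The per-torrent contention test can be sketched with Kendall's tau computed directly from pairs of leechers. The text does not say which variant of the coefficient was used; this sketch assumes tau-b, which corrects for ties (common when many peers report the same speed), and uses an O(n²) loop that is adequate for per-torrent peer counts. The function name and the speeds in the usage example are hypothetical.

```python
import math


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences (O(n^2); handles ties)."""
    n = len(xs)
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both denominator terms
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


if __name__ == "__main__":
    # Hypothetical leechers in one torrent: upload and download speeds (KB/s).
    upload = [10, 25, 40, 5, 60]
    download = [80, 120, 150, 90, 200]
    print(round(kendall_tau_b(upload, download), 2))  # 0.8
```

Because Kendall's coefficient depends only on the ordering of pairs, it matches the ranking-based reasoning above: it measures whether faster uploaders tend to download faster, regardless of the scale of the speeds.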

Fig. 8. CDF of the Kendall correlation coefficient between upload and download bandwidths of the peers in each torrent in $\tau_{30}$.
[Figure: P[X < x] against the correlation between upload and download bandwidth of peers in each torrent (from -0.3 to 0.9), for alluvion and bitsoup.]

Fig. 8 presents Kendall's correlation coefficient between the upload and download speeds of peers in all torrents of bitsoup and alluvion that had at least five peers. The distribution of these correlations is very similar in the two communities. For most torrents there is at least a mild (0.3) correlation between download and upload speeds, indicating contention, but only for a small proportion of torrents is the correlation strong ($\geq 0.6$). Moreover, in one fourth of the torrents there is enough bandwidth for all peers to receive the service they demand irrespective of their contributions.

The absence of resource contention in one fourth of all torrents is particularly relevant to the design of incentive mechanisms, as mechanisms based on prioritization are rendered irrelevant for these torrents. On the other hand, the existence of some degree of correlation between upload and download speeds in most torrents suggests that contribution levels are not sufficient to meet the entire demand in the majority of torrents. Assuming that most BitTorrent users have asymmetric Internet connections, leechers' demand can only be met if there are seeders in a torrent. Therefore, our results suggest that (i) seeder service is not enough to compensate for the asymmetry in the Internet connections of leechers; and (ii) communities could benefit from higher levels of contribution of upload bandwidth or seeding time.

We also note that the levels of resource contention do not change significantly across communities. This is evidence that the higher seeding times seen in bitsoup do not dramatically change the relation between demand and supply in this community.

Lastly, the range of resource contention levels found in bitsoup and alluvion agrees with the observation by Locher et al. [22] that download speed in a torrent is sometimes related to the upload bandwidth contributed and sometimes not. Our data, however, quantify this phenomenon. Izal et al. [19] observed a positive correlation between leecher upload and download speeds in a highly popular torrent. Our observations indicate how this correlation varies in large, heterogeneous communities.

7. Final remarks

The characterization of a computational system must consider four aspects: the system's design, its implementation, the resources on which it runs, and the workload it serves. This work focuses on the latter two aspects in the context of BitTorrent commons-based content-sharing communities. In particular, it characterizes resource demand, resource supply and their relationship in three BitTorrent communities. Our results have broad impact on the design of BitTorrent extensions, on the design of complementary mechanisms, and on the study of this system.

The results related to resource demand (i) point to the design of cache mechanisms for BitTorrent that leverage the peculiar popularity distributions identified in this study; and (ii) provide an accurate model for reasoning about and synthesizing the popularity of torrents over time. In particular, the results strongly suggest that future researchers should consider the model introduced in this study, as the commonly used Poisson process is not accurate for current BitTorrent usage.

The characterization of resource supply motivates communities to identify and nurture their heavy contributors, and motivates the development of community resource allocation mechanisms based on simple information. The investigation also finds that the users who contribute the most to the system are those who provide more bandwidth, and reveals some redundancy in the set of users that provide most of the resources in the community, providing insight into the robustness of these communities.

The analysis of the relation between resource demand and supply provides a novel picture of content sharing via BitTorrent, in which the users who provide most of the resources are not altruistic. Instead, they generate a proportional demand. Our results also quantify one dimension of the quality of service achieved by BitTorrent communities and suggest that service should be improved mostly for small torrents. Additionally, the investigation of resource contention presents the typical regime of operation of the system with respect to contention, which is valuable information for designers of resource allocation mechanisms.

Finally, this study contributes to the methodology for experimental studies of BitTorrent content-sharing communities by summarizing good practices for BitTorrent data analysis that help assess information loss due to sampling and avoid biased estimates.

Our results suggest a number of avenues for future work. Besides the caching and resource allocation investigations mentioned above, our results motivate further study of the characteristics of torrent popularity that result in the fast drop in peer joins over time; simulation studies that consider the effect of the long tail of peer join rates; and further investigation of the factors that influence the request success rate in different communities.

Moreover, future work should extend the breadth of this characterization. Although our characterization used traces of up to 68 days from three communities, it provides a limited view of current BitTorrent usage. In particular, future studies should focus on torrents that survive over extended periods of time, use accurate user identification, and consider other metrics to gauge the quality of service provided in similar communities. Characterizing more communities of different sizes and potentially different user habits is also still necessary if we are to better understand how the human factor drives content distribution on the Internet and to design mechanisms that better serve humans.

Acknowledgements

The authors would like to thank the UMass Trace Repository for making the alluvion trace available, and Jaindson Santana and Flavio Santos for valuable help in processing the traces. Francisco Brasileiro acknowledges the support received from CNPq/Brazil (grant 309033/2007-1). Elizeu Santos-Neto is partially supported by the British Columbia Innovation Council/Canada.

References

[1] E. Adar, B.A. Huberman, Free riding on Gnutella, First Monday 5(2000) 10.

[2] K. Anagnostakis, F. Harmantzis, S. Ioannidis, M. Zghaibeh, On theimpact of practical P2P incentive mechanisms on user behavior,Working Paper 06-14, NET Institute, 2006.

[3] N. Andrade, M. Mowbray, A. Lima, G. Wagner, M. Ripeanu, Influenceson cooperation in BitTorrent communities, in: Proceeding of the2005 ACM SIGCOMM Workshop on Economics of Peer-to-PeerSystems, New York, NY, USA, 2005, pp. 111–115.

[4] N. Andrade, E. Santos-Neto, F. Brasileiro, M. Ripeanu, Methodologicalnotes on studying BitTorrent through tracker snapshots, TechnicalReport, Networked Systems Laboratory, 2008. <http://netsyslab.ece.ubc.ca>.

[5] A. Bellissimo, B.N. Levine, P. Shenoy, Exploring the use of BitTorrentas the basis for a large trace repository, Technical Report 04-41,University of Massachusetts, 2004.

[6] Y. Benkler, Sharing nicely: on shareable goods and the emergence ofsharing as a modality of economic production, The Yale Law Journal114 (2004) 273–358.

[7] A. Bharambe, C. Herley, V. Padmanabhan, Analyzing and improving aBitTorrent network’s performance mechanisms, in: Proceedings ofthe INFOCOMM, 2006.

[8] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata,A. Tomkins, J. Wiener, Graph structure in the web, ComputerNetworks 33 (1) (2000) 309–320.

[9] K. Burnham, D. Anderson, Multimodel inference: understanding AICand BIC in model selection, Sociological Methods and Research 33 (2)(2004) 261–304.

[10] CacheLogic, P2p in 2005, 2005. <http://www.cachelogic.com/home/pages/research/p2p2005.php>.

[11] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, S. Moon, I tube, you tube,everybody tubes: analyzing the world’s largest user generatedcontent video system, in: Proceedings of the Seventh ACMSIGCOMM Conference on Internet Measurement, San Diego, CA,USA, ACM, 2007, pp. 1–14.

[12] A.L. Chow, L. Golubchik, V. Misra, Improving BitTorrent: a simpleapproach, in: Proceedings of the IPTPS, 2008.

[13] B. Cohen, Incentives build robustness in BitTorrent, in: Proceedingsof the Workshop on Economics of Peer-to-Peer Systems, Berkeley,CA, USA, June 2003.

[14] S. Glassman, A caching relay for the world wide web, ComputerNetworks and ISDN Systems 27 (2) (1994) 165–173.

[15] K.P. Gummadi, R.J. Dunn, S. Saroiu, S.D. Gribble, H.M. Levy, J.Zahorjan, Measurement, modeling, and analysis of a peer-to-peerfile-sharing workload, in: SOSP’03: Nineteenth ACM Symposium onOperating Systems Principles, 2003, pp. 314–329.

[16] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, X. Zhang, Measurements,analysis, and modeling of BitTorrent-like systems, in: Proceedings ofthe ACM SIGCOMM/USENIX IMC, October 2005, pp. 19–21.

[17] S.B. Handurukande, A.-M. Kermarrec, F.L. Fessant, L. Massoulié, S.Patarin, Peer sharing behaviour in the eDonkey network, andimplications for the design of server-less file sharing systems, in:Proceedings of the EuroSys’06, New York, NY, 2006, pp. 359–371.

[18] D. Hughes, G. Coulson, J. Walkerdine, Freeriding on Gnutellarevisited: the Bell Tolls, IEEE Distributed Systems Online 6 (2005) 6.

[19] M. Izal, G. Urvoy-Keller, E.W. Biersack, P. Felber, A.A. Hamra, L.Garcés-Erice, Dissecting BitTorrent: five months in a Torrent’slifetime, in: Proceedings of the Passive and Active Measurements,Antibes Juan-les-Pins, France, April 2004.

[20] A. Legout, N. Liogkas, E. Kohler, L. Zhang, Clustering and sharingincentives in BitTorrent systems, in: Proceedings of SIGMETRICS,2007, pp. 301–312.

[21] N. Liogkas, R. Nelson, E. Kohler, L. Zhang, Exploiting BitTorrent forfun (but not profit), in: Proceedings of the IPTPS, February 2006.

[22] T. Locher, P. Moor, S. Schmid, R. Wattenhofer, Free riding inBitTorrent is cheap, in: Proceedings of the HotNets.

[23] M. Piatek, T. Isdal, A. Krishnamurthy, T. Anderson, One hopreputations for peer to peer file sharing workloads, in: Proceedingsof the NSDI, 2008.

[24] F.L. Piccolo, G. Neglia, G. Bianchi, The effect of heterogeneous linkcapacities in BitTorrent-like file sharing systems, in: Proceedings ofthe Hot-P2p, Los Alamitos, CA, USA, IEEE Computer Society, 2004, pp.40–47.

[25] J.A. Powelse, P. Garbacki, D.H.J. Epema, H.J. Sips, Measurement studyof the BitTorrent peer-to-peer file-sharing system, Technical ReportPDS-2004-003, Delft U. Technology, 2004.

Page 13: Resource demand and supply in BitTorrent content-sharing

N. Andrade et al. / Computer Netw

[26] D. Qiu, R. Srikant, Modeling and performance analysis of BitTorrent-like peer-to-peer networks, in: Proceedings of the SIGCOMM, August2004, pp. 367–378.

[27] M. Ripeanu, M. Mowbray, N. Andrade, A. Lima, Gifting technologies:a BitTorrent case study, First Monday 11 (2006) 11.

[28] D. Roselli, J.R. Lorch, T.E. Anderson, A comparison of file systemworkloads, in: Proceedings of the USENIX Annual TechnicalConference, Berkeley, CA, USA, 2000, p. 4.

[29] O. Saleh, M. Hefeeda, Modeling and caching of peer-to-peer traffic,in: Proceedings of the ICNP, 2006, pp. 249–258.

[30] S. Sinha, R.K. Pan, Econophysics and Sociophysics: Trends andPerspectives, Wiley–VCH, 2006, pp. 417–447 (ch. How a hit isborn: the emergence of popularity from the dynamics of collectivechoice).

[31] D. Stutzbach, R. Rejaie, Understanding churn in peer-to-peernetworks, in: IMC’06: Proceedings of the Sixth ACM SIGCOMM onInternet Measurement, New York, NY, USA, ACM Press, 2006, pp.189–202.

[32] D. Stutzbach, D. Zappala, R. Rejaie, The scalability of swarming peer-to-peer content delivery, in: Proceedings of the NETWORKING, 2005,pp. 15–26.

[33] A. Wierzbicki, N. Leibowitz, M. Ripeanu, R. Wozniak, v. . . Cachereplacement policies for peer-to-peer file-sharing protocols,European Transactions on Telecommunications 15 (2004) 6.

[34] D.M. Wilkinson, Strong regularities in online peer production, in:EC’08: Proceedings of the Ninth ACM Conference on ElectronicCommerce, New York, NY, USA, ACM, 2008, pp. 302–309.

[35] X. Yang, G. de Veciana, Service capacity of peer to peer networks, in:Proceedings of the INFOCOMM, 2004.

Nazareno Andrade is a Ph.D. student at the Universidade Federal de Campina Grande, Brazil. He received a B.Tech. degree in Telematics from the Centro Federal de Educação Tecnológica da Paraíba, Brazil, in 2001 and an M.Sc. in Informatics from the Universidade Federal de Campina Grande, Brazil, in 2003. His research interests include peer-to-peer systems, grid computing and sharing.

Elizeu Santos-Neto received a B.S. and an M.S. in Computer Science from the Universidade Federal de Alagoas and the Universidade Federal de Campina Grande, respectively. He also worked as an Assistant Researcher at the OurGrid project (http://www.ourgrid.org) and collaborated on the Virtual Workspaces project (http://workspace.globus.org), where he investigated topics in Grid Computing focused on distributed scheduling, resource allocation and virtualization technologies. Currently, he is a Ph.D. candidate at the University of British Columbia. His research interests include the characterization of, and mechanism design for, large-scale distributed systems.

Francisco Brasileiro is a Professor at the Universidade Federal de Campina Grande, Brazil. He received a B.S. degree in Computer Science from the Universidade Federal da Paraíba, Brazil, in 1988, an M.Sc. degree from the same university in 1989, and a Ph.D. degree in Computer Science from the University of Newcastle upon Tyne, UK, in 1995. His research interests include dependability in distributed systems, grid computing and distributed algorithms and protocols. He is a member of the Brazilian Computer Society, the ACM, and the IEEE Computer Society.


Matei Ripeanu received his Ph.D. degree in Computer Science from The University of Chicago in 2005. After a brief visiting period with Argonne National Laboratory, he joined the Electrical and Computer Engineering Department of the University of British Columbia as an Assistant Professor. He is broadly interested in distributed systems with a focus on self-organization and decentralized control in large-scale grid and peer-to-peer systems. He has published in major academic conferences on large-scale grid and peer-to-peer system characterization, on techniques to exploit the emergent characteristics of these systems, and on supporting scientific applications to run on these platforms.