Live Streaming over P2PSIP

LiSP: A layered P2PSIP-based architecture for live video streaming with flexible application logic placement

Victor Pascual, Carlos Macian
Networking Technology and Strategies Research Group (NeTS)
Universitat Pompeu Fabra
Passeig de Circumval·lació, 3, 08003 Barcelona (Spain)

{victor.pascuala, carlos.macian}@upf.edu

Abstract

Internet live video streaming is a blooming technology. To accommodate its bandwidth consumption, several architectures based on P2P principles exist. An important drawback is their incompatibility with SIP, the standard in multimedia transport control, mainly due to its client-server nature. Recently the IETF has started the design of a P2P version of the SIP protocol, P2PSIP. In this paper, a live video streaming architecture based on P2PSIP and standard SIP is presented. The architecture is divided into three layers (Users, Peers and Applications) and can be integrated with any other SIP-based service. Furthermore, the application logic can be distributed across the layers. This permits an implementation in which the operator provides and bills for the service, or a more endpoint-centric implementation. Both cases are presented here, together with an estimation of the system's complexity and the signalling load involved.

1. Introduction

In spite of the growing salience of computer games, web browsing and other network-centric and PC-centric experiences, TV is still the most prominent element in household leisure. In recent years, however, video streaming over IP networks has emerged as a technologically feasible, albeit still immature, contender for classical TV broadcasting. This advance is of tremendous commercial relevance, for it opens the door to true multimedia integration in the home around the PC (or other networked appliances with similar capabilities) and to the relegation of classical TV broadcasting to a second-best experience. For network operators and service providers alike, eager to increase the added value of their networks and networked products, this is a huge opportunity.

For technological as well as commercial reasons, then, video streaming has been a very active research topic of late. Live streaming in particular, with its additional real-time constraints, has proven very challenging, for a number of reasons.

First, video streaming, if it is to compete with classical TV broadcasting, has to scale to very large audiences. In spite of tremendous advances in video compression and coding, an average-quality, full-screen video stream needs anywhere from 512 Kbps to 2 Mbps as a minimum, and much more for HDTV quality. Given the lack of support for native multicast in IP networks, and particularly in the Internet, bandwidth consumption at the source for unicast distribution becomes prohibitive. Second, managing large numbers of dynamically joining and leaving customers, especially if some form of video relaying or forwarding is employed to save bandwidth, implies a tremendous management burden and can severely affect the quality of experience. Although buffering can alleviate the problem, its use in live streaming is necessarily very restricted. Third, as a side effect of the impact of churn, zapping (which implies unsubscribing from one channel and subscribing to another) is very slow. The associated management burden and the initialization of the system for the new channel bring zapping times to anywhere from several seconds to several tens of seconds. Last but not least, the perceived quality of video streaming over the Internet is still poor, with chunky video and even loss of continuity. Although this might be an acceptable price to pay for free TV, especially when accessing content unavailable otherwise, it is certainly not ripe for mass commercial deployment.

Peer-to-peer technology has been proposed to solve many of those problems. By using the video consumers also as relay elements, the bandwidth consumption at the source can be drastically reduced and the scalability of the architecture improved: With every new viewer contributing its upload bandwidth to the overall system, the capacity of the network actually grows with the number of users, which is a requirement for large audiences. Besides, users can be somewhat protected from the effect of churn, provided that every viewer can contact more than one relaying station simultaneously. Should one of the stations change to another channel, the total download bandwidth will decrease momentarily, but not be cut off altogether.
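The scaling argument can be made concrete with a small back-of-the-envelope calculation. The Python sketch below compares the upload bandwidth a source needs under pure unicast with the aggregate upload capacity available when every viewer also relays. The function names and the example figures (viewer count, per-peer upload rate) are assumptions chosen for illustration, not measurements from this paper.

```python
def unicast_source_load_kbps(viewers: int, stream_kbps: int) -> int:
    """Upload bandwidth the source needs when it serves every viewer itself."""
    return viewers * stream_kbps


def p2p_supply_kbps(viewers: int, source_upload_kbps: int, peer_upload_kbps: int) -> int:
    """Aggregate upload capacity when every viewer contributes its own uplink."""
    return source_upload_kbps + viewers * peer_upload_kbps


def p2p_is_sustainable(viewers: int, stream_kbps: int,
                       source_upload_kbps: int, peer_upload_kbps: int) -> bool:
    """True if the total upload supply covers the total download demand."""
    demand = viewers * stream_kbps
    supply = p2p_supply_kbps(viewers, source_upload_kbps, peer_upload_kbps)
    return supply >= demand


if __name__ == "__main__":
    # Hypothetical audience of 10,000 viewers watching a 512 Kbps stream.
    print(unicast_source_load_kbps(10_000, 512), "Kbps needed at a unicast source")
    print(p2p_is_sustainable(10_000, 512, 2_048, 512))
```

Under these assumed numbers the unicast source would need several Gbps of upload, whereas the P2P system is sustainable as long as the average viewer uploads at least as much as the stream rate. With scarcer uplinks (as is typical for ADSL) the supply falls short, which is why relay and request selection matter.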

However, P2P does not solve the issue of the long startup and zapping delays, and can actually make it even worse. Since the stream now reaches a viewer after traversing a number of relaying stations, additional delay is introduced. Furthermore, a number of additional problems appear in the P2P context. Assuming that a number of relay stations exist, an algorithm is needed to decide from which of them to (simultaneously and cooperatively) download the video, and also which parts of the video to download from each of them. The reverse is also true: Since upload bandwidth is typically scarce in ADSL environments, an algorithm is also needed to decide to whom to stream the solicited video pieces, if more requests arrive than can be accepted. All in all, P2P live streaming solutions generally also present high management complexity and burden in order to keep the information about sources, channels, videos, relays, etc. updated and distributed among the participating nodes.

Nevertheless, P2P technology is a very promising avenue, which is producing a number of breakthroughs in live streaming. But to date, all the proposed architectures have failed to address the issue of true integration with the de facto standard protocol for multimedia transmission in the Internet: SIP [15]. To the best of our knowledge, all P2P live streaming architectures to date resort to proprietary protocols for overlay management and, in some cases, also for the management of video sessions. We think that this is a bad idea for two reasons. First, video streaming in the Internet based on standard multimedia distribution and management protocols opens the door to a new dimension of converged multimedia services. Although many platforms exist for concurrent text and audio interchange, like Microsoft's Messenger or Skype, which even support videoconferencing, they remain isolated communication devices. So far they lack integration with other web-based services and ways of expression like blogs, social communities, etc., and also with non-interactive media distribution like, precisely, video broadcasting, Internet radio, etc. Since most audio and chat communication platforms rely on SIP, it would be highly advantageous to integrate video streaming on the same set of protocols.

It can be argued that SIP follows the client/server paradigm and hence cannot be integrated with P2P technologies in video streaming solutions. Besides, SIP only deals with session negotiation and not with seed localization, channel information distribution, etc. But the second reason why it seems a bad idea to resort to proprietary protocols for video streaming is the progress being made in the IETF P2PSIP WG in designing a fully compatible, P2P version of SIP. Although the protocols themselves are still in the making, their principles and main characteristics are already known: P2PSIP will make use of DHTs for resource storage and localization, beginning with Chord but remaining open to other implementations. The DHT will store location and identification information for members of a certain community (e.g., viewers of a video channel), as well as information concerning their capabilities and status. The protocols will have built-in NAT traversal capabilities and service discovery mechanisms. A number of drafts have been published based on [1], where the details of the forthcoming protocol can already be found. Hence, it is possible to use those drafts, together with the original SIP protocol, as a basis to develop a fully IETF-compliant architecture for live video streaming around SIP.

In this paper, LiSP (Live Streaming over P2PSIP), a novel live video streaming architecture built around the latest developments in the P2PSIP WG and SIP, is presented. It is composed of a structured overlay (i.e., based on a DHT) for user and video management and a partial mesh of video sessions for data delivery, in which all viewers also act as media relays. The P2PSIP Peer protocol is used for the former and SIP for the latter. The emphasis of the paper lies in a detailed description of the protocols involved, together with a complexity analysis of the overall architecture. Furthermore, the architecture can be adapted to different functional distributions at the service level without changing its basic structure. Here, we present two extreme examples. The first is an endpoint-centric approach, in which all application logic resides at the end nodes themselves and the overlay provides only very basic support to the video streaming application. This is the "classic" P2P approach. The second is a network-centric approach, in which the overlay takes a dominant role in the application. This mode of operation is especially appealing for network operators eager to increase the value of their networks by providing a link to the final applications. However, in this paper only the first scenario will be presented in detail.

The rest of the paper is structured as follows: Section 2 reviews recent advances in the field of live streaming over IP networks. Section 3 presents the main elements of our architecture. Section 4 describes the different scenarios relative to functional distribution that are possible. Section 5 focuses on the endpoint-centric scenario dealt with in this article, which is evaluated in Section 6. Finally, Section 7 summarizes the findings of the article and gives some hints for further work.

2. Related Work

Live video streaming based on P2P presents a number of challenges. Some of them are general to broadcasting a broadband video signal to a large audience [19], while others are specific to live streaming [11]. In general, most initiatives so far try to leverage P2P technology with the goal of making the system more scalable by reducing bandwidth consumption at the source, relaying the content to successive watchers. The first such works used application-level multicast for data transmission [6], [8]. Multicast, however, is based on building trees to distribute data from a source to many destinations. A tree is inherently fragile, for the loss of an interior node impacts all of its subtrees. To increase system robustness in the presence of churn, multi-trees have also been explored, so that leaves in one tree are also interior nodes in different trees [4]. The information is divided into different layers, and every layer is sent through a different tree. Only by subscribing to all trees can the original signal be fully recovered, but the loss of a layer only degrades the quality, without preventing viewing. Together with Forward Error Correction mechanisms [16], streaming becomes much more robust in the face of errors and churn. Alternatively, additional branches can be introduced in the tree in order to increase its robustness [10]. Other proposals eliminate trees altogether and build partial or full meshes for data distribution [12], [13], [14], [20]. These proposals typically divide the information into small pieces called chunks, so that parallel downloading from multiple sources can be scheduled.

A related question is how to manage the distribution of information about membership, video availability, relaying, etc. This overlay management is typically done in one of two ways: Either with the help of a DHT (the structured approach), or by relying on gossip-based protocols (the unstructured approach) [14], [5]. While DHTs provide higher routing efficiency, they are also more complex.

By introducing multiple layers and/or chunks, and by turning intermediate viewers into relaying stations, a live streaming design must find an answer to a number of new questions: How to choose from whom to download? How to choose what to download from each chosen relay? How to choose to whom to relay? These questions require the design of appropriate selection and scheduling algorithms [17], [14], [20].
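To illustrate the first two questions, the following Python sketch assigns missing chunks to relays with a simple greedy rule: earliest chunk first, sent to the least-loaded relay that advertises it. This is a deliberately naive stand-in and not any of the cited algorithms; the function name, the representation of a relay's advertised chunks as a set, and the per-relay request limit are all assumptions made for the example.

```python
from typing import Dict, Set


def schedule_chunks(missing: Set[int],
                    buffer_maps: Dict[str, Set[int]],
                    per_relay_limit: int) -> Dict[int, str]:
    """Greedy sketch: map each missing chunk to one relay that holds it.

    missing         -- chunk numbers the viewer still needs
    buffer_maps     -- relay name -> set of chunk numbers that relay can serve
    per_relay_limit -- hypothetical cap on requests sent to a single relay
    """
    load = {relay: 0 for relay in buffer_maps}
    plan: Dict[int, str] = {}
    # Earliest chunks first, since they are closest to their playback deadline.
    for chunk in sorted(missing):
        candidates = [relay for relay, have in buffer_maps.items()
                      if chunk in have and load[relay] < per_relay_limit]
        if not candidates:
            continue  # nobody can serve this chunk in the current round
        # Prefer the least-loaded relay; break ties by name for determinism.
        relay = min(candidates, key=lambda r: (load[r], r))
        plan[chunk] = relay
        load[relay] += 1
    return plan


if __name__ == "__main__":
    maps = {"Seed": {1, 2, 3, 4}, "Anna": {1, 2}}
    print(schedule_chunks({1, 2, 3, 4}, maps, per_relay_limit=2))
```

A real scheduler would also weigh relay bandwidth, delay and chunk rarity; the sketch only shows where such a policy plugs in.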

In spite of the diversity of approaches taken, they all have something in common: The use of proprietary protocols for both the overlay management and the data distribution. Specifically, none of the referenced works uses SIP for session management. [18] and [2], on the other hand, do use SIP as the central protocol in their designs. However, their approach seems questionable, since they use SIP not only for end-to-end data session management, but also for overlay management: Since SIP was developed neither for intra-DHT communication nor for overlay management, they have to severely twist the SIP procedures and the structure and meaning of its messages, substantially altering its spirit. Clearly, a protocol compatible with SIP is necessary for overlay management, but using SIP itself seems odd. The IETF P2PSIP WG is providing an answer to this.

In our design, we follow the P2PSIP WG use of the structured approach for overlay management, combined with a partial mesh for data distribution. We rely on existing algorithms for chunk scheduling and relay and request selection. The introduction of layering is fully compatible with our architecture, but it is left for further study. Figure 1 summarizes the main architectural properties of our design, compared with the works of our predecessors.

Architecture    Overlay Management Structure    Data Distribution Mechanism    Protocol
AnySee          Unstructured                    Multitree                      Proprietary
CoolStreaming   Unstructured                    Mesh                           Proprietary
PULSE           Unstructured                    Mesh                           Proprietary
Chainsaw        Unstructured                    Multicast                      Proprietary
SplitStream     Structured                      Multitree                      Proprietary
MPSS            Structured                      Multitree                      SIP
SOSIMPLE        Structured                      Multitree                      SIP
LiSP            Structured                      Mesh                           P2PSIP Peer Protocol + SIP

Figure 1. Main architectural properties of relevant works

3. Architecture

The main emphasis of this work lies in the use of P2P principles for the transmission of live video streams over the Internet, with the clear goal of maximizing the scalability of the platform in terms of users and channels, while minimizing the bandwidth usage (from the point of view of the emitting source). The particular constraints of live broadcasting also impose a strong interest in advanced video coding and chunk selection techniques that minimize delay and provide enhanced robustness against packet loss and node churn. LiSP is based on the joint use of the Session Initiation Protocol (SIP) by the viewers and the Service Extensible Protocol (SEP) [9] inside the operator's network, hence serving as a use case description for the P2PSIP peer protocol proposed to the IETF P2PSIP WG. Since the basic properties of the SEP draft are similar to those of the other proposals being discussed, our architecture and the subsequent analysis and conclusions can safely be generalized to them, too.

The architecture presented here assumes a core network formed by peer nodes at the service level. Peers participate in the overlay network as overlay-routing nodes, and at the same time each peer contributes its storage capacity to the other peers in the overlay network. Peers must support at least the overlay maintenance, routing, and storage functions. These nodes are responsible for storing and managing information about the channels being emitted as well as the nodes currently (re-)transmitting them. Other important information will also be stored in the core network depending on the application scenario, as will be explained shortly. The peer nodes communicate among themselves using the SEP peer protocol, which conforms to the definition of the P2PSIP Peer protocol in [1].

Like the P2PSIP Peer Protocol definition, SEP is based on a DHT algorithm and not only maintains the overlay topology, but also provides a distributed database service. SEP uses a flexible packet forwarding mechanism so that peers can choose the best peer to route a packet further. It also provides a common method for service discovery, i.e. to discover which peers can provide a specific service. Some of these additional services may be required to allow the overlay to form and operate, while others may be enhancements to the basic P2PSIP functionality [1]. The routing modes used by SEP attempt to complete transactions with lower latency and a higher success rate, even if intermediate peers fail or NATs sit between the source and destination peers.

In short, overlay peers form a ring of nodes, synchronizing their information based on SEP and a DHT algorithm. However, every peer shall retrieve and keep information locally relevant to users connected to it.

LiSP users are non-overlay nodes which implement a SIP client (UA). They use SIP as defined in RFC 3261 and its associated extensions, and are not aware of whether, behind their responsible peer (named SIPeer, since it acts as a SIP server), lookup operations are performed over a P2P or a client/server topology. A LiSP user can adopt three different and non-exclusive roles: Consumer, Seed and Media Relay. Consumers are the viewers of the video channels. They express their desire to watch a channel to their responsible SIPeer, which will communicate to the consumer which nodes are currently (re-)transmitting the channel. The consumer will then be in a position to connect to one or more of those relays to download the video data. Media Relays, for their part, are users which may be acting as consumers but are also relaying the video session to other consumers. Seed nodes are the original and unique media sources, which may be a television camera, a video server or even a webcam with SIP capabilities.

As said, LiSP users use a standard SIP interface to communicate with their SIPeer, hence creating a structured, two-layered architecture, in which consumers use SIP to communicate with the overlay to locate nodes transmitting their desired content, and also among themselves for the purpose of session establishment, while the peers use SEP among themselves for overlay management and channel information distribution and retrieval.

Figure 2. LiSP layered architecture

However, the amount of information stored and processed by the overlay versus the users varies, depending on the functional distribution among peers and consumers. Quite obviously, this distribution presents a number of trade-offs in terms of processing, storage and signalling load for both kinds of nodes. Different arrangements may be desirable, depending on the application scenario, number of participating nodes (both consumers and peers), node capabilities, etc. Three different arrangements have been identified: One in which consumers are responsible for most of the processing, while the overlay provides only a location service (named Consumer does most), and another one in which the overlay is charged with most of the management of the video streaming platform (named Overlay does most). In between, a third hybrid scenario tries to balance the load supported by both kinds of nodes. In this work, only the first arrangement is described in detail, leaving the others for further work.

4. Scenarios

4.1. Consumer does most

In the Consumer does most scenario, the overlay is responsible only for storing and managing information related to the location of the participating peers and the list of available channels in the platform, as well as which nodes are relaying their content. Hence, most of the tasks associated with live streaming, like the decision processes about which chunks to download, and from whom, reside in the consumers themselves. As a consequence, this architecture presents a lightly loaded overlay, both in terms of signalling and processing. The consumers, on the other hand, must have significant processing capacity and participate heavily in the signalling process. As stated, the peers are organized according to the SEP protocol using an underlying DHT based on Chord. Hence, for communication purposes they form a logical ring.

On the one hand, one of the main advantages of the Consumer does most architecture is that the main logic of the application (which chunks to retrieve, from whom, etc.) resides at the consumer. This allows for seamless application upgrades, including new source coding methods, etc. As long as the channels are registered in the overlay, the network can continue to operate. A further benefit is that the overlay supports the application by locating nodes and keeping channel and relaying node lists, but does not participate in the application itself. Since most of the tasks are performed directly by the consumers and most of the signalling travels end to end, the overlay scales very well with the number of viewers and channels.

On the other hand, there are some disadvantages. Since most of the tasks are performed directly by the consumers, resource consumption at the consumers is high. While this might not be a problem for desktop PCs, mobile users with less powerful devices (PDAs, laptops, etc.) might suffer from excessive processing burden, bandwidth waste through signalling and, worst of all, energy consumption. Platform control may also become an issue: since the overlay only keeps information about relaying nodes and channels being emitted, the possibilities for performing adequate accounting and billing, should the overlay wish to do so (typical if it is controlled by a network operator), would be greatly reduced.

4.2. Overlay does most

In the Overlay does most scenario, the node roles are somewhat reversed. Should a network operator decide to support or (through partnership, direct provision or any other arrangement) directly provide the video streaming service itself on top of its network, it will very probably have a strong interest in controlling much more tightly the information interchange across its network, for billing, accounting and security purposes. Operators are very protective of the reliability of their networks, which has reached unprecedented levels for other forms of data networking. Hence, it is vital for this scenario that the overlay is aware of, and furthermore can control, the whole signalling and data interchange. Furthermore, the overlay concentrates all the information regarding the state of the participating nodes' Buffer Maps (albeit in a distributed manner, since the information for every channel and relay/viewer is actually stored only at the corresponding node for every seed/relay).

The overlay will perform, in a distributed manner, all the processing functions previously done at the consumer: It will decide which nodes shall be chosen as relays for every new viewer, it will schedule the transmission of chunks and it will keep the list of channels and viewers updated. Hence the denomination Overlay does most (of the work). By keeping the state relative to all broadcast media, the overlay takes on a burden which reduces its scalability, due to the increase in signalling. It must be highlighted, however, that every peer only keeps track of channels being viewed by at least one consumer for which it plays the role of corresponding node. It also keeps the Buffer Map of every active consumer connected to it. In this way, the overlay is in a position to perform the scheduling of chunks and nodes mentioned above, and potentially to do so more efficiently than every consumer on its own, since it has an overall view of the resources in use, how many connections every relay is carrying, etc. If the consumers' registration messages also carried information regarding the consumers' capabilities, such as processing power, storage capacity and link bandwidth, it would be theoretically possible (albeit mathematically very complex, if not impossible in real time) to compute an optimum distribution of connections to every relay. However, the possibility of using heuristics is worth further exploration for certain scenarios.

The architecture presented here has a number of advantages, most of them from the point of view of the operator. Since all decision algorithms are in the hands of the overlay, there exists a centralized control for better billing capacity, free-riding surveillance and resource management. This structure permits lightweight consumers: devices which, due to the low processing power and signalling load that they need, can respect the tight battery and bandwidth constraints of today's mobile devices. Since there are no perfect solutions, a number of drawbacks also apply. The main issue concerns the platform scalability. Since the signalling burden at the overlay increases considerably, a scalability study would be necessary to evaluate how many peers are needed, and how powerful they must be, to accommodate this load.

4.3. Hybrid Approach

In between the extremes, a number of hybrid approaches are possible, in which the load is distributed to different degrees between the two kinds of nodes, users and peers. Furthermore, the goal of such a hybrid approach may not be to reduce the load at one or the other kind of node, but to distribute the knowledge about the state of the network and its resources, depending on the scenario. For example, in a hybrid case, an overlay would still receive all the Buffer Maps, which it would forward to the viewers upon request. However, the selection of which relays to contact and which chunks to request, arguably the central element of the live streaming architecture, would reside at the endpoints. This example highlights the load distribution between consumers and peers. Further equilibrium points exist. As was stated before, if the consumers communicate their capabilities upon registration, the signalling can be unevenly distributed among them: The overlay could take up more of the signalling for battery-constrained or simpler devices, impersonating the consumer to a large extent, while other, more powerful consumers could assume a more intense role in the signalling process. A further example arises if the operator is interested less in participating in the signalling than in being able to perform accurate accounting (and subsequent billing). In that case, the information about the channels being viewed and the duration of the optimal media data assignment sessions might suffice (the extreme Consumer does most case), but the operator may also want to be informed about the amount of data received, if the billing depends on it. In that case, the operator would also be interested in collecting the different Buffer Maps. It is up to the operator to select which degree of control it wants to have over its customers' service consumption.

5. Consumer does most scenario description

In this scenario, peers are organized according to the SEP protocol, and hence, for communication purposes they form an overlay which makes use of a Distributed Hash Table (DHT) for node and resource location. Chord is one of the most popular DHT algorithms for its robustness in handling churn. Chord is based on a logical ring topology in which a lookup requires O(log N) messages, where N is the number of peers.
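As an illustration of the logarithmic lookup cost, the following minimal Python sketch simulates finger-table routing on a Chord-style ring. It is not an implementation of SEP or of any P2PSIP draft: the 16-bit identifier space, the SHA-1 based `chord_id` helper, the peer names and the centralized view of the ring are all assumptions made to keep the example short.

```python
import hashlib
from bisect import bisect_left

M = 16            # identifier bits (assumption for the example)
RING = 2 ** M


def chord_id(name: str) -> int:
    """Hash an arbitrary name onto the identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def successor(node_ids, key):
    """First node at or clockwise after key; node_ids must be sorted."""
    i = bisect_left(node_ids, key % RING)
    return node_ids[i % len(node_ids)]


def in_open_interval(x, a, b):
    """True if x lies strictly between a and b going clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps around zero


def fingers(node_ids, n):
    """Finger table of node n: successor(n + 2^i) for i = 0 .. M-1."""
    return [successor(node_ids, n + 2 ** i) for i in range(M)]


def lookup(node_ids, start, key):
    """Route from start towards key; return (responsible_node, hops)."""
    n, hops = start, 0
    while True:
        succ = successor(node_ids, n + 1)
        if key == succ or in_open_interval(key, n, succ):
            return succ, hops
        # Forward to the closest finger that still precedes the key.
        nxt = next((f for f in reversed(fingers(node_ids, n))
                    if in_open_interval(f, n, key)), n)
        if nxt == n:          # defensive: no progress possible
            return succ, hops
        n, hops = nxt, hops + 1


if __name__ == "__main__":
    ring = sorted({chord_id(f"peer-{i}") for i in range(64)})
    owner, hops = lookup(ring, ring[0], chord_id("Channel X"))
    print(f"responsible peer {owner} reached after {hops} forwarding hops")
```

Each forwarding step uses a strictly shorter finger than the previous one, so the number of hops is bounded by the number of identifier bits and is typically close to log2 of the number of peers.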

When started up, a node needs to either join the existing overlay or create a new overlay. In order to join an existing overlay, the node must first locate some peer that is already participating in the overlay. This is common to any layered architecture and is known as the bootstrap node location problem. A number of possibilities exist: cached or well-known bootstrap peer addresses, broadcast bootstrap peer discovery, manual bootstrap peer address configuration, etc. For the purposes of the present discussion, any such mechanism would work equally well.

After joining the overlay, a node is able to search for other peers and resources and share its own resources with the other peers. The Overlay uses the P2PSIP Peer Protocol for these operations. SEP is one of the proposed P2PSIP Peer protocols. Since the bootstrapping and the initialization of the overlay are out of the scope of this document, let us assume that Peers 1, 2, 3 and SIPeers A, B, C and S have already set up an Overlay.

LiSP users, denoted as SIP UAs and represented as Seed, Anna, Boris and Carlos, must first of all register with the overlay to make their presence known. Remember that the user-SIPeer interface is standard SIP. All methods used here are as per the standard, while a number of new events will be used for this use case. SIP UAs start the successful new registration procedure as described in RFC 3665 (see footnote 1). Users send a SIP REGISTER request to their responsible SIPeer, which acts as a registrar. The sole purpose of this procedure is to establish the Contact: address of the UA and authenticate it as a member of the network.

The SIPeer sends a PUT request to publish, refresh or update the location information of its associated SIP UA in the overlay. When the PUT operation completes, the responsible peer notifies the SIPeer of the completion.
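The registration step can be pictured with the toy model below, in which a SIPeer accepts a REGISTER and stores the binding at whichever peer is responsible for the user's address. It is only a sketch under stated assumptions: the `Overlay` and `SIPeer` classes, the `put`/`get` method names, the hashing scheme and the example addresses are hypothetical, and neither real SIP parsing nor the SEP wire format is modelled.

```python
import hashlib
from bisect import bisect_left


class Overlay:
    """Toy DHT: every key is stored at the first peer clockwise from its hash."""

    def __init__(self, peer_names, bits=32):
        self.ring = 2 ** bits
        # peer identifier -> that peer's local key/value store
        self.stores = {self._hash(name): {} for name in peer_names}
        self.ids = sorted(self.stores)

    def _hash(self, text):
        return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big") % self.ring

    def responsible_peer(self, key):
        i = bisect_left(self.ids, self._hash(key))
        return self.ids[i % len(self.ids)]

    def put(self, key, value):
        """Store (or refresh) a record; returns the id of the storing peer."""
        peer = self.responsible_peer(key)
        self.stores[peer][key] = value
        return peer

    def get(self, key):
        return self.stores[self.responsible_peer(key)].get(key)


class SIPeer:
    """Acts as registrar towards users and as a client of the overlay."""

    def __init__(self, overlay):
        self.overlay = overlay

    def handle_register(self, address_of_record, contact, expires=3600):
        # Publish the binding in the overlay, then confirm to the user.
        self.overlay.put(address_of_record, {"contact": contact, "expires": expires})
        return "SIP/2.0 200 OK"


if __name__ == "__main__":
    overlay = Overlay(["Peer1", "Peer2", "Peer3", "SIPeerA"])
    sipeer_a = SIPeer(overlay)
    print(sipeer_a.handle_register("sip:anna@example.org", "sip:anna@192.0.2.10"))
    print(overlay.get("sip:anna@example.org"))
```

Any other SIPeer sharing the same `Overlay` object can later resolve the user's contact with `get`, which is the location service the overlay provides in this scenario.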

Once the users have been registered and authenticated, some of them decide to subscribe to the Live Video Streaming service. Anna and Boris will subscribe to the 'ListOfChannels' global event in order to be informed about the updated list of published channels. They will send a SIP SUBSCRIBE message to their responsible SIPeer, and the corresponding SIP NOTIFY message will contain the list of currently published channels. Whenever a new channel is published, or an existing one is updated or even unpublished, this information will be updated in the overlay and every responsible SIPeer will notify its associated users of the event. At this point Anna and Boris get the current list of existing channels.

The consumer initiates a new subscription to the Event: ListOfChannels presence agent (admitting node). The presence agent (admitting node) for the ListOfChannels URI [email protected] processes the subscription request and creates a new subscription. A 200 OK response is sent to confirm the subscription. In order to complete the process, the presence agent (admitting node) sends the consumer (Anna) a NOTIFY with the current state of the ListOfChannels (i.e. the current list of published channels) using a Content-Type: application/pidf+xml payload. The consumer confirms receipt of the NOTIFY request.
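As a rough picture of the first message in this exchange, the Python sketch below assembles a SUBSCRIBE request for the ListOfChannels event. Only the event name and the content type come from the text above; the helper name `build_subscribe`, the example.org URIs, the documentation IP address and the header values are assumptions, and the exact message contents defined for LiSP are not reproduced here.

```python
import uuid


def build_subscribe(subscriber_uri: str,
                    resource_uri: str,
                    contact_uri: str,
                    event: str = "ListOfChannels",
                    expires: int = 3600,
                    via_host: str = "192.0.2.10") -> str:
    """Return the text of a minimal SIP SUBSCRIBE request (illustrative only)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:10]   # RFC 3261 magic-cookie prefix
    lines = [
        f"SUBSCRIBE {resource_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{subscriber_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{resource_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{via_host}",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <{contact_uri}>",
        f"Event: {event}",
        f"Expires: {expires}",
        "Accept: application/pidf+xml",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_subscribe("sip:anna@example.org",
                          "sip:ListOfChannels@example.org",
                          "sip:anna@192.0.2.10"))
```

The presence agent would answer with a 200 OK and then send a NOTIFY carrying the channel list, which the consumer acknowledges in turn.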

The seed user, which is a television camera, publishes the channel being emitted (say, Channel X) and any additional information concerning the channel (e.g. the genre, its encoding, technical characteristics, etc.). This information will be stored by the overlay, i.e., the SIPeer responsible for this seed (as per the DHT) will update the current list of channels and will store a new resource record in the overlay on behalf of the seed. Assuming that the SIPeer node is not the responsible peer for the seed, the content of the PUBLISH message will be forwarded through the overlay until it reaches its designated peer. From this moment on, the overlay knows about the seed emitting Channel X and can also answer queries from potential viewers about Channel X and who is emitting it.

SIPeer nodes will get (by polling or trapping) the updated list of channels and will generate a notification to those associated users which are subscribed to the 'ListOfChannels' event, again using a SIP NOTIFY message.

Footnote 1: A complete message flow of the whole LiSP has been defined, together with the detailed content of every message. These details are omitted from the text, except where highly relevant, for clarity and lack of space.

A Seed's UA sends a SIP PUBLISH to the admitting node in order to update it with new List of Channels information. The Expires header indicates the desired duration of this soft state. Note that if a Seed decides to go offline (finish the transmission of a channel), it may publish this channel using an Expires header equal to zero. Again, information related to the channel is encoded using a Content-Type: application/pidf+xml payload.

The presence agent (admitting node) receives and accepts the information. The published data is incorporated into the ListOfChannels event document. A 200 OK response is sent to confirm the publication. The 200 OK response contains a SIP-ETag header field with an entity-tag, which is used to identify the published event state in subsequent PUBLISH requests.
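The soft-state behaviour of PUBLISH (expiry, entity-tags, removal with Expires equal to zero) can be illustrated with a minimal in-memory sketch. It is not the admitting node's implementation: the class `ChannelPublications`, the `etag-N` tag format and the injectable `clock` are assumptions made for illustration only.

```python
import itertools
import time


class ChannelPublications:
    """Toy soft-state store for 'ListOfChannels' publications (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                  # injectable so expiry can be tested
        self._tags = itertools.count(1)
        self._state = {}                     # entity-tag -> (channel, info, expiry)

    def publish(self, channel, info, expires, etag=None):
        """Create, refresh or remove a publication.

        Returns the new entity-tag, or None when the state was removed
        (expires == 0, i.e. the seed stops emitting the channel).
        """
        if etag is not None:
            if etag not in self._state:
                # A real agent would answer 412 Conditional Request Failed.
                raise KeyError("unknown entity-tag")
            del self._state[etag]            # the old tag is never reused
        if expires == 0:
            return None
        new_tag = f"etag-{next(self._tags)}"
        self._state[new_tag] = (channel, info, self._clock() + expires)
        return new_tag

    def list_of_channels(self):
        """Drop expired publications and return the current channel names."""
        now = self._clock()
        self._state = {t: v for t, v in self._state.items() if v[2] > now}
        return sorted({channel for channel, _, _ in self._state.values()})
```

A seed would call `publish("Channel X", info, expires=3600)`, keep the returned tag, refresh with that tag before the hour elapses, and finally publish with `expires=0` to withdraw the channel.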

At this point the seed is ready to broadcast its content, but there are no viewers as of yet, although Anna and Boris know of the existence of Channel X.

Next, the viewers choose which channels they would like to SUBSCRIBE to (i.e., to watch). Hence, they subscribe to the specific events 'Status: Channel X' and 'ListOfSources: Channel X'.

The first subscription will keep the viewer informed about any changes in the state of the channel, while the NOTIFY message corresponding to the second event contains a complete list of all nodes which are transmitting or relaying that channel at the moment (which, so far, is only the seed). Should a new node start relaying the channel, or an existing one stop doing so, the corresponding NOTIFY (sent by the corresponding node) would update that information. Remember that the channel itself gets a Resource ID from the overlay, and that there is one and only one responsible peer for that Resource ID. Hence, every time a new node SUBSCRIBEs to a channel through the consumer's corresponding node (different from the channel's corresponding node), the content of that message will be routed through the overlay to the channel's corresponding node, which can then maintain a global list of the nodes watching and relaying the channel, as well as their state. Anna and Boris are now subscribed to Channel X and have been inserted as potential media relays in the list of sources for Channel X.
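How a Resource ID maps to exactly one responsible peer can be sketched with the successor rule of a Chord-like ring. The identifier width (`ID_BITS`), the use of SHA-1 over the channel name and both function names are assumptions made for this example; the paper does not specify how LiSP derives Resource IDs.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 32  # assumed identifier width, kept small for readability


def ring_id(name: str) -> int:
    """Hash a name (e.g. a channel) onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def responsible_peer(resource_id: int, peer_ids: list[int]) -> int:
    """Return the first peer ID equal to or following resource_id,
    wrapping around the ring (the Chord successor rule)."""
    ring = sorted(peer_ids)
    index = bisect_left(ring, resource_id)
    return ring[index % len(ring)]


peers = [ring_id(f"peer-{i}") for i in range(8)]
owner = responsible_peer(ring_id("Channel X"), peers)
```

Because every node applies the same rule to the same peer set, all SUBSCRIBE requests for a channel converge on one peer, which is what allows that peer to keep the global list of sources.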

It is now up to the users to implement the local algorithm of their choice (e.g. OTSp2p [17]) to select which relaying peers to connect to in order to receive the data packets. It is also their responsibility to select which chunks of information or which layers to download from each of the selected relaying nodes. As stated before, the consumer implements most of the processing associated with live video streaming. At this point in the example, however, only the seed is transmitting. Hence, the viewers will now SUBSCRIBE to the endpoint events 'BufferMapUpdate: Channel X' and 'Zapping: Channel X' directly at the seed. The overlay does not keep any information about Buffer Maps and hence does not participate in this interchange. With the first subscription, the corresponding NOTIFYs will send the current Buffer Map to every viewer, so that they can choose which chunks to download from every source. The second event will immediately NOTIFY a viewer that a certain relaying node has changed to viewing (and hence, relaying) another channel and is no longer available as a data source. Anna and Boris will receive the Buffer Map image from the seed.
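To make the role of the Buffer Maps concrete, the following sketch assigns chunks to relays with a simple greedy rule. It is explicitly not OTSp2p nor the algorithm of any cited system; it is a stand-in chosen for brevity. The function name `assign_chunks` and the representation of a Buffer Map as a set of chunk numbers are assumptions.

```python
def assign_chunks(wanted, buffer_maps):
    """Greedy stand-in for the viewer's local selection algorithm.

    wanted:      iterable of chunk numbers the viewer still needs
    buffer_maps: dict mapping relay name -> set of chunk numbers it holds
    Returns a dict mapping chunk number -> chosen relay. Each chunk goes to
    the least-loaded relay that advertises it (ties broken by relay name).
    """
    load = {relay: 0 for relay in buffer_maps}
    assignment = {}
    for chunk in sorted(wanted):
        holders = [r for r, chunks in buffer_maps.items() if chunk in chunks]
        if not holders:
            continue  # no relay advertises this chunk yet
        relay = min(holders, key=lambda r: (load[r], r))
        assignment[chunk] = relay
        load[relay] += 1
    return assignment


maps = {"seed": {1, 2, 3, 4}, "anna": {1, 2}}
plan = assign_chunks({1, 2, 3, 4, 5}, maps)
# plan == {1: "anna", 2: "seed", 3: "seed", 4: "seed"}; chunk 5 is unavailable
```

Whenever a 'BufferMapUpdate' NOTIFY arrives, the viewer would update `maps` and recompute the plan; a 'Zapping' NOTIFY would remove the relay from `maps` altogether.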

Viewers, after locally computing the optimum download assignment, start a SIP dialog (with an INVITE transaction) to set up a video session with every chosen relay, specifying in the SDP body which chunks they wish to download. These video sessions will be kept open as long as desired, even if no information is being downloaded at the moment. This gives the viewer a number of "backup" relays, in case some relay in use either zaps to another channel or simply disconnects. Combining active sessions with on-hold sessions accelerates the activation of a substitute relay by a simple re-negotiation of SDP parameters.

To signal the desire to establish an on-hold session, we follow [7], which uses the a=inactive attribute. This specifies that the session should be started in inactive mode, and no media is sent over an inactive media stream. In order to activate the session, the consumer may send a SIP re-INVITE message with a=recvonly, which reflects its desire to receive media, as explained in [7].
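The difference between an on-hold and an active offer is a single direction attribute, as the sketch below shows. It builds only the media-level fragment of the SDP body (the session-level lines are left out), and the function names `build_offer` and `activate` are hypothetical. The port and MSRP path default to the values used in this paper's example.

```python
def build_offer(direction, port=7654,
                path="msrp://carlospc.university.edu:7654/jshA7we;tcp"):
    """Return the media-level SDP fragment for an MSRP session.

    direction is 'inactive' for a backup (on-hold) relay and 'recvonly'
    for a relay the viewer wants to receive chunks from.
    """
    if direction not in ("inactive", "recvonly"):
        raise ValueError("direction must be 'inactive' or 'recvonly'")
    lines = [
        f"m=message {port} TCP/MSRP *",
        f"a={direction}",
        "a=accept-types:message/cpim",
        "a=accept-wrapped-types:*",
        f"a=path:{path}",
    ]
    return "\r\n".join(lines) + "\r\n"


def activate(sdp):
    """Turn an on-hold fragment into the one sent in the re-INVITE."""
    return sdp.replace("a=inactive\r\n", "a=recvonly\r\n", 1)


on_hold = build_offer("inactive")   # sent in the initial INVITE to a backup
active = activate(on_hold)          # sent in the re-INVITE when it is needed
```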

It is up to the viewer to decide how many backup relays it wants to keep on hold. Obviously, the more backup relays are kept, the more signalling will be interchanged among them, since the Buffer Maps must be constantly interchanged to calculate which chunks are available for download. The overlay, for its part, is not affected by this signalling, which flows end to end; this keeps the overlay more scalable.

Once the session setup has been accepted, an MSRP [3] media session starts end-to-end from the seed to the two viewers in the figure. MSRP is used for transmitting a series of related chunks in the context of the session, which is negotiated using the Session Description Protocol (SDP), with SIP as the signaling protocol. With the Seed as source and Anna and Boris as Consumers, the streaming has now truly begun.

In the event that new consumers (remember that Carlos is already registered) join the network, the same steps are followed: subscribe to get the list of channels, then subscribe to the events 'Status: Channel X' and 'ListOfSources: Channel X'. Once the new viewer gets the list of broadcasting nodes, which now contains not only the seed but also the two previous viewers, Anna and Boris, who can now also act as relaying nodes, it subscribes directly at the nodes of its choice ('BufferMapUpdate: Channel X' and 'Zapping: Channel X').

Figure 3. MSRP usage for multi-chunk data transport

After locally computing the optimum data download assignment, the viewer sends the corresponding INVITE to open the media session with each chosen relay. Should some of the relays be backups, the INVITE will contain an SDP description putting the media stream on hold, signalling in this manner that the session must be kept on hold and no information interchanged.

So far, it has been assumed that the viewer downloads all available chunks from each chosen relay in order. Should this not be the case, additional re-INVITE messages with an SDP body specifying which chunks to download serve to indicate which chunks should be retrieved next. Again, these messages travel end to end, without overlay participation. No SIP provisional responses are considered, for the sake of clarity.

Carlos constructs an SDP description of the chunks that he wants to receive and attaches the SDP offer to a SIP INVITE request addressed to Anna.

    m=message 7654 TCP/MSRP *
    a=recvonly
    a=accept-types:message/cpim
    a=accept-wrapped-types:*
    a=path:msrp://carlospc.university.edu:7654/jshA7we;tcp
    a=file-selector:name:Fight Club type:video/msvideo
    a=file-transfer-id:1
    a=file-range:y-z

Each file has its own file transfer identifier, which uniquely identifies each file transfer.

Anna receives the SIP INVITE request, inspects the SDP offer, computes the file descriptor and finds a local file whose hash equals the one indicated in the SDP. Anna accepts the file transmission and creates an SDP answer which is transmitted in a SIP 200 OK message.

Carlos acknowledges the reception of the 200 OK message. Carlos opens a TCP connection to Anna. Anna then creates an MSRP SEND request that contains the file. Carlos acknowledges the reception of the SEND request. The process would be repeated with Boris.
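For orientation, the wire format of one MSRP SEND request carrying a chunk can be sketched as follows, using the framing defined in RFC 4975 [3] (request line, path headers, Byte-Range, and an end-line of seven hyphens followed by the transaction identifier and a continuation flag). The function name `msrp_send` and its parameters are hypothetical, and the sketch does not perform the message/cpim wrapping announced in the SDP offer above; the caller simply supplies the Content-Type.

```python
def msrp_send(tid, to_path, from_path, message_id, data, content_type,
              offset=1, total=None, last=True):
    """Frame one MSRP SEND request carrying `data` (bytes, non-empty).

    offset is the 1-based position of the first byte within the whole
    message, total its overall size (defaults to len(data)). The end-line
    flag is '$' for the final chunk of a message and '+' otherwise.
    """
    total = len(data) if total is None else total
    end = offset + len(data) - 1
    head = "\r\n".join([
        f"MSRP {tid} SEND",
        f"To-Path: {to_path}",
        f"From-Path: {from_path}",
        f"Message-ID: {message_id}",
        f"Byte-Range: {offset}-{end}/{total}",
        f"Content-Type: {content_type}",
        "",   # blank line separating headers from the body
        "",
    ])
    flag = "$" if last else "+"
    end_line = f"\r\n-------{tid}{flag}\r\n"
    return head.encode("ascii") + data + end_line.encode("ascii")


frame = msrp_send("d93kswow", "msrp://carlospc.university.edu:7654/jshA7we;tcp",
                  "msrp://relay.example:8888/9di4ea;tcp", "12339sdqwer",
                  b"abcd", "application/octet-stream")
```

The From-Path URI and the identifiers in the usage line are placeholders invented for the example.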

One of the advantages of using MSRP is that if a TCP connection towards Carlos is already open and a re-INVITE is sent, Anna re-uses that TCP connection to send an MSRP SEND request that contains the (desired part of the) file.

All the steps described above are repeated recursively whenever new consumers join the network; see Fig. 4.

Figure 4. Recursive relaying to new watchers.

This section has described the fundamental procedures for broadcasting live video content in the Internet based on the P2PSIP Peer protocol draft and SIP. But to demonstrate the validity and not only the feasibility of such an approach, some form of validation is needed. In this paper, we show the scalability of our approach for very large audiences through a complexity analysis, to which the next section is devoted.

6. Evaluation and Results

In order to perform an approximate evaluation of the complexity of our architecture, we will concentrate on the signalling load involved in it. To that end, the SIP and Peer messages necessary for every major operation in the network (e.g. the addition of a new relay or the constant transmission of Buffer Maps) will be analyzed. Finally, two numerical examples of smaller and larger networks will be given to better grasp the results.

| Variable | Description |
|----------|-------------|
| N | Number of Overlay Peers |
| P | Number of SIPeers |
| M | Number of users subscribing to a global event |
| J | Number of users subscribing to a specific event |
| K | Number of users subscribing to an endpoint event |
| L | Number of users subscribing to an endpoint event with on-hold session |
| j | Number of users notified when a specific event occurs |

Figure 5. Main system variables

Fig. 5 presents the main variables involved in the evaluation. Fig. 6 shows the complexity, for the SIP as well as for the Peer protocol, of the main events involved in the operation of the architecture, like a new user registration or a Buffer Map update notification. For its part, Fig. 7 presents the cost of the main operations in the architecture, i.e., the concatenation of a series of smaller events that necessarily happen together for the operation to succeed.

It must be remembered that Chord is taken as the example DHT for the architecture. Hence, the cost of storing or retrieving a piece of information in the DHT (like registering a new watcher or publishing a channel, say) is $O(\log N)$. Another important operation, the introduction of a new peer (fully dependent on the DHT complexity), is proportional to $O((\log N)^2)$. It follows that a global event like the introduction of a new relay station, which must be communicated to all peers to update the ListOfSources event, has a complexity of $O(N \log N)$. This is the dominant factor in the overall system's cost, which therefore grows slightly faster than linearly. However, it must be remembered that only peers perform such operations, and they represent the minority of the nodes. Hence, when looking at the signaling growth between the two chosen scenarios in the last rows of Fig. 8, it can be noticed that the increase in signaling load is strongly sublinear: for an increase in population by a factor of 1000, the signalling load per node increases by a factor closer to 100. Consequently, the system scales very well for large numbers of consumers, and not so well for large numbers of peers.

Another critical operation is the periodic interchange of Buffer Maps among watchers and relays. Since every watcher downloads that information from K relays, its cost is $O(K)$, but K is small (typically between 1 and 10) and independent of the number of watchers, M. Hence, the cost associated with the only periodic system operation is small, constant and bounded.
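The per-event costs of Fig. 6 and some of the composite procedures of Fig. 7 can be turned into a small calculator to get a feel for the orders of magnitude. This is a rough sketch under two assumptions that the paper does not state: every big-O term is evaluated with a hidden constant of 1, and logarithms are taken in base 2 (as is usual for Chord). The names `event_costs`, `PROCEDURES` and `procedure_cost` are hypothetical.

```python
import math


def event_costs(N, M, K, L, j):
    """Return {event: (SIP messages, overlay messages)} following Fig. 6.

    Overlay terms are order-of-magnitude estimates (constant 1, log base 2).
    """
    log_n = math.log2(N)
    return {
        "Peer Join/Leave": (0, log_n ** 2),
        "Peer Put/Get": (0, log_n),
        "User Registration": (4, log_n),
        "Global Event subscription": (4, 0),
        "Global Event notification": (2 + 2 * M, N * log_n),
        "Specific Event notification": (2 * j, N * log_n),
        "Specific Event subscription": (2, log_n),
        "Endpoint Event subscription": (2 * K, 0),
        "Endpoint Event notification": (2 * K, 0),
        "Session Establishment": (3 * K, 0),
        "Session Teardown": (2 * K, 0),
        "Session Update": (3 * L, 0),
    }


# A subset of the procedures of Fig. 7, as lists of constituent events.
PROCEDURES = {
    "New Channel": ["Global Event notification"],
    "New Relay": ["Specific Event subscription", "Specific Event notification"],
    "New BufferMap": ["Endpoint Event notification"],
    "New Peer": ["Peer Join/Leave", "Peer Put/Get"],
}


def procedure_cost(name, N, M=0, K=0, L=0, j=0):
    """Sum the SIP and overlay message counts of a procedure's events."""
    costs = event_costs(N, M, K, L, j)
    sip = sum(costs[event][0] for event in PROCEDURES[name])
    overlay = sum(costs[event][1] for event in PROCEDURES[name])
    return sip, overlay


sip, overlay = procedure_cost("New Channel", N=1024, M=100)
# With N = 1024 peers, log2(N) = 10, so overlay == 10240.0 and sip == 202
```

The example makes the asymmetry discussed above visible: a Buffer Map update costs only `2K` end-to-end SIP messages regardless of N, whereas a new channel touches every peer.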

As a conclusion, the results show that the system presents very good scalability in terms of the signalling load involved, which is the most critical requirement for a live streaming architecture.

| Event | Session Messages (SIP) | Overlay Messages (SEP) |
|-------|------------------------|------------------------|
| Peer Join/Leave | 0 | $O((\log N)^2)$ |
| Peer Put/Get | 0 | $O(\log N)$ |
| User Registration | 4 | $O(\log N)$ |
| Global Event subscription | 4 | 0 |
| Global Event notification | 2 + 2M | $O(N \log N)$ |
| Specific Event notification | 2j | $O(N \log N)$ |
| Specific Event subscription | 2 | $O(\log N)$ |
| Endpoint Event subscription | 2K | 0 |
| Endpoint Event notification | 2K | 0 |
| Session Establishment | 3K | 0 |
| Session Teardown | 2K | 0 |
| Session Update | 3L | 0 |

Figure 6. Complexity evaluation of the individual events in the architecture

| Procedure | Events | Signaling Load, O(messages) |
|-----------|--------|-----------------------------|
| New Channel | Global Event notification | $M + N \log N$ |
| New Relay | Specific Event subscription, Specific Event notification | $j + N \log N$ |
| New Watcher | User Registration, Global Event subscription, Specific Event subscription, Specific Event notification, Endpoint Event subscription, Endpoint Event notification, Session Establishment, Session Update, Session Teardown | $j + N \log N + K + L$ |
| New BufferMap | Endpoint Event notification | $K$ |
| New Peer | Peer Join/Leave, Peer Put/Get | $(\log N)^2 + \log N$ |

Figure 7. Complexity evaluation of the typical operations in LiSP

To better exemplify the above results, numerical values for two particular cases have been recorded in Fig. 8. These cases have been further subdivided into two: the first is a static environment, where neither the number of peers nor that of viewers varies (no churn). In the second, churn is an additional factor, repeatedly triggering a number of additional events, like ListOfSources updates. For this case, it is considered that the estimated watching time is equal to a typical movie's length, and that the rates of arrival and departure of nodes are equal and constant, set to 1% for the large network and 10% for the smaller one. Furthermore, in all cases it is assumed that the video has only one layer and that all participating nodes are watching the same channel and cooperatively relaying its content to other nodes.

Under these assumptions, it can be seen that the overall signalling involved, for the endpoints as well as for the peer nodes, is large in absolute terms but insignificant when considered per node and compared with the data transmission rate. Furthermore, the signalling load remains roughly constant with the size of the network, which proves its scalability. These observations hold for the static as well as for the dynamic case (in the presence of churn).

Figure 8. Signalling load for a small and a large streaming network

Using these numerical values, one can roughly estimate how powerful the peers and consumers should be. Considering a 3-way SIP dialog-creating request, i.e. three SIP messages to set up a session, it is possible to roughly estimate the number of simultaneous calls per second the peers will have to process. For production operation, [?] suggests the following guideline for sizing server hardware to operate at 60% CPU utilization for some of the most common SIP software packages (OpenSER V1.2 and SER V2.0): 1 GHz of CPU processing capacity can manage 60 calls per second. In the small scenario, we can roughly estimate 33 calls per second per peer, a load that could be handled using 1 GHz of CPU processing capacity at about 30% CPU utilization. For the larger scenario, the number of simultaneous calls grows to 333 calls per second per peer. A peer with two dual-core 3.0 GHz CPUs would effectively have (2 CPUs × 2 cores × 3 GHz per core) 12 GHz of CPU processing capacity. This server, hosting either OpenSER V1.2 or SER V2.0, would be able to manage this number of calls per second at approximately 30% CPU utilization.
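The sizing arithmetic above can be reproduced with a few lines, assuming that the guideline (60 calls per second per GHz at 60% CPU utilization) scales linearly with both clock capacity and utilization. That linearity is our simplifying assumption, not a property claimed by the guideline, and the function names are hypothetical.

```python
# Guideline quoted in the text: 1 GHz handles 60 calls/s at 60 % utilization.
GUIDELINE_CALLS_PER_GHZ = 60
GUIDELINE_UTILIZATION = 0.60

# Assumed linear extrapolation: calls/s per GHz at 100 % utilization.
CALLS_PER_GHZ_FULL_LOAD = GUIDELINE_CALLS_PER_GHZ / GUIDELINE_UTILIZATION


def cpu_utilization(calls_per_second, cpu_ghz):
    """Fraction of the CPU needed to serve the given call rate."""
    return calls_per_second / (cpu_ghz * CALLS_PER_GHZ_FULL_LOAD)


def required_ghz(calls_per_second, target_utilization=GUIDELINE_UTILIZATION):
    """CPU capacity (GHz) needed to stay at the target utilization."""
    return calls_per_second / (CALLS_PER_GHZ_FULL_LOAD * target_utilization)


small = cpu_utilization(33, cpu_ghz=1.0)        # about 0.33
large = cpu_utilization(333, cpu_ghz=2 * 2 * 3) # about 0.28
consumer = required_ghz(3)                      # about 0.05 GHz, i.e. 50 MHz
```

Under this model the small and large scenarios come out at roughly 33% and 28% utilization respectively, in line with the "approximately 30%" quoted in the text, and a rate of 3 calls per second needs about 50 MHz at the guideline's 60% utilization.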

For consumers, which are typically less powerful terminals, the situation is similar. Since the number of users subscribing to an endpoint event is constant, the number of simultaneous calls per second per consumer is exactly the same for both large and small scenarios: 3 calls per second. This means a required capacity lower than 100 MHz. Considering that current mobile devices offer a capacity of at least 250-350 MHz, this approach is suitable for existing devices(2).

As a conclusion, this first estimation of the complexity and signaling load involved in our architecture shows very promising results, which support our belief that it can grow to very large sizes without severe penalty: with commodity, existing software and hardware, one can build nodes able to host very large audiences.

7. Conclusions and Future work

This work has presented a layered P2PSIP-based architecture for live video streaming with flexible application logic placement.

In this document, a new control plane based on P2PSIP has been designed and particularized for live video streaming with flexible application logic placement. This new control plane is based upon the SEP protocol, a novel draft protocol designed to be used jointly with SIP in P2PSIP scenarios, and which is currently being discussed at the IETF. However, the main architectural characteristics are common to all the protocol proposals at the P2PSIP WG and therefore the results presented here can safely be extrapolated to all other draft protocols under discussion.

SEP presents a layered architecture, with consumers, clients and peers performing increasingly complex roles in the maintenance of the overlay and the management of the network resources. Consumers are unaware of any overlay and act as classical SIP User Agents. Their interface to the clients and the peers is standard SIP. Clients do know about the overlay and act as intermediate nodes between the consumers and the peers, if need be. Furthermore, they provide extra storage capacity to the peers, and basically can change their role from peer to client depending on resource availability. Peers are the members of the overlay, and the only ones that participate in its maintenance. Only the peers communicate through SEP.

(2) Leaving out the limitations in battery power, which are another related constraint for networked terminals, yet orthogonal to this discussion.

The last part of the document has been devoted to describing in detail how SIP and SEP would be combined in an overall architecture to provide the live streaming service. The role of every node, as well as different application scenarios (powerful nodes, mobile nodes with battery and CPU restrictions, large or small groups, etc.) have been explored. The cost analysis shows that the architecture can safely scale to very large sizes, even in the presence of sustained and heavy churn, which is a requirement for commercial success of such platforms.

The proposed control plane and live streaming architecture present a number of advantages: a fully decentralized architecture, as is expected of a P2P-based system, based completely on standardized protocols (or, to be more precise, protocols that are currently being standardized). This brings with it the additional advantage of interoperability with existing SIP-based applications and services, like IM, videoconferencing, online gaming, etc., and their corresponding commercial or open-source products.

The next steps will take the direction of prototyping the proposed control plane while exploring different scenarios. In particular, the "Overlay does most" scenario, which gives the network operator a much stronger involvement in the service provision, will be analyzed analogously and compared to the "Consumer does most" case presented here. The subsequent tests and trials will help to refine and improve the architecture presented here, and hopefully pose new questions that will drive our research further.

References

[1] D. Bryan, P. Matthews, E. Shim, and D. Willis. Concepts and terminology for peer to peer SIP, Nov. 2007. INTERNET-DRAFT draft-ietf-p2psip-concepts-01 (Work in progress).

[2] D. A. Bryan, B. B. Lowekamp, and C. Jennings. SOSIMPLE: A serverless, standards-based, P2P SIP communication system. In Proceedings of the AAA-IDEA 2005, June 2005.

[3] B. Campbell, R. Mahy, and C. Jennings. The Message Session Relay Protocol (MSRP). RFC 4975 (Proposed Standard), Sept. 2007.

[4] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh. Splitstream: High-bandwidth multicast in cooperative environments, 2003.

[5] M. e. a. Castro. Peer-to-peer overlays: Structured, unstructured, or both? Microsoft Research, Tech. Rep. MSR-TR-2004-73, Cambridge, UK, 2004.

[6] Y.-H. Chu, S. G. Rao, and H. Zhang. A case for end system multicast. In Measurement and Modeling of Computer Systems, pages 1–12, 2000.

[7] M. G.-M. et al. A session description protocol (SDP) offer/answer mechanism to enable file transfer, Mar. 2008. INTERNET-DRAFT draft-ietf-mmusic-file-transfer-mech-07 (Work in progress).

[8] J. Jannotti, D. K. Gifford, K. L. Johnson, M. F. Kaashoek, and J. W. O'Toole, Jr. Overcast: Reliable multicasting with an overlay network. pages 197–212.

[9] X. Jiang, H. Zheng, C. Macian, and V. Pascual. Service extensible P2P peer protocol (SEP), Feb. 2008. INTERNET-DRAFT draft-jiang-p2psip-sep-01 (Work in progress).

[10] D. Kostic, A. Rodriguez, J. Albrecht, and A. Vahdat. Bullet: High bandwidth data dissemination using an overlay mesh, 2003.

[11] B. Li and H. Yin. Peer-to-peer live video streaming on the internet: issues, existing approaches, and challenges [peer-to-peer multimedia streaming]. Communications Magazine, IEEE, 45(6):94–99, June 2007.

[12] X. Liao, H. Jin, Y. Liu, L. M. Ni, and D. Deng. Anysee: Peer-to-peer live streaming. INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings, pages 1–10, April 2006.

[13] V. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, and A. Mohr. Chainsaw: Eliminating trees from overlay multicast, 2005.

[14] F. e. a. Pianese. PULSE: An adaptive, incentive-based, unstructured P2P live streaming system. IEEE Transactions on Multimedia, 9(8), December 2007.

[15] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. RFC 3261 (Proposed Standard), June 2002.

[16] E. Setton, P. Baccichet, and B. Girod. Peer-to-peer live multicast: A video perspective. Proceedings of the IEEE, 96(1):25–38, Jan. 2008.

[17] D. Xu, M. Hefeeda, S. Hambrusch, and B. Bhargava. On peer-to-peer media streaming, 2002.

[18] D. e. a. Yang. MPSS: A Multi-agents Based P2P-SIP Real Time Stream Sharing System, volume 4088 of LNCS Series, pages 398–408. Springer Verlag, 2006.

[19] W.-P. Yiu, X. Jin, and S.-H. Chan. Challenges and approaches in large-scale P2P media streaming. Multimedia, IEEE, 14(2):50–59, April-June 2007.

[20] X. Zhang, J. Liu, B. Li, and Y.-S. Yum. Coolstreaming/donet: a data-driven overlay network for peer-to-peer live media streaming. INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 3:2102–2111 vol. 3, 13-17 March 2005.