5
Computer Networks Lecture 8: Content Delivery Infrastructure: Peer-to-Peer 2015 Internet Traffic Analysis Sandvine’s Global Internet Phenomena Report: https://www.sandvine.com/trends/global-internet-phenomena/ Content Distribution Most popular content can only be served if it is highly replicated across multiple servers reduce load at origin server improve performance for end users Most Content Delivery Infrastructures (CDI) have a large number of servers distributed across the Internet and cache content on these servers Content Delivery Infrastructure Peer-to-peer (p2p): hybrid p2p with a centralized server pure p2p hierarchical p2p end-host (p2p) multicast Content-Distribution Network (CDN)

2015 Internet Traffic Analysissugih/courses/eecs489/...Hierarchical P2P: Skype Skype forms a hierarchical P2P: • index mapping usernames to IP addresses is distributed across supernodes

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 2015 Internet Traffic Analysissugih/courses/eecs489/...Hierarchical P2P: Skype Skype forms a hierarchical P2P: • index mapping usernames to IP addresses is distributed across supernodes

Computer Networks

Lecture8:ContentDeliveryInfrastructure:Peer-to-Peer

2015InternetTrafficAnalysis

Sandvine’sGlobalInternetPhenomenaReport:https://www.sandvine.com/trends/global-internet-phenomena/

ContentDistribution

Mostpopularcontentcanonlybeservedifitishighlyreplicatedacrossmultipleservers•  reduceloadatoriginserver•  improveperformanceforendusersMostContentDeliveryInfrastructures(CDI)havealargenumberofserversdistributedacrosstheInternetandcachecontentontheseservers

ContentDeliveryInfrastructure

Peer-to-peer(p2p):•  hybridp2pwithacentralizedserver•  purep2p•  hierarchicalp2p•  end-host(p2p)multicast

Content-DistributionNetwork(CDN)

Page 2: 2015 Internet Traffic Analysissugih/courses/eecs489/...Hierarchical P2P: Skype Skype forms a hierarchical P2P: • index mapping usernames to IP addresses is distributed across supernodes

HybridP2PandCentralizedServer

Napster:•  P2Pfiletransfer•  centralizedfilesearch:•  peersregisterIPaddressandcontentatacentralindexserver

•  peersquerycentralindexservertolocatecontent

register

queryreply

PureP2PArchitecture• noalways-onserver• arbitraryendsystemsdirectlycommunicate• peersareintermittentlyconnectedandchangeIPaddresses• example:Gnutella• highlyscalable(why?)• butdifficulttomanage•  howtofindpeer?•  howtofindcontent?

Gnutella•  nocentralizedindexserver•  networkdiscoveryusingpingandpongmessages

•  filediscoveryusingqueryandqueryHitmessages

•  bothpingandquerymessagesareforwardedusingthefloodingalgorithm:forwardonalllinksexceptincomingone

•  previouslyseenmessagesarenotfurtherforwarded

•  newversionofgnutellausesKaZaA-likesupernodes

A

B

ping

pong

A

B

query

queryHit

HierarchicalP2PFastTrackusedbyKaZaA,Groskster,iMesh,Morpheus•  hierarchicalarchitecture•  peersdividedintosupernodesandordinarynodes

•  eachsupernodekeepsanindexofallitschildren’sfiles

•  requestsaresenttosupernodes•  supernodesqueryeachotherforfilesnotintheirlocalindices

•  ordinarynodesare“promoted”tosupernodesiftheyhaveenoughresourcesandhavestayedonnetworklongenough

•  paralleldownloadoffiles

reg

req

Page 3: 2015 Internet Traffic Analysissugih/courses/eecs489/...Hierarchical P2P: Skype Skype forms a hierarchical P2P: • index mapping usernames to IP addresses is distributed across supernodes

HierarchicalP2P:SkypeSkypeformsahierarchicalP2P:•  indexmappingusernamestoIPaddressesisdistributedacrosssupernodes•  searchesforSkypeusersaresenttosupernodes

•  supernodesqueryeachotherforusersnotintheirlocalindex

• supernodeschooseapeertoactasrelayfortwoNATtedusers

eDonkey/eMulealsobuildsahierarchicalnetwork,butthe“supernodes”arededicatedservers,notjustmoreequalpeers

openhost

A

B

maintain a central index of files, so that userscan send requests directly to information hold-ers. Unfortunately, centralization creates a sin-gle point of failure that is easy to attack. Forexample, if you were trying to phone MichaelJordan, the simplest way to get his numberwould ordinarily be to call directory assistance.However, because directory assistance is central-ized, your access can be easily blocked if Jordanor someone else decides to remove his directoryentry, or if the service goes down.

Systems like Gnutella broadcast queries to everyconnected node within some radius. Using thismethod, you would ask all of your friends if anyof them knew Jordan’s number, get them to asktheir friends, and so on. Within a few steps, thou-sands of people could be looking for his number.Although this process would eventually find youranswer, it is clearly wasteful and unscalable.

Freenet avoids both problems by using asteepest-ascent hill-climbing search: Each nodeforwards queries to the node that it thinks isclosest to the target. You might start searchingfor Jordan by asking a friend who once playedcollege basketball, for example, who might passyour request on to a former coach, who couldpass it to a talent scout, who might pass it to Jor-dan’s agent, who could put you in touch with theman himself.

Requesting files. Every node maintains a routingtable that lists the addresses of other nodes and theGUID keys it thinks they hold. When a nodereceives a query, it first checks its own store, and ifit finds the file, returns it with a tag identifyingitself as the data holder. Otherwise, the node for-wards the request to the node in its table with theclosest key to the one requested. That node thenchecks its store, and so on. If the request is suc-

cessful, each node in the chain passes the file backupstream and creates a new entry in its routingtable associating the data holder with the request-ed key. Depending on its distance from the holder,each node might also cache a copy locally.

To conceal the identity of the data holder, nodeswill occasionally alter reply messages, setting theholder tags to point to themselves before passingthem back up the chain. Later requests will stilllocate the data because the node retains the truedata holder’s identity in its own routing table andforwards queries to the correct holder. Routingtables are never revealed to other nodes.

To limit resource usage, the requester gives eachquery a time-to-live limit that is decremented ateach node. If the TTL expires, the query fails,although the user can try again with a higher TTL(up to some maximum). Because the TTL can giveclues about where in the chain the requester is,Freenet offers the option of enhancing security byadding an initial mix-net route before normalrouting. This effectively repositions the start of thechain away from the requester.

If a node sends a query to a recipient that isalready in the chain, the message is bounced backand the node tries to use the next-closest keyinstead. If a node runs out of candidates to try, itreports failure back to its predecessor in the chain,which then tries its second choice, and so on.

Figure 1 depicts a typical request sequence. Theuser initiates a request at node A and forwards therequest to B, which forwards it to C. Node C isunable to contact any other nodes and returns a“request failed” message to B. Node B then triesits second choice, E, which forwards the requestto F. Node F forwards the request to B, whichdetects a loop and bounces the message back.Unable to contact any additional nodes, node Fbacktracks one step to E, which forwards therequest to its second choice, D, and locates thefile. D returns the file via E and B back to A,which sends it to the user. Along the way, E, B,and A might also cache the file.

With this approach, the request homes in closerwith each hop until the key is found. A subsequentquery for this key will tend to approach the firstrequest’s path, and a locally cached copy can sat-isfy the query after the two paths converge. Sub-sequent queries for similar keys will also jump overintermediate nodes to one that has previously sup-plied similar data. Nodes that reliably answerqueries will be added to more routing tables, andhence, will be contacted more often than nodesthat do not.

44 JANUARY • FEBRUARY 2002 http://computer.org/internet/ IEEE INTERNET COMPUTING

Peer-to-Peer Networking

Figure 1.Typical request sequence.The request moves through thenetwork from node to node, backing out of a dead-end (step 3) anda loop (step 7) before locating the desired file.

= Data request

= Data reply

= Request failedRequester

Data holder

2 3

58

910

4

1

11

12

67

a b

c

d

ef

Freenet:AnonymousP2P•  noindexserver•  requesterdoesn’tconnectdirectlytocontentprovider

•  instead,contentispassedinabucket-brigadefashionfromprovidertorequester

•  subsequentrequestforthesamecontentissatisfiedfromthenearestcache

•  requestercannotdifferentiateproviderfromacacheholderoraforwardingpeer(allowsforanonymity)

BitTorrentContentdistribution:•  contentisdividedintoNpiecesof16KBeachandsenttoNpeers

Contentdownload:•  todownloadafile,apeermustfirstregisterwithaTracker•  Trackerreturnsarandomlistofpeerswhohavethefile•  peeropensabout5TCPconnectionstotheprovidedpeers•  apeerwillonlyuploadtopeersfromwhomitcanalsodownload(“tit-for-tat”)

Summary:P2POverlayNetworksP2Papplications/peersneedto:•  trackidentitiesandIPaddressesofpeers•  theremaybealargenumberofpeers•  peersmaycomeandgofrequently(highchurn)•  can’tkeeptrackofallpeers•  routemessagesamongpeers•  maybemulti-hop

Overlaynetwork•  peershavetodobothnamingandrouting•  IPbecomes�just�thelow-leveldeliverysubstrate•  allIProutingisopaque

application

transport

network

link

physical

Page 4: 2015 Internet Traffic Analysissugih/courses/eecs489/...Hierarchical P2P: Skype Skype forms a hierarchical P2P: • index mapping usernames to IP addresses is distributed across supernodes

ModesofDeliveryUnicast,broadcast,multicastAssumingavideoconference

involvingS,D2,andD3•  unicasting:twocopiesofpacketsfromSaresentovertheSRlink•  broadcasting:onecopyofpacketssentfromStoalldestinations,butpacketssenttoD1andD4unnecessarily•  multicasting:onecopyofpacketsfromSissentovertheSRlink,RthensendsonecopyeachtoD2andD3

D4

D3

D2

D1

RS

MulticastDelivery

Usesofmulticasting:•  videoconferencing,distancelearning,distributedcomputation,p2pdelivery,multi-playergaming,etc.

Multicastdesigngoals:•  cansupportmillionsofreceiverspermulticastgroup•  receiverscanjoinandleaveanygroupatanytime•  sendersdon’tknowallreceivers•  sendersdon’thavetobemembersofagrouptosend•  therecouldbemorethanonesenderspergroup

MulticastGroupManagementIssuesinmulticastgroupmanagement:1.  howtoadvertise/discoveramulticastgroup?2.  howtojoinamulticastgroup?3.  deliveringmulticastpacketstothegroup

IPmulticast:• usemulticastaddressesasanonymousrendezvouspoint:•  IPv4:Class-D(224.0.0.0to239.255.255.255[RPC3171])•  265Mmulticastgroupsatmost•  IPv6:multicastprefix:FF00::/8

•  createawell-knownmulticastgroup(address)toadvertise/discovermulticastgroups

MulticastDeliveryIPmulticast:

•  sendersendsasinglepackettotheIPmulticastaddress

•  multicastdataissentbest-effort,usingUDP(why?)

•  routersdeliverpacketsoutallinterfacesthathasareceiverbelongingtothemulticastgroup

•  receiversjoingroupsbyinformingupstreamrouters,e.g.,byusingInternetGroupManagementProtocol(IGMP)

•  notuniformlydeployedthroughouttheInternet

Page 5: 2015 Internet Traffic Analysissugih/courses/eecs489/...Hierarchical P2P: Skype Skype forms a hierarchical P2P: • index mapping usernames to IP addresses is distributed across supernodes

FloodandPruneHowtoensurethatonlyonecopyofpacketfromSisforwardedbyP3toP4?•  keeptrackofpacketsequencenumber•  onlyforwardpacketthatcomesfromshortestpathfrom(to)source

HowtoensurethatonlyonecopyofpacketfromSreachesP3?•  onlyforwardifselfisonneighbor’sshortestpathfrom(to)source

•  prune(P3tellsP2nottoforwardpacketsfromS )•  mustbedonepersourceiftherearemultiplesources,

eachsourceformingitsownmulticastgroupand(logically)itsownmulticasttree

•  mustperiodicallyfloodincaseofmembershipchange

S

P3

P2P1

P4

End-hostMulticastIssuesinmulticastgroupmanagement:1.  howtoadvertise/discoveramulticastgroup?2.  howtojoinamulticastgroup?3.  deliveringmulticastpacketstothegroup

End-host(p2p)multicast:•  usesawell-known,centralizedrendezvousserver•  eachpeermustregisterwithrendezvousserver•  rendezvousserverreturnsa(random)listofpeers•  eachpeercansupportonlyalimitednumberofpeers•  avoidsendingduplicatemessagesandlooping:•  ifsinglesource,constructsashortest-pathtreerootedatsource•  orusesflood-and-prunealgorithm

•  preferspeersinsamesubnet

P2PChallenges

RelativetoIPnetworking:• muchhigherfunction,moreflexible• muchlesscontrollable/predictable

Relativetootherparallel/distributedsystems:•  noadministrativeorganizations•  fewguaranteesontransport,storage,etc.•  partialfailure•  churn•  networkbottlenecksandotherresourceconstraints•  trustissues:security,privacy,incentives

ChallengesforP2PNetworks1.  NATandfirewall:•  cannotpeerwithahostyoucan’taddress

Solutions:•  Gnutella:•  queriersendsPUSHmessagetoresponderoverthep2pnetwork•  responderopensaTCPconnectiontoquerierandsendsoverthefile•  noluckifbotharebehindfirewalls

•  KaZaA,eDonkey,Skype:•  asupernodeactsasrelayifbothpeersarebehindfirewalls

•  StandardstotraverseNAT(andfirewall!):UPnP,STUN,TURN

2. Download/uploadbandwidthasymmetry�needsbandwidthsubsidybycontentproviderorCDN,

orsufferlongdownloadtime