An Architecture for Distributed Content Delivery Networks.pdf


  • 8/14/2019 An Architecture for Distributed Content Delivery Networks.pdf

    1/64

    An Architecture for Distributed Content Delivery Network

    A minor thesis submitted in partial fulfilment of the requirements for the degree of

    Masters of Applied Science (Information Technology)

    Jaison Paul Mulerikkal

    School of Computer Science and Information Technology

    Science, Engineering, and Technology Portfolio,

    Royal Melbourne Institute of Technology

    Melbourne, Victoria, Australia

    July 17, 2007


    Declaration

    This thesis contains work that has not been submitted previously, in whole or in part, for any

    other academic award and is solely my original research, except where acknowledged.

    This work has been carried out since January 2007, under the supervision of Dr. Ibrahim Khalil.

    Jaison Paul Mulerikkal

    School of Computer Science and Information Technology

    Royal Melbourne Institute of Technology

    July 17, 2007


    Acknowledgment

    I would like to thank Dr. Ibrahim Khalil for his continuous support and guidance throughout the course of this minor thesis. It is his constant inspiration and encouragement that helped me to complete this task successfully. I especially thank him for his painstaking efforts in proofreading the drafts of this work.

    I also thank Dr. Jiankun Hu for his valuable suggestions and contributions towards this project.


    Contents

    1 Introduction

    2 Background
    2.1 CDN Main Concepts
    2.1.1 Surrogate Servers
    2.1.2 DNS Lookup and Redirection
    2.1.3 DNS Load Balancing
    2.1.4 Replication
    2.1.5 Selection of Content
    2.1.6 Cached Delivery
    2.1.7 Outsourcing Content
    2.1.8 Accounting and Billing Mechanism
    2.2 Conventional CDN Architectures
    2.2.1 Commercial (Client-Server) Architecture
    2.2.2 Academic (Peer-to-Peer) Architecture
    2.2.3 Limitations of Existing CDN Architectures
    2.3 Distributed Content Delivery Network - An Effective Alternative

    3 Architecture - Distributed Content Delivery Network
    3.1 DCDN Framework
    3.1.1 Distribution of Content - The Process
    3.1.2 Content Delivery to a User
    3.2 DCDN Design Challenges
    3.2.1 Security
    3.2.2 Effective Redirection and Load-balancing Algorithm
    3.2.3 Billing and SLA (Service Level Agreement) Verification Software
    3.3 Business Model
    3.3.1 Network Marketing (NM) / Multi Level Marketing (MLM)
    3.3.2 Special Scenarios of DCDN Advantage

    4 Performance Analysis and Load Balancing Algorithm
    4.1 Performance Parameters and Assumptions
    4.2 Queuing Metrics
    4.3 Queuing Theory Modeling for Different Scenarios
    4.4 Load Balancing Algorithm for DCDN Servers

    5 Simulations and Results
    5.1 Goals
    5.2 Assumptions
    5.3 Overview of Simulation Setup
    5.4 Simulation Results
    5.4.1 Page Response Time
    5.4.2 DCDN Surrogate - CPU Utilization vs. CDN Server - CPU Utilization
    5.4.3 DCDN Server - CPU Utilization vs. CDN Load Balancer - CPU Utilization
    5.5 Discussion

    6 Conclusion and Future work
    6.1 Future Work

    A Softwares Used
    B Abbreviations
    C Symbols
    D Simulation Snapshots


    List of Figures

    1.1 CDNs and Web Content Distribution
    3.1 DCDN Content Distribution Architecture
    3.2 DCDN Content Delivery
    3.3 DCDN Basic Transition Diagram
    3.4 DCDN Transition Diagram - Including Contingency Plans
    3.5 Local DCDN Server Zones - Contingency Plan
    3.6 DCDN Transition Diagram - Including Security Solutions
    3.7 Pyramid Scheme
    3.8 MLM Architecture
    4.1 Utilization v/s Total System Delay
    4.2 Utilization v/s Rejection Rate
    5.1 Page Response Time
    5.2 DCDN Surrogate (Server) Utilization
    5.3 DCDN Server (Load Balancer) Utilization
    D.1 Simulation Snapshot - CDN
    D.2 Simulation Snapshot - DCDN
    D.3 Application Configuration
    D.4 Profile Configuration


    List of Tables

    1.1 Commercial Content Delivery Networks
    1.2 Academic Content Delivery Networks
    5.1 Simulation Setup


    Abstract

    Commercial Content Delivery Networks (CDNs) create their own networks of servers around the globe to deliver Web content effectively to end-users. Peering of CDNs increases the efficiency of commercial CDNs, but the high rental rates resulting from huge infrastructure costs still make them inaccessible to medium and low profile clients. Academic models of peer-to-peer CDNs aim to reduce the financial cost of content distribution by forming volunteer groups of servers around the globe, but their efficiency is at the mercy of the volunteer peers, whose commitment is not ensured in their design. We propose a new architecture that makes use of the existing resources of common Internet users, in terms of storage space, bandwidth and Internet connectivity, to create a Distributed Content Delivery Network (DCDN). The profit pool generated by the infrastructure savings will be shared among the participating nodes (DCDN surrogates), which will serve as an incentive for them to support the DCDN. Since the system uses the limited computing resources of common Internet users, we also propose a suitable load balancing (LB) algorithm so that DCDN surrogates are not burdened with heavy load and requests are fairly assigned to them. Simulations have been carried out, and the results show that the proposed architecture (with LB) can offer the same or even better performance than a commercial CDN.


    Chapter 1

    Introduction

    The growth of the World Wide Web and new modes of Web services have triggered an exponential increase in Web content and Internet traffic [Molina et al., 2004; Vakali and Pallis, 2003; Presti et al., 2005]. Web content consists of static content (e.g. static HTML pages, images, documents, software patches), streaming media (e.g. audio, real-time video) and varying content services (e.g. directory service, e-commerce service, file transfer service) [R. Buyya and Tari, 2001]. As Web content and Internet traffic increase, individual Web servers find it difficult to cater to the needs of end-users. In order to store and serve huge quantities of Web content, Web server farms - clusters of Web servers functioning as a single unit - were introduced [Burns et al., 2001].

    Even Web server farms find it difficult to deal with flash crowds - large numbers of simultaneous requests for popular content - that are frequently experienced in Web traffic [Pan et al., 2004]. Moreover, server farms are in most cases geographically distant from the end-users. This distance between the Web servers and the end-users increases the response time of Web requests, resulting in undesirable delays [Pan et al., 2004].

    Replicating the same Web content around the globe in a network of Web servers is a solution to the above issue. However, it is not financially viable for individual content providers to set up their own server networks. An answer to this challenge is the concept of the Content Delivery Network (CDN), which was initiated in 1998 [Douglis and Kaashoek, 2001; Vakali and Pallis, 2003].

    The basic idea is to improve the performance and scalability of content retrieval by geographically distributing a network of Web servers around the globe and allowing several content providers to host their content on those servers. It allows a number of content providers to upload their Web content into the same network of Web servers (also called CDN servers) and thereby to reduce the cost of content replication and distribution.

    [Figure 1.1: CDNs and Web Content Distribution. Diagram labels: Web Server (Content Provider), Internet Backbone, CDN Nodes, ISPs, Public Internet.]

    In a typical CDN environment (Figure 1.1), the replicated Web server clusters are located at the edge of the network to which the end-users are connected. The end-users interact with the CDN by specifying the content or service request through a cell phone, PDA, laptop, desktop, etc. The requested Web content is fetched from the origin server, and the user is served with the content from a nearby replicated Web server. Thus users end up communicating with a replicated CDN server close to them and retrieve files from that server. Since its inception, the CDN concept has evolved dramatically. A number of CDNs are available around the globe [Douglis and Kaashoek, 2001; Vakali and Pallis, 2003; Pathan, 2007]; they are collectively called conventional CDN architectures in this minor thesis. They can be classified into two main types:

    1. Commercial CDNs

    2. Academic CDNs

    Commercial CDNs are owned by corporations and generally follow a centralized client-server architecture. Some of them have more than 20,000 servers around the globe to support their network. A list of prominent commercial CDN providers is given in Table 1.1 [Pathan, 2007].

    Name                    Description
    Akamai                  Founded in 1998 in Massachusetts, USA, Akamai is considered to be
                            the pioneer in the CDN business. It reported a net income of
                            283.115 million USD in 2005.
    Mirror Image Web, Inc   Founded in 1999 in Massachusetts, USA. Besides content
                            distribution, it provides streaming and content access services.
    Local Mirror            A U.S.-based privately held corporation, incorporated in 2005,
                            that offers Content Delivery Network services. It is a provider
                            of static content, audio, video streaming and distribution.
    Limelight Networks      Founded in 2001 in Tempe, Arizona, USA, Limelight Networks
                            provides a network for bandwidth-intensive rich media
                            applications over the Web.

    Table 1.1: Commercial Content Delivery Networks

    Academic CDNs are non-profit in nature and generally follow a peer-to-peer architecture. These peer-to-peer CDN models allow content providers to organize themselves together and to operate within their own hosting platforms. Some of the important academic CDNs are given in Table 1.2 [Pathan, 2007].

    Conventional CDN architectures - commercial and academic - have their own advantages, but their major pitfalls are:

    1. High rental rates of commercial CDN services, resulting from huge infrastructure costs.

    2. The efficiency of academic CDNs is at the mercy of volunteer peers, whose commitment is not ensured in their design.

    The huge financial cost involved in setting up a commercial CDN compels commercial CDN providers to charge their clients (the content providers) high fees for their service. Usually this cost is so high that only large firms can afford it. On the other hand, academic CDNs do not provide a built-in network of independent servers around the globe. That means the risk and responsibility of running the content distribution network ultimately goes back to the content providers themselves. Content providers, who are generally not interested in taking on such risks and responsibility, do not find academic CDNs to be attractive alternatives to commercial CDNs.

    Name      Description
    Coral     A free peer-to-peer CDN designed to mirror Web content. It uses an
              architecture very similar to a distributed Web proxy. To access a
              Website through the Coral cache, one simply adds .nyud.net:8080 to the
              hostname in the site's URL.
    Globule   An open-source CDN developed at Vrije University in Amsterdam. It is
              introduced as a third-party module for the Apache HTTP server.
    FCAN      Flash Crowd Alleviation Network is an adaptive CDN that dynamically
              optimises between peer-to-peer and client-server architectures to
              alleviate flash crowds.

    Table 1.2: Academic Content Delivery Networks
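    The hostname rewrite that Table 1.2 attributes to Coral can be illustrated with a small sketch. The function name coralize is hypothetical, and dropping any port present in the original URL is an assumption made here because the quoted suffix carries its own port; the sketch reproduces only the rule as stated in the table and makes no claim about how the Coral service itself handles requests.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix quoted in Table 1.2; everything else in this sketch is illustrative.
CORAL_SUFFIX = ".nyud.net:8080"


def coralize(url: str) -> str:
    """Return `url` with the Coral suffix appended to its hostname."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no hostname: " + url)
    # Assumption: an explicit port in the original URL is dropped,
    # since the suffix already specifies port 8080.
    netloc = parts.hostname + CORAL_SUFFIX
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(coralize("http://www.example.com/index.html"))
```

    Under these assumptions, http://www.example.com/index.html becomes http://www.example.com.nyud.net:8080/index.html.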

    Objectives

    The above brief discussion (which will be further explained in 2.3.1) suggests that there is a need for a more reliable and scalable CDN architecture that does not require fresh infrastructure investment. A unique CDN architecture is required to address these issues.

    A lot of work has been done in this area towards these ends. Globule, an academic CDN envisaged as a Collaborative Content Delivery Network (CCDN) [Pierre and van Steen, 2006a], aims to provide performance and availability through Web servers that cooperate across a wide area network. Coppens et al. [2006] propose the idea of a self-organizing Adaptive Content Distribution Network (ACDN), in which they introduce a less centralized replica placement algorithm, COCOA (Cooperative Cost Optimization Algorithm), which pushes the content closer to the clients. Though most of these works seem to be theoretically sound, they never challenged the efficiency and reliability of the commercial client-server architecture, because they were purely peer-to-peer architectures that are effective only at the mercy of participating peers, whose performance is not under the control of the suggested architecture.

    A successful alternative to commercial CDNs with comparable performance and reliability can be assured only by ensuring proportionate incentives to the participating nodes, which will function as a driving force for those peers to stay alive with minimum service rates.


    Involving Web users with comparatively high-bandwidth Internet connections (broadband or higher) to form a Distributed Content Delivery Network (DCDN) in return for proportionate remuneration evokes curiosity and poses challenges. In this architecture, clusters of DCDN surrogates (participating Web users) replace the conventional CDN servers, pushing the content very close to the end-users.

    The objectives of this thesis can be summed up as follows:

    1. Suggest a practical and viable architecture for DCDN and discuss its possible challenges.

    2. Suggest a load balancing algorithm for DCDN servers based on queuing theory analysis of the DCDN surrogate.

    3. Compare the performance of the DCDN architecture against commercial CDN architecture using simulation techniques.

    Contribution

    This work aims to propose a new architecture for CDN that will make use of the limited but

    readily available resources of common Internet users. To achieve this objective, the thesis

    makes the following contributions.

    1. Suggests a unique DCDN architecture and proposes a workable business model to implement it successfully in a real-world scenario.

    2. Suggests an appropriate load balancing algorithm for DCDN Local servers by analyzing the performance of the DCDN surrogate in terms of average system delay and rejection rate.

    3. Discusses the performance of the DCDN architecture in comparison with commercial CDN using simulation results.

    Organization

    The origin of CDN and the need and scope of DCDN are given in Chapter 1. The main concepts and the evolution of conventional CDN architectures are discussed in Chapter 2 in the light of previous work. Chapter 3 discusses the proposed DCDN architecture in detail. It is followed in Chapter 4 by an analysis of the major performance parameters of DCDN surrogates: average system delay and rejection rate. In the light of those results, a probable load balancing algorithm for DCDN servers is suggested in the same chapter. Simulations and their results, which compare the DCDN architecture against commercial CDN architecture, constitute Chapter 5. Finally, the thesis is concluded with a discussion of future work.


    Chapter 2

    Background

    In this chapter we discuss the different entities that constitute the technical backbone of a Content Delivery Network (CDN) in the light of previous work. Further, the conventional CDN architectures - commercial (client-server) and academic (peer-to-peer) - are evaluated and the need for a new architecture is discussed.

    2.1 CDN Main Concepts

    2.1.1 Surrogate Servers

    Surrogate servers are the collection of (non-origin) servers that attempt to offload work from origin servers by delivering content on their behalf. Surrogate servers are placed all around the globe, according to various needs and business considerations. Since the location of surrogate servers is closely related to the content delivery process, choosing the best location for each surrogate receives extra emphasis. Many approaches (e.g. theoretical, heuristic) have been developed to model the surrogate server placement problem [Telematica Institute, 2007].

    2.1.2 DNS Lookup and Redirection

    The first step taken by a client to retrieve the content for a URL from the Web is to resolve

    the server name portion of the URL to the IP address of a machine containing the URL

    content. The client does this resolution with a Domain Name System (DNS) lookup. The

    resolution causes a DNS request to be sent to a local DNS server. If the local DNS server

    does not have the address mapping already in its cache, the local DNS server sends a query


    its load when it experiences moderate to high load. The monitoring system in Akamai also

    transmits data centre load to the top-level DNS resolver to direct traffic away from overloaded

    data centres. In addition to load balancing, Akamai's monitoring system provides centralized

    reporting on content service for each customer and content server. This information is useful

    for network operational and diagnostic purposes [Wikipedia, 2007].
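    The idea of steering DNS answers away from overloaded data centres can be sketched as follows. This is only an illustration of the general principle: the tuple layout, the 0.8 overload threshold and the fallback rule are assumptions made here, and nothing in this sketch describes Akamai's actual resolver logic.

```python
def resolve(data_centres, overload_threshold=0.8):
    """Pick a data centre name for a DNS answer.

    data_centres: list of (name, distance, load) tuples, where `distance`
    is any proximity metric for the requesting client (smaller is closer)
    and `load` is the utilization reported by the monitoring system, in [0, 1].
    """
    # Prefer the closest data centre that is not overloaded.
    usable = [dc for dc in data_centres if dc[2] < overload_threshold]
    if usable:
        return min(usable, key=lambda dc: dc[1])[0]
    # Assumed fallback: if every data centre is overloaded, take the least loaded.
    return min(data_centres, key=lambda dc: dc[2])[0]


if __name__ == "__main__":
    centres = [("near", 10, 0.95), ("mid", 50, 0.40), ("far", 200, 0.10)]
    print(resolve(centres))
```

    In the usage example the closest data centre is skipped because its reported load exceeds the threshold, so the next closest one is returned.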

    2.1.4 Replication

    Commercial CDNs (e.g. Akamai) replicate content across the globe for large organizations, like CNN or Apple, that need to deliver large volumes of data in a timely manner.

    Using replication techniques, one or more copies of a single piece of Web content (e.g. a streaming media asset) can be maintained on one or more surrogate servers. Thomas Buchholz and Linnhoff-Popien [2005] propose context-aware heuristics for content replication to increase the monetary value of replicated content, where a replica's profit depends on the number of requests it receives in a time interval. Clients then discover an optimal replica origin server to communicate with. Here, optimality is a policy-based decision based upon proximity or other criteria such as load [Telematica Institute, 2007].

    2.1.5 Selection of Content

    Choosing which content to deliver to the end-users is an important decision. Content can be delivered to customers in full or in part. In full-site content delivery, the surrogate servers replicate the entire site in order to deliver its total content to the end-users. In contrast, partial content delivery provides only embedded objects, such as Web page images, from the corresponding CDN.

    2.1.6 Cached Delivery

    A surrogate server may be equipped with a streaming media cache. This enables on-demand content to be dynamically replicated locally, perhaps in an encrypted format. The surrogate may attempt to store all cacheable media files upon first request. When a surrogate receives a client request for on-demand media, it determines whether the content is cacheable. Then it checks whether the requested media already resides in its local cache. If the media is not already in the cache, the surrogate acquires the media file from the source server and simultaneously delivers it to the requesting client. Subsequent requests for the same media clip can be served without repeatedly pulling the clip across the network from the source server [Telematica Institute, 2007].
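    The request flow described above can be sketched as a small cache wrapper. The class name SurrogateCache and the two callables passed to it are hypothetical. As a simplification, the sketch fetches the whole object before returning it, whereas the text describes delivering the media to the client while it is still being acquired.

```python
class SurrogateCache:
    """Minimal sketch of on-demand caching at a surrogate server."""

    def __init__(self, fetch_from_source, is_cacheable):
        self._fetch = fetch_from_source    # callable: media_id -> bytes
        self._is_cacheable = is_cacheable  # callable: media_id -> bool
        self._store = {}                   # local cache: media_id -> bytes

    def serve(self, media_id):
        # Step 1: non-cacheable content always goes to the source server.
        if not self._is_cacheable(media_id):
            return self._fetch(media_id)
        # Step 2: serve from the local cache when the media is already there.
        if media_id in self._store:
            return self._store[media_id]
        # Step 3: cache miss, so acquire from the source and keep a copy.
        data = self._fetch(media_id)
        self._store[media_id] = data
        return data


if __name__ == "__main__":
    fetches = []

    def source(media_id):
        fetches.append(media_id)
        return b"data:" + media_id.encode()

    cache = SurrogateCache(source, lambda m: not m.startswith("live/"))
    cache.serve("clip1")
    cache.serve("clip1")
    print(fetches)  # the source server was contacted once for clip1
```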

    2.1.7 Outsourcing Content

    Given a set of properly placed surrogate servers in a CDN infrastructure and content chosen for delivery, it is crucial to decide which content outsourcing practice to follow. There are three basic content outsourcing schemes, enumerated below.

    1. Cooperative push-based approach: In this approach, content is pushed to the surrogate servers from the origin, and each request is directed to the closest surrogate server or, otherwise, to the origin server [Zhiyong Xu and Bhuyan, 2006].

    2. Non-cooperative pull-based approach: Here, client requests are directed (by DNS redirection) to their closest surrogate servers. If there is a cache miss, surrogate servers pull content from the origin server [Dilley et al., 2002].

    3. Cooperative pull-based approach: It differs from the non-cooperative approach in that surrogate servers cooperate with each other to get the requested content in case of a cache miss. Using a distributed index, the surrogate servers find nearby copies of the requested content and store it in the cache [Zhiyong Xu and Bhuyan, 2006].
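    The difference between the two pull-based schemes can be made concrete with a minimal sketch. The Surrogate class is hypothetical, the origin server is modelled as a plain dictionary, and scanning a list of peers is an assumption that stands in for the distributed index mentioned in the text.

```python
class Surrogate:
    """Sketch of a pull-based surrogate; `origin` is a dict standing in for the origin server."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin  # url -> content
        self.cache = {}       # url -> content held locally
        self.peers = []       # other surrogates (stand-in for a distributed index)

    def get(self, url, cooperative):
        """Return (content, where_it_came_from)."""
        if url in self.cache:
            return self.cache[url], "local"
        if cooperative:
            # Cooperative pull: look for a copy on a peer before going to the origin.
            for peer in self.peers:
                if url in peer.cache:
                    self.cache[url] = peer.cache[url]
                    return self.cache[url], "peer:" + peer.name
        # Non-cooperative pull, or no peer holds the content: pull from the origin.
        self.cache[url] = self.origin[url]
        return self.cache[url], "origin"


if __name__ == "__main__":
    origin = {"/a": "A"}
    s1, s2 = Surrogate("s1", origin), Surrogate("s2", origin)
    s1.peers, s2.peers = [s2], [s1]
    print(s1.get("/a", cooperative=True))  # first request is pulled from the origin
    print(s2.get("/a", cooperative=True))  # second surrogate finds the copy on s1
```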

    2.1.8 Accounting and Billing Mechanism

    CDN providers charge their customers according to the content delivered by their surrogate servers to the clients. There are technical and business challenges in pricing CDN services. The average price of CDN services is quite high. The most influential factors affecting the price of CDN services include bandwidth cost, variation of traffic distribution, size of content replicated over surrogate servers, number of surrogate servers, reliability and stability of the whole system, and security issues of outsourcing content delivery [Krishnamurthy et al., 2001]. CDNs support an accounting mechanism that collects and tracks information related to request routing, distribution and delivery. This mechanism gathers information in real time and collects it for each CDN component. This information can be used in CDNs for accounting, billing and maintenance purposes.
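    A minimal sketch of charging by delivered content is given below. It assumes that each accounting record is a (customer, surrogate, bytes delivered) tuple and that a single flat price per gigabyte applies; both are assumptions made for illustration, and real pricing would depend on the factors listed above.

```python
from collections import defaultdict


def bill(records, price_per_gb):
    """Aggregate delivery records into a charge per customer.

    records: iterable of (customer, surrogate, bytes_delivered) tuples,
    as they might be collected from each surrogate server.
    price_per_gb: assumed flat price per 10**9 bytes delivered.
    """
    totals = defaultdict(int)
    for customer, _surrogate, nbytes in records:
        totals[customer] += nbytes
    return {c: round(b / 1e9 * price_per_gb, 2) for c, b in totals.items()}


if __name__ == "__main__":
    log = [
        ("customer-a", "s1", 2_000_000_000),
        ("customer-a", "s2", 1_000_000_000),
        ("customer-b", "s1", 500_000_000),
    ]
    print(bill(log, price_per_gb=0.5))
```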


    2.2 Conventional CDN Architectures

    2.2.1 Commercial (Client-Server) Architecture

    The classical example is Akamai. Akamai offers content delivery services to content providers by offering a worldwide distributed platform to host their content. It does so by installing a worldwide network of more than twenty thousand Akamai surrogate servers [Dilley et al., 2002].

    Akamai represents the centralized approach to CDN, in which the customers (the content providers) hire their share of space on Akamai servers to support the distribution and easy download of their Web content (Web pages or dynamic streaming content). A typical approach by which Akamai provides this service is as follows:

    1. The client's browser requests the default Web page at the content provider's site. The site returns the Web page index.html.

    2. The HTML code contains links to some content (e.g. images) hosted on Akamai-owned servers.

    3. As the Web browser parses the HTML code, it pulls the content from the Akamai server [Wikipedia, 2007].

    Akamai provides its customers with a simple tool called Free Flow Launcher, which they use to Akamaize their pages [Mahajan, 2004]. The customers specify what content they want to be served through Akamai and the tool Akamaizes the corresponding URLs. This way the customers retain complete control over what gets served through Akamai and what they continue to serve themselves. The customer is then responsible only for the content he chooses to serve himself and for the first few hits of other content until the Akamai caches warm up [Reitz, 2000].
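    The general idea of rewriting selected embedded-object URLs so that browsers fetch them from a CDN host can be sketched as follows. This is not the Free Flow Launcher and does not reproduce Akamai's URL format: the function name, the CDN hostname cdn.example.net, the rewritten URL layout and the choice of file extensions are all assumptions made for illustration.

```python
import re


def rewrite_embedded_links(html, site_host, cdn_host, extensions=(".gif", ".jpg", ".png")):
    """Point src attributes for selected file types at a CDN host.

    Only absolute http:// URLs on `site_host` are considered; all other
    links are left untouched, so the site keeps serving them itself.
    """
    pattern = re.compile(r'(src=")http://' + re.escape(site_host) + r'(/[^"]*)"')

    def repl(match):
        path = match.group(2)
        if path.lower().endswith(extensions):
            # Hypothetical layout: http://<cdn_host>/<site_host>/<original path>
            return match.group(1) + "http://" + cdn_host + "/" + site_host + path + '"'
        return match.group(0)

    return pattern.sub(repl, html)


if __name__ == "__main__":
    page = ('<img src="http://www.example.com/logo.gif">'
            '<script src="http://www.example.com/app.js"></script>')
    print(rewrite_embedded_links(page, "www.example.com", "cdn.example.net"))
```

    In the usage example only the image link is rewritten; the script link still points at the content provider's own server, which mirrors the customer's control over what is served through the CDN.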

    Peering of Commercial CDNs

    The commercial CDNs are owned and operated by individual companies. Although there are

    many commercial CDN providers, they do not cooperate in delivering content to end-users in

    a scalable manner. In addition, content providers are typically subscribed to one of the CDN

    providers and are unable to utilize services of multiple CDN providers at the same time. Such

    a closed, non-cooperative model results in creation of islands of CDNs.


    To contain expenses and to ensure better service to their clients, CDN providers need to partner together so that each can supply and receive, in a cooperative and collaborative manner, services that one CDN alone cannot provide to content providers. The objective of a CDN is to satisfy its customers with competitive services. If a particular CDN provider is unable to provide quality service for end-user requests, it may result in Service Level Agreement (SLA) violation and adverse business impact. In such scenarios, one CDN provider partners with other CDN provider(s) that have caching servers located near the end-user, which serve the user's request and meet the Quality of Service (QoS) requirements [Lazar and Terrill, 2001]. This is called peering of CDNs.

    R. Buyya and Tari [2001] suggest a Virtual Organization (VO) model for forming Content and Service Delivery Networks (CSDN), and a policy framework within the VO model, for the peering of CDNs. Delivery of content in such an environment will meet the QoS requirements of end-users according to the negotiated SLA.
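    The peering decision described in this subsection can be sketched as a simple selection rule. The data layout, the use of expected latency as the only QoS measure and the fallback to the primary provider are all assumptions made here for illustration; the cited works do not prescribe this procedure.

```python
def choose_provider(primary, peers, client_region, sla_ms):
    """Return the name of the provider that should serve the request.

    Each provider is a dict {"name": str, "latency_ms": {region: float}},
    where the latency map holds the expected response time towards each
    region in which the provider has caching servers.
    """
    # Try the primary provider first, then its peering partners in order.
    for provider in [primary] + list(peers):
        latency = provider["latency_ms"].get(client_region)
        if latency is not None and latency <= sla_ms:
            return provider["name"]
    # Assumed fallback: nobody meets the SLA, so the primary serves anyway.
    return primary["name"]


if __name__ == "__main__":
    cdn_a = {"name": "cdn-a", "latency_ms": {"eu": 40, "asia": 300}}
    cdn_b = {"name": "cdn-b", "latency_ms": {"asia": 60}}
    print(choose_provider(cdn_a, [cdn_b], "asia", sla_ms=100))
```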

    2.2.2 Academic (Peer-to-Peer) Architecture

    Distributed computer architectures labelled peer-to-peer are designed for the sharing of computer resources (content, storage, CPU cycles) by direct exchange, rather than requiring the intermediation or support of a centralized server or authority. Peer-to-peer architectures are characterized by their ability to adapt to failures and accommodate transient populations of nodes while maintaining acceptable connectivity and performance [Androutsellis-Theotokis and Spinellis, 2004].

    The same technique has been proposed and adopted for creating reliable CDNs for the propagation of Web content. A peer-to-peer (P2P) CDN is a system in which the users get together to forward content so that the load on a server is reduced.

    In its most basic form, a peer-to-peer content distribution system creates a distributed storage

    medium that allows for the publishing, searching, and retrieval of files by members of its

    network. So, instead of delegating content delivery to an external company (like Akamai),

    content providers can organize together to trade their (relatively cheap) local resources against

    (valuable) remote resources.

    A classical example is the academic peer-to-peer CDN Globule, developed at the Vrije
    Universiteit in Amsterdam. It is implemented as a third-party module for the Apache HTTP
    server that allows any given server to replicate its documents to other Globule servers. This
    improves the site's performance, keeps the site available to its clients even if some servers
    are down, and to a certain extent helps to resist flash crowds [Pierre and van Steen, 2003].

    A user participating in the Globule network is offered a distributed set of servers in which

    his/her Web content can be replicated. Globule is designed in the form of an add-on module

    for the Apache Web server. To replicate their content, content providers only need to
    compile an extra module into their Apache server and edit a simple configuration file. Globule
    automatically replicates the site's content and redirects clients to a nearby replica. Servers
    also monitor each other's availability, so that client requests are not redirected to a failing
    replica [Halderen and Pierre, 2006; Guillaume Pierre, 2006; Pierre and van Steen, 2006b].

    S. Sivasubramanian, B. Halderen and G. Pierre rightly observe that a peer-to-peer CDN
    aims to allow Web content providers to come together and operate their own worldwide hosting
    platform [S. Sivasubramanian and Pierre, 2004].

    2.2.3 Limitations of Existing CDN Architectures

    Despite the many advantages of commercial CDNs, they suffer from some major limitations.
    Commercial CDN providers compete with each other and are forced to set up costly infrastructure
    around the globe. Since they want to meet the QoS standards agreed with their clients, they
    are constantly installing and updating infrastructure. This process gives
    rise to the following issues:

    1. Network cost: Increase in total network cost in terms of new sets of servers and a
    corresponding increase in network traffic.

    2. Economic cost: Increase in the cost per service rate for the distribution of Web content,
    resulting from the increase in initial investment and running cost of each commercial CDN.

    3. Social cost: Content distribution is centralized in a handful of CDN providers, which raises
    the possible issue of monopolization of revenue in this area.

    The huge financial cost involved in setting up a commercial CDN compels commercial
    CDN providers to charge their clients (the content providers) high fees. Usually
    this cost is so high that only large firms can afford it. As a result, small and medium-sized
    Web content providers are not in a position to rent the services of commercial CDN
    providers.

    Moreover, the revenue from content distribution is monopolized. Only large CDN providers
    with huge infrastructure around the world are destined to amass revenue from this big
    business. At the same time, the resources of a large number of common Internet users, in terms
    of processing power, storage capacity and network availability, are ignored, although these
    users would support a content delivery network for proportionate remuneration.

    On the other hand, academic CDNs are non-profit initiatives organized in a peer-to-peer fashion.
    But they serve only content providers who own their own network of servers around the
    globe or who become part of a voluntary net of servers. Academic
    CDNs do not provide a built-in network of independent servers around the globe. That means
    the risk and responsibility of running a content distribution network ultimately go back to
    the content providers themselves. Content providers, who are generally not interested in
    taking on such big risks and responsibilities, do not find academic CDNs an attractive alternative.

    2.3 Distributed Content Delivery Network - An Effective Alternative

    The above discussion suggests that there is a need for a more reliable, responsible and scalable
    CDN architecture, which can make use of the resources of a large number of general Web
    users. A unique architecture, the Distributed Content Delivery Network (DCDN), is proposed
    in this thesis to meet these ends.

    DCDN aims at involving general Web users with comparatively high-bandwidth Web
    connections (broadband or higher) to form a highly distributed content delivery network.
    Those who become part of the DCDN network are called DCDN surrogates. A cluster of
    DCDN surrogates, distributed down to the local level around the globe,
    replaces the conventional CDN server and pushes the content very close to the end-users.
    Since the content is pushed down to the local level, the efficiency of content
    retrieval in terms of response time is expected to increase considerably. It will also reduce
    network traffic, since clients can access the content from locally placed surrogates. A local
    DCDN server, which is mainly a redirector and load balancer, is designed to redirect
    client requests to the appropriate DCDN surrogates.

    Since DCDN aims to use the available storage space and Web connectivity of existing
    Web users, it does not demand the installation of new infrastructure. This approach is
    expected to reduce the economic cost considerably. The value thus gained (the profit pool)
    could be shared among the DCDN surrogates through proper accounting and billing
    mechanisms and through attractive business models. It will serve as an incentive for the
    DCDN surrogates to share their resources in support of the DCDN network.


    Chapter 3

    Architecture - Distributed Content

    Delivery Network

    In order to provide a highly distributed network of DCDN surrogates, the basic structure of a
    commercial client-server CDN is adopted and combined with novel peer-to-peer concepts. The
    DCDN architecture is therefore a hybrid architecture that integrates some of the major features
    of a conventional client-server CDN and an academic peer-to-peer CDN.

    A single surrogate server in the conventional client-server CDN model is replaced with a lightweight
    DCDN server (which is basically a redirector) and a number of DCDN surrogates associated
    with it. The content is distributed among the DCDN surrogates in a
    peer-to-peer fashion and retrieved at a client's request with the help of Local DCDN servers.

    3.1 DCDN Framework

    A collection of Local DCDN Servers and innumerable DCDN Surrogates are networked
    together to deliver requested Web content to the clients. The main elements of the DCDN
    architecture (content providers, DCDN servers and DCDN surrogates) are arranged in a hierarchical
    order as depicted in Figure 3.1.

    Content Provider: The entity that requests distribution of its Web content through DCDN.

    DCDN Administrators: A managerial/business entity rather than a technical one. The
    entire DCDN network is managed, supported and run by a team of administrators. They do
    so by controlling and franchising the Master DCDN servers.

    Figure 3.1: DCDN Content Distribution Architecture (elements shown: Content Provider, Master DCDN Server, Local DCDN Servers, DCDN Surrogates)

    DCDN Servers: DCDN servers are basically redirectors that only hold knowledge
    about the location of the content. They do not store any content as such, although they may
    function as a buffer system that helps to push the content supplied by the content providers to
    the DCDN surrogates. They monitor, log and regulate the content flow from providers to the
    surrogates.

    In the proposed architecture, DCDN servers are of two types: Master and Local.

    1. DCDN Master Servers: Master DCDN servers are the first point of contact for a content
    provider. A global network of Master DCDN servers is set up in such a way that
    every network region has at least one Master DCDN server. Network regions can
    be geographical regions such as the Americas, Europe, Asia and Asia Pacific, and Africa,
    or regions identified on the basis of other criteria such as network
    traffic and network volume. Content providers deal with the DCDN administrators through Master
    DCDN servers and agree terms and conditions with them for the service
    provided by DCDN. Master servers monitor, regulate and control the content flow into DCDN
    servers and surrogates.


    2. DCDN Local Servers: They are placed very near to the end-users (virtually residing
    among them). A number of Local servers can come under the service of a single
    Master server. They have two major functions.
    Firstly, they decide where to place the content (among the surrogates) and keep a log
    of it. So, Local DCDN servers have more local and specific knowledge about a
    particular piece of Web content. Secondly, on a client's request for content under the care of
    DCDN, they find and return the IP address of the best available surrogate to that client.
    In doing so, they also function as load balancers that protect the surrogates in the
    network from being overloaded.

    These Local DCDN servers are networked together to form a globally distributed mas-

    sive DCDN architecture.

    The distinction between Master and Local servers refers only to the role a given server plays
    within DCDN. The same server can act as both a Master and a Local server, if it is
    assigned to do so.

    DCDN Surrogates: As explained before, DCDN surrogates are the large number of Web users
    who offer resources in terms of storage capacity, bandwidth and processing power to store
    and serve DCDN Web content. Web content requested by a client is ultimately fetched
    from DCDN surrogates.

    DCDN Client: The client refers to an end user, who makes a request for a particular Web

    content using a Web browser. The assumption is that the client uses a standard Web browser,

    without the use of any special component such as plugins or daemons.
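    The entities defined above can be pictured as a small data model. The following is only a
    minimal sketch under stated assumptions: the class names, field names and sample addresses
    are illustrative and are not part of the DCDN design itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Surrogate:
    """An end-user machine that stores replicas (field names are illustrative)."""
    ip: str
    online: bool = True
    stored_paths: set[str] = field(default_factory=set)


@dataclass
class LocalServer:
    """Redirector for one locality; it records which surrogate holds which path."""
    name: str
    surrogates: list[Surrogate] = field(default_factory=list)

    def holders(self, path: str) -> list[Surrogate]:
        """Return the online surrogates that currently hold `path`."""
        return [s for s in self.surrogates if s.online and path in s.stored_paths]


@dataclass
class MasterServer:
    """Regional entry point for content providers; it oversees Local servers."""
    region: str
    local_servers: list[LocalServer] = field(default_factory=list)


# Hypothetical sample hierarchy: one Master, one Local server, two surrogates.
s1 = Surrogate("10.0.0.1", stored_paths={"/index.html"})
s2 = Surrogate("10.0.0.2", online=False, stored_paths={"/index.html"})
local = LocalServer("local-1", [s1, s2])
master = MasterServer("asia-pacific", [local])
```

    Note that the servers hold only location knowledge, while the content itself sits in the
    `stored_paths` of the surrogates, mirroring the redirector role described above.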

    3.1.1 Distribution of Content - The Process

    The aim is to place replicas of the content as close as possible to the clients. In this
    process, the content providers first approach the DCDN administrators. Once a Service Level
    Agreement is reached, content providers can upload their content to the DCDN net. This can be
    done either through the Master DCDN servers or through the Local DCDN servers assigned
    by the Master DCDN servers. If the content is uploaded through the Master servers,
    they push it to the Local servers. The Local servers push replicas to the surrogates
    in their own region and keep track of these placements. The Master servers have more
    universal knowledge about the content (for example, the network areas in which a particular content
    is distributed), while the Local servers have more local knowledge of its location
    (that is, which surrogates actually hold a particular content).

    Figure 3.2: DCDN Content Delivery (entities shown: Content Provider, Master DCDN Server, Local DCDN Server, DNS Server, DCDN Surrogates, Client; interactions numbered 1 to 5)

    On request from a Local server, a surrogate may share its replicas with other surrogates in a
    peer-to-peer fashion. This relieves the Local servers of additional workload. The process
    ensures that the Local server still knows about the replicated content on
    the new surrogate(s).

    However, content providers need not distribute their content globally.
    If they want DCDN support only for some region(s), they can request regional
    support instead. In that case, the administrators (with the help of Master servers) choose only
    those Local DCDN servers that match the parameters given in the QoS (Quality of
    Service) agreement between the content provider and the DCDN administration. (For example,
    if the content is to be distributed in the Asia and Asia Pacific region, it is sent only to the Local
    DCDN servers in that region.)
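    The distribution steps described above, including the regional restriction, can be sketched as
    follows. This is an illustrative simplification, not the thesis implementation: the topology,
    the server names and the choice to push to every surrogate of a Local server are assumptions
    made to keep the example short.

```python
from collections import defaultdict

# Hypothetical topology: region -> Local server name -> surrogate IPs.
TOPOLOGY = {
    "asia-pacific": {"local-a": ["10.1.0.1", "10.1.0.2"]},
    "europe": {"local-b": ["10.2.0.1"]},
}


def distribute(content_id, regions, topology=TOPOLOGY):
    """Push `content_id` to the surrogates of the agreed regions.

    Returns (master_index, local_index):
      master_index: content_id -> set of regions (the Master's coarse view)
      local_index:  Local server name -> {content_id -> list of surrogate IPs}
    """
    master_index = defaultdict(set)
    local_index = defaultdict(dict)
    for region in regions:
        for local_name, surrogate_ips in topology.get(region, {}).items():
            # The Local server pushes replicas and records where they went.
            local_index[local_name][content_id] = list(surrogate_ips)
            # The Master only remembers that the region holds the content.
            master_index[content_id].add(region)
    return dict(master_index), dict(local_index)


master_view, local_view = distribute("example.org", ["asia-pacific"])
```

    The two returned indexes correspond to the "universal" knowledge of the Master servers and
    the "local" knowledge of the Local servers.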

    In order to stay in sync with updates and modifications, or in the event of termination of
    service to a specific content provider, the Master DCDN servers, through the Local DCDN servers, request
    the DCDN surrogates to update or delete the content.

    DCDN does not expect individual surrogates to host a huge volume of content, for they are
    only general Web users with low storage capacity. Moreover, they may not be connected
    to the Web all the time, but are online for a considerable period
    every day. DCDN relies on the aggregate storage space and bandwidth of the
    very large number of surrogates participating in the DCDN net and on their close
    proximity to the clients.

    Partial Replication

    Because a specific surrogate may well be offline when a specific content is requested,
    the same content is replicated on a large number of surrogates. It is not
    suggested that the whole content of a Website be stored on an individual surrogate.
    Partial replication of a Website is allowed because the storage space of surrogates is expected
    not to be very big. In the case of partial replication, knowledge about the remaining content
    is kept on the respective surrogate to facilitate HTTP redirection when a query arrives for the rest
    of the content. The content is updated, deleted or added dynamically on a regular basis,
    in sync with the Local server updates.

    The Local DCDN server assesses the demand for a particular content in a particular
    locality and increases or decreases the number of replicas within that locality
    accordingly. That is, if there is higher demand for a particular content in a
    particular locality, the number of replicas in that locality is increased, and vice versa. This
    allows efficient content delivery with optimum use of resources.
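    One possible way to turn the demand assessment above into a replica count is sketched below.
    The formula, the parameter names and the bounds are assumptions chosen for illustration; the
    thesis does not prescribe a specific rule at this point.

```python
import math


def target_replicas(requests_per_hour, per_replica_capacity, online_fraction,
                    min_replicas=2, max_replicas=50):
    """Estimate how many replicas a locality should hold for one content item.

    requests_per_hour    -- observed demand in the locality
    per_replica_capacity -- requests per hour one surrogate can serve (assumed)
    online_fraction      -- expected fraction of time a surrogate is online, in (0, 1]
    """
    if not 0 < online_fraction <= 1:
        raise ValueError("online_fraction must be in (0, 1]")
    # Replicas needed if every surrogate were always online ...
    needed_online = requests_per_hour / per_replica_capacity
    # ... inflated because only a fraction of them is reachable at any moment.
    needed_total = math.ceil(needed_online / online_fraction)
    # Clamp between a floor (for availability) and a ceiling (for storage cost).
    return max(min_replicas, min(max_replicas, needed_total))
```

    For example, with 1000 requests per hour, 100 requests per hour per surrogate and surrogates
    online half of the time, the sketch asks for 20 replicas; falling demand lowers the count
    down to the configured minimum.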

    3.1.2 Content Delivery to a User

    The DCDN Local server, which is envisaged as a redirector, follows the DNS protocol. It
    handles the queries related to the Websites under the care of DCDN. This information
    is shared with other DNS servers too. So, when there is a request for a Website under the
    care of DCDN, the DNS redirectors direct it to the nearest available Local DCDN
    server. The DCDN Local server searches its log of active surrogates holding the requested files using
    a suitable technique (e.g. a Distributed Hash Table (DHT) algorithm). It then makes a
    decision based on other relevant parameters (availability of a full or partial replica,
    bandwidth, physical/network proximity, etc.) and returns the IP address of the most suitable
    surrogate to the client.
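    The selection step described above can be illustrated with a simple scoring function. This is
    a hedged sketch: the `Candidate` fields, the weights and the scoring formula are assumptions
    for illustration, and the actual load balancing algorithm is the subject of the next chapter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ip: str
    online: bool
    has_full_replica: bool
    bandwidth_mbps: float
    rtt_ms: float          # network distance to the client (assumed measurable)
    active_requests: int   # current load, used for load balancing


def score(c: Candidate) -> float:
    """Higher is better. The weights are illustrative, not tuned."""
    s = c.bandwidth_mbps / (1 + c.active_requests)  # bandwidth share per request
    s -= 0.05 * c.rtt_ms                            # penalise distant surrogates
    if c.has_full_replica:
        s += 2.0                                    # avoids a later redirection
    return s


def pick_surrogate(candidates):
    """Return the IP of the best online candidate, or None if none is online."""
    online = [c for c in candidates if c.online]
    if not online:
        return None
    return max(online, key=score).ip


candidates = [
    Candidate("10.0.0.1", True, True, 8.0, 20.0, 3),
    Candidate("10.0.0.2", True, False, 20.0, 10.0, 0),
    Candidate("10.0.0.3", False, True, 100.0, 5.0, 0),
]
best = pick_surrogate(candidates)
```

    Dividing bandwidth by the number of active requests is what gives the sketch its
    load-balancing effect: a busy surrogate scores lower even if its link is fast.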

    Now the client fetches the content from the respective surrogate. The participating surrogates
    run a daemon program on their machines, which handles the requests
    from the clients and the parent DCDN server. If the surrogate holds only part
    of the requested Website, the rest has to be obtained from other surrogates. The surrogate may
    use HTTP redirection so that the remaining content is fetched from other surrogates.

    Figure 3.3: DCDN Basic Transition Diagram (entities shown: Client, DNS Server, DCDN Surrogate, Local Server, Master Server, Content Provider)
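    How a surrogate daemon might answer a request under partial replication is sketched below.
    The function name and its arguments are hypothetical; only the use of an HTTP redirect
    (status 302 with a Location header) is taken from the description above.

```python
def handle_request(path, local_paths, remote_index):
    """Decide how a surrogate answers a request for `path`.

    local_paths  -- set of paths stored on this surrogate
    remote_index -- dict mapping paths held elsewhere to another surrogate's base URL
    Returns (status_code, headers) in the style of an HTTP response.
    """
    if path in local_paths:
        return 200, {}
    if path in remote_index:
        # 302 Found: the client repeats the request at the given Location.
        return 302, {"Location": remote_index[path] + path}
    return 404, {}


# Hypothetical surrogate holding one page and knowing where an image lives.
stored = {"/index.html"}
elsewhere = {"/logo.png": "http://10.0.0.7"}
```

    Because a standard Web browser follows such redirects on its own, this fits the assumption
    that the client needs no special plugin or daemon.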

    A diagrammatic representation of the above process is given in Figure 3.2, and the following
    interactions between the different entities of DCDN are identified.

    1. Local DCDN Server - DNS Server Interaction: The Local DCDN server updates the
    DNS server with the list of content providers under DCDN care and requests the DNS server
    to map corresponding URL requests to the IP address of the Local DCDN server. The DNS
    server queries the Local DCDN server from time to time to update its records.

    2. Client - DNS Server Interaction: The client requests a particular content (Website)
    under DCDN care. The DNS server directs the request to the Local DCDN server,
    using the DNS protocol.

    3. Client - Local DCDN Server Interaction: The Local DCDN server finds the best
    possible surrogate to cater to the client's request and returns the IP address of that
    particular surrogate.

    4. Local DCDN Server - Surrogate Interaction: There is constant interaction
    between the Local server and the surrogates. The content from the content providers is
    stored on the surrogates through the Local DCDN servers. The surrogates report their
    availability to the Local server as and when they come online or
    go offline. Local servers keep track of this. Local DCDN servers
    direct the surrogates to add, delete, update or modify content according to the
    decisions made from time to time.

    5. DCDN Surrogate - Client Interaction: Once the Local DCDN server returns the IP
    address of the most suitable surrogate, the client contacts that surrogate to fetch the
    requested content. On request from the client, the surrogate delivers the content to the
    client.

    Figure 3.4: DCDN Transition Diagram - Including Contingency Plans (entities shown: Client, DNS Server, DCDN Surrogate, Local Server, Master Server, Content Provider)

    The transition diagram (Figure 3.3) combines the two major flows of interaction in DCDN,
    namely content distribution and content delivery. The sequence of interactions has already
    been discussed in 3.1.1 and 3.1.2.

    Figure 3.5: Local DCDN Server Zones - Contingency Plan (labels shown: zone 1, zone 2, link to the wider DCDN net)

    Contingency Plans

    The design of DCDN implies that a number of surrogates may be unavailable
    at any instant. So, it becomes a high priority to assess the availability of surrogates at
    every moment. DCDN achieves this by asking the surrogates to notify the Local server as and
    when they come online or go offline. In addition, the Local servers issue ping
    commands at regular intervals to check the availability of surrogates, in case they fail
    without notifying the Local server. The sequence diagram is therefore modified as in Figure 3.4.
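    The combination of explicit notifications and periodic pings described above can be sketched
    as a small tracker kept by the Local server. The class, its method names and the timeout are
    assumptions for illustration; the `probe` callable stands in for a real ping.

```python
import time


class AvailabilityTracker:
    """Tracks which surrogates a Local server currently believes are online."""

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}  # surrogate IP -> time of last sign of life

    def notify_online(self, ip):
        """Called when a surrogate announces that it has come online."""
        self.last_seen[ip] = self.clock()

    def notify_offline(self, ip):
        """Called when a surrogate signs off cleanly."""
        self.last_seen.pop(ip, None)

    def sweep(self, probe):
        """Probe surrogates that have been silent too long; drop those that fail.

        `probe` is a callable ip -> bool standing in for a ping.
        """
        now = self.clock()
        for ip, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_s:
                if probe(ip):
                    self.last_seen[ip] = now
                else:
                    del self.last_seen[ip]

    def online(self):
        """Return the surrogates currently considered available, sorted by IP."""
        return sorted(self.last_seen)
```

    Calling `sweep` at regular intervals covers the case in which a surrogate fails without
    sending an offline notification.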

    Another scenario arises when specific Web content is not available within a local DCDN network.
    To cope with this scenario, each Local DCDN server classifies the
    nearby Local DCDN servers into zones in order of network proximity
    (Figure 3.5). That is, the nearby Local DCDN servers with the least access cost fall in
    zone 1, and so on. When a specific content is not found in a Local DCDN net, the DCDN
    server first searches for it in the nearby zone 1 DCDN servers. If it is found, the
    request is redirected to the specific Local DCDN server. If it is not found in the lower zones,
    the search is extended to the higher zones until the specific content is found.
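    The zone-by-zone search described above can be sketched as follows. The function and its data
    structures are hypothetical; in particular, the `holdings_by_server` dictionary stands in for
    whatever query one Local server would send to another.

```python
def find_in_zones(content_id, local_holdings, zones, holdings_by_server):
    """Locate a Local server that can serve `content_id`.

    local_holdings     -- set of content ids held in this server's own network
    zones              -- list of lists of neighbour server names; zones[0] is zone 1
    holdings_by_server -- dict: server name -> set of content ids (stands in for a query)
    Returns ("local", None), (zone_number, server_name) or (None, None).
    """
    if content_id in local_holdings:
        return "local", None
    # Search the cheapest (nearest) zone first, then widen the search.
    for zone_number, servers in enumerate(zones, start=1):
        for server in servers:
            if content_id in holdings_by_server.get(server, set()):
                return zone_number, server
    return None, None


# Hypothetical neighbourhood: two zones of neighbouring Local servers.
zones = [["local-b"], ["local-c", "local-d"]]
holdings = {"local-b": {"site-1"}, "local-d": {"site-2"}}
```

    Searching zone 1 before zone 2 keeps most redirections within the least-cost neighbourhood,
    which is the point of the zoning.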


    3.2 DCDN Design Challenges

    In spite of all its advantages, the DCDN architecture raises its own unique set of challenges.
    The major challenges are:

    1. Security.

    2. An efficient algorithm for effective load balancing and DNS redirection.

    3. Development of efficient software for quantifying the service of DCDN servers and peers.

    3.2.1 Security

    The security requirements for a DCDN service environment will be driven by the general

    security issues such as:

    1. Authentication of a content provider (who is recognized by the administrators as a user of the
    DCDN service) while uploading its content to DCDN through Master/Local servers.

    2. Authentication of Master and Local DCDN servers when they contact each other (for
    sharing/updating content information and so on).

    3. Authentication of Local servers by the surrogates, so that pushed content can be trusted.

    In addition to the above issues, maintaining the integrity of the content provided by the content
    provider across all DCDN surrogate replicas becomes a crucial criterion for the business
    success of DCDN. This is because the large number of surrogates makes the content
    vulnerable to manipulation by malicious surrogates or hackers, while
    content providers will be keen to see that their original content is not tampered with inside the
    DCDN network.

    The DCDN daemon running on each surrogate is supposed to ensure the security of the content
    stored on it. The DCDN surrogate daemon authenticates the content injected by the Local
    DCDN server and makes sure that it receives original replicas. Different security measures
    can be employed to block any attempt by hackers, or even by the surrogate owner,
    to access or tamper with the content held by the DCDN daemon. One solution is to
    detect the anomalies that arise when tampered content is delivered to the
    end-users. If these can be identified, the respective surrogate can be put on alert, corrected or
    even eliminated from DCDN.

    Figure 3.6: DCDN Transition Diagram - Including Security Solutions (entities shown: Client, DNS Server, DCDN Surrogate, Local Server, Master Server, Content Provider)

    This can be achieved by stamping all content injected into a surrogate with a digital stamp
    such as an MD5 digest or the like. The Local server keeps a record of these digital stamps. On each
    delivery of content, the surrogate daemon calculates the digital stamp of the delivered
    content and sends it back to the Local server. The Local server compares it with its database
    to make sure that there is no anomaly. If an anomaly is found, content manipulation
    is identified and the Local server takes appropriate action. Verifying the digital stamp for
    each and every transaction can create a huge volume of traffic between the surrogates and the
    Local server. To moderate this traffic, the check can be performed on a
    random sample of deliveries.
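    The stamping and sampled verification described above can be sketched with Python's standard
    hashlib module. The class and method names and the sampling rate are assumptions for
    illustration. MD5 is used by default because the text names it, but any algorithm name known
    to hashlib (for example "sha256") can be passed instead.

```python
import hashlib
import random


def digest(content: bytes, algorithm: str = "md5") -> str:
    """Hex digest of `content`; the algorithm name is passed to hashlib.new."""
    return hashlib.new(algorithm, content).hexdigest()


class StampVerifier:
    """Local-server side record of digital stamps (illustrative sketch)."""

    def __init__(self, sample_rate=0.1, algorithm="md5", rng=None):
        self.sample_rate = sample_rate
        self.algorithm = algorithm
        self.rng = rng or random.Random()
        self.expected = {}  # content id -> digest recorded at injection time

    def register(self, content_id, content: bytes):
        """Record the stamp when content is injected into a surrogate."""
        self.expected[content_id] = digest(content, self.algorithm)

    def should_check(self) -> bool:
        """Randomly select a fraction of deliveries for verification."""
        return self.rng.random() < self.sample_rate

    def verify(self, content_id, reported_digest: str) -> bool:
        """True if the digest reported by the surrogate matches the record."""
        return self.expected.get(content_id) == reported_digest
```

    With a `sample_rate` of 0.1, roughly one delivery in ten would be checked, which trades
    detection speed against the verification traffic mentioned above.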

    The final transition diagram incorporating the contingency and security issues is shown in
    Figure 3.6. Further discussion of the security of the DCDN architecture is outside the scope
    of this minor thesis.

    3.2.2 Effective Redirection and Load-balancing Algorithm

    The success of DCDN relies on an effective redirection algorithm.
    DCDN will hold multiple replicas of the same content within a local DCDN
    set-up to ensure the scalability of the system. This replication may increase exponentially as
    the number of local DCDN networks increases throughout the globe. A combination of DNS
    and HTTP redirection, as mentioned in 3.1.2, has to be developed as a possible
    solution in this regard.

    The DCDN server has to distribute the load within a local system. It should also take
    account of the availability or non-availability of peer nodes. If the requested content is not within the
    local DCDN system, the DCDN server should be able to make the right decision to get it from
    other local DCDN systems without causing network congestion. Effective load-balancing
    algorithms have to be developed in this regard. Based on the results of a queuing delay analysis,
    a basic algorithm for DCDN servers is proposed in the next chapter.

    3.2.3 Billing and SLA (Service Level Agreement) Verification Software

    DCDN has to provide content providers with accounting and access-related information. This
    information has to be provided in the form of aggregate or detailed log files. In addition,
    DCDN should collect accounting information to aid in operation, billing and SLA verification.
    The Master DCDN servers deal with these content-provider-related issues.

    At the same time, DCDN has to quantify proper remuneration for surrogates according to
    their availability, performance, storage space, etc. There is a need for generalized systems or
    protocols for calculating the contributions of surrogates and Local DCDN servers, on
    the basis of the business model adopted for DCDN.

    3.3 Business Model

    The success of the DCDN architecture depends upon building up a global DCDN tree consisting
    of Master/Local DCDN servers and a considerably large number of DCDN surrogates. There
    should be a strong incentive for individuals to become part of the DCDN tree. The incentive is
    a share of the monetary benefit from the bonus pot, which is filled with the money saved by not
    paying the middlemen, that is, the commercial CDNs. The surrogates are to be offered
    remuneration proportionate to their share of service: online availability, storage,
    bandwidth, processing power and other relevant factors.
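    Proportionate remuneration of this kind can be illustrated with a weighted split of the bonus
    pot. The metric names, the weights and the function itself are hypothetical; an actual
    business model would have to define them.

```python
def share_pool(pool, contributions, weights):
    """Split `pool` among surrogates in proportion to weighted contributions.

    contributions -- dict: surrogate id -> dict of metric name -> measured value
    weights       -- dict: metric name -> weight (illustrative business parameters)
    """
    scores = {
        sid: sum(weights[metric] * value for metric, value in metrics.items())
        for sid, metrics in contributions.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {sid: 0.0 for sid in scores}
    return {sid: pool * s / total for sid, s in scores.items()}


# Hypothetical month: surrogate "a" served more data, "b" was online longer.
weights = {"hours_online": 1.0, "gb_served": 2.0}
contributions = {
    "a": {"hours_online": 10, "gb_served": 5},
    "b": {"hours_online": 20, "gb_served": 0},
}
payouts = share_pool(100.0, contributions, weights)
```

    In this example both surrogates reach the same weighted score (20), so each receives half
    of the pot.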

    A possible business model for DCDN is that of Network Marketing / Multi-Level
    Marketing, which is based on the pyramid scheme.

    Figure 3.7: Pyramid Scheme (layer sizes grow tenfold per layer, from 1 at the top to 10,000,000,000 at the bottom)

    3.3.1 Network Marketing (NM)/ Multi Level Marketing (MLM)

    Wikipedia defines it as follows: Multi-level marketing (MLM) (also called network marketing
    or NM) is a business model that combines direct marketing with franchising [Wikipedia, b].
    In a typical multi-level marketing or network marketing arrangement, individuals associate
    with a parent company as independent contractors or franchisees and are compensated based
    on their sales of products or services, as well as the sales achieved by those they bring into
    the business. This is similar to many franchise companies, where royalties are paid from the sales of
    individual franchise operations to the franchisor as well as to an area or region manager.

    MLM is inspired by the mathematical model of Pyramid scheme. If a pyramid were started

    by a human being at the top with just 10 people beneath him, and 100 beneath them, and

    1000 beneath them, etc., the pyramid would involve everyone on earth in just ten layers of

    people with a single man on top. The human pyramid would be about 60 feet high and the

    bottom layer would have more than 4.5 billion people [Skeptic Dictonary, 2007]. Figure 3.7
    illustrates this.
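    The tenfold growth per layer can be checked with a few lines of arithmetic. The sketch below
    counts how many layers beneath the top are needed before the cumulative membership reaches a
    given population; the population figure used in the example is an assumed round number, not a
    value taken from the sources cited above.

```python
def layers_to_cover(population, branching=10):
    """Return (layers_below_top, cumulative_members) once the pyramid covers `population`."""
    layer_size, total, depth = 1, 1, 0   # depth 0 is the single person at the top
    while total < population:
        depth += 1
        layer_size *= branching          # each layer is `branching` times the previous one
        total += layer_size
    return depth, total


# Assumed world population of 6.6 billion, used only for illustration.
depth, members = layers_to_cover(6_600_000_000)
```

    Nine layers beneath the top hold 1,111,111,111 people in total, so a tenth layer is needed to
    pass the assumed population, in line with the "ten layers" figure quoted above.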

    This scheme is effectively used by MLM giants such as Amway, Big Planet, Excel
    Communications, Mary Kay, etc. [Wikipedia, a]. A general business model of the NM/MLM distributor
    hierarchy (Figure 3.8; source: http://www.mlmknowhow.com/articles/startup/getpaid.htm), which
    resembles the DCDN hierarchy, shows the scope for adopting the
    NM/MLM model for the effective creation of the DCDN tree of surrogates.

    Figure 3.8: MLM Architecture

    In the DCDN model, the Distributor is replaced with the content provider, the first level
    is the net of Master DCDN servers and the second level consists of Local DCDN servers.

    In other words, the DCDN administrators franchise the concept to the Master/Local
    server levels. These in turn recruit the final level of the hierarchy, the surrogates, to store
    content at the local level. As expansion requires, more levels could be
    envisaged in the long run. This can be achieved by adding further layers of Master DCDN
    servers at different hierarchical levels.

    Eventually, an active DCDN server develops a hierarchical substructure known as a
    down-line, which looks like the organization chart of a company with many employees. Each DCDN
    server gets commission/remuneration on the service of the surrogates in its down-line. There
    are also likely to be performance bonuses for reaching certain service levels. The
    profit earned from the commission over its surrogates becomes the driving force for the DCDN
    servers (Master/Local) to maintain their technological infrastructure (both hardware and
    software) and to add more and more surrogates to their hierarchical structure. This will ultimately
    improve the scalability and efficiency of the DCDN network. With this kind of business model,
    there are no big capital requirements, no geographical limitations and no special education
    or skills needed for participants. Since the revenue collected from the content providers
    is shared proportionately among the surrogates, it can become a low-overhead, home-based
    business for the participating surrogates. Network marketing is a people-to-people business,
    which goes very well with the near-peer-to-peer architecture of DCDN.


    3.3.2 Special Scenarios of DCDN Advantage

    The new architectural model suggested for DCDN and the corresponding MLM business model
    open up new possibilities in content distribution. The DCDN architecture is expected
    to be more effective in distributing static content than dynamic content. The most
    important beneficiaries of DCDN will be popular streaming media sharing websites such as
    youtube.com and photo-sharing websites (e.g. Picasa Web Albums, Orkut pictures), which have
    to support millions of media files uploaded from around the world and deliver them effectively
    to end-users in a more distributed manner. Popular music sharing services will also
    find DCDN an effective and cheaper means of delivering their services to customers.


    Chapter 4

    Performance Analysis and Load Balancing Algorithm

    The performance of the DCDN architecture can be expressed in terms of the total delay in retrieving Web content. Here, the DCDN surrogates are expected to be the bottlenecks, because they are ordinary Internet users with limited resources. The success of the DCDN architecture will therefore depend on the performance of the DCDN surrogates.

    This chapter analyzes the performance of a DCDN surrogate using queuing theory techniques. On the basis of this analysis, a load balancing algorithm for the DCDN server is suggested.

    4.1 Performance Parameters and Assumptions

    The total delay in retrieving content is the sum of the propagation delay, the processing delay at the DCDN surrogates and the transmission delay.

    Transmission delay is the time required by a DCDN surrogate to transmit all data packets of the requested content onto the transmission link. In our case, it is inversely proportional to the available upstream bandwidth of the DCDN surrogate. Once the packets are pushed onto the link, they need to be propagated to the client. The time taken for propagation is called the propagation delay. Propagation delay is left out of consideration in analyzing the performance of the DCDN architecture, because DCDN assumes that content is replicated much nearer to the clients than in conventional architectures. The processing speed of the surrogates is assumed to be high enough that the processing delay is negligible compared to the transmission delay.


    In a nutshell, the efficiency of the DCDN network can be expressed in terms of the transmission delay at the DCDN surrogates. This is termed the total system delay at the surrogates. To ensure better performance, a truncated buffer system is suggested at the DCDN surrogates. We also assume that requests arrive as a Poisson process (randomly spaced in time) and that service times are exponentially distributed. This results in an M/M/c/k queuing model, where c is the number of servers (or server daemon programs) engaged and k is the total buffer size, i.e., the maximum number of requests allowed in the system.

    4.2 Queuing Metrics

    The total system delay in a DCDN surrogate and the corresponding rejection rates for different M/M/c/k models are calculated using the following formulas, as given by D. Gross and C. M. Harris [Gross and Harris, 1998]. We assume that the effect of multiple servers can be replicated in a single surrogate by running more than one DCDN surrogate daemon (as multiple processes or threads) on the same machine.

    Service Time (S): In our case, it is the transmission time, which is equal to:

    \[ S = \frac{\text{File size}}{\text{Upstream capacity}} \]

    Therefore, the service rate of the DCDN surrogate is $\mu = 1/S$.

    Server Utilization ($\rho$): $\rho = \lambda/\mu$ for M/M/1/k and $\rho = \lambda/(c\mu)$ for the M/M/c/k queuing model, where $\lambda$ is the arrival rate of requests to the DCDN surrogate.

    Effective arrival rate ($\lambda_e$): $\lambda_e = \lambda(1 - P_k)$, where $P_k$ is the probability of $k$ requests in the system.

    Probability of zero requests in the system ($P_0$), for $\rho \neq 1$:

    \[ P_0 = \left[ \sum_{i=0}^{c-1} \frac{(\lambda/\mu)^i}{i!} + \frac{(\lambda/\mu)^c}{c!} \cdot \frac{1 - \rho^{k-c+1}}{1 - \rho} \right]^{-1} \]

    Probability of $n$ customers in the system for $0 \le n < c$:

    \[ P_n = \frac{(\lambda/\mu)^n}{n!} P_0 \]

    Probability of $n$ customers in the system for $c \le n \le k$:

    \[ P_n = \frac{(\lambda/\mu)^n}{c!\, c^{n-c}} P_0 \]

    Average number of requests in the queue ($L_q$):

    \[ L_q = \sum_{n=c}^{k} (n - c) P_n \]

    Average number of requests in the system ($L$):

    \[ L = L_q + \frac{\lambda}{\mu} (1 - P_k) \]

    Average System Delay ($W$): Average System Delay is the time a request has to wait from the moment it enters a server queue until it has been served by any of the servers available to take a request. For M/M/c/k:

    \[ W = \frac{L}{\lambda_e} \]

    Rejection Rate: The number of requests that will be lost due to congestion per unit time is given by $\lambda P_k$.

    If the mean arrival rate of requests is greater than the service rate of the surrogates, it will choke the surrogates. To avoid this scenario, the mean arrival rate of requests ($\lambda$) is kept below the service rate of the surrogates. In other words, server utilization ($\rho$) is kept below one in all queuing models.

    At the same time, we have to be cautious about the probability of blocking (loss) of requests. Since we cannot afford to lose more than a minimal fraction of requests, the rejection rate of requests for the different models has to be taken into account in the design of a load balancing algorithm for the DCDN server.
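The queuing metrics of this section can be evaluated numerically with a few lines of code. The following is a minimal Python sketch, not the QtsPlus tool used in this work; the function name `mmck_metrics` and the dictionary keys are illustrative assumptions. It normalises the state weights directly, which gives the same P0 as the closed-form expression and also covers the case where utilization equals one.

```python
from math import factorial


def mmck_metrics(lam, mu, c, k):
    """Steady-state metrics of an M/M/c/k queue.

    lam: arrival rate, mu: service rate of one server,
    c: number of servers, k: maximum number of requests in the system.
    """
    if not (lam > 0 and mu > 0 and 1 <= c <= k):
        raise ValueError("need lam > 0, mu > 0 and 1 <= c <= k")
    r = lam / mu  # offered load (lambda / mu)
    # Unnormalised state probabilities P_n / P_0 for n = 0..k
    weights = []
    for n in range(k + 1):
        if n < c:
            weights.append(r ** n / factorial(n))
        else:
            weights.append(r ** n / (factorial(c) * c ** (n - c)))
    total = sum(weights)
    p = [w / total for w in weights]  # p[0] is P0, p[k] is Pk
    lq = sum((n - c) * p[n] for n in range(c, k + 1))
    lam_eff = lam * (1 - p[k])        # effective arrival rate
    l_sys = lq + r * (1 - p[k])       # average number in the system
    return {
        "rho": lam / (c * mu),
        "P0": p[0],
        "Pk": p[k],
        "Lq": lq,
        "L": l_sys,
        "W": l_sys / lam_eff,          # average system delay
        "rejection_rate": lam * p[k],  # requests lost per unit time
    }


if __name__ == "__main__":
    # Hypothetical inputs: one daemon, buffer of 3, 50% utilization
    print(mmck_metrics(lam=0.25, mu=0.5, c=1, k=3))
```

Sweeping `lam` from 20% to 80% of the service rate for each (c, k) pair is one way to produce curves of delay and rejection rate against utilization.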

    4.3 Queuing Theory Modeling for Different Scenarios

    Different queuing models are analyzed for different cases to find the average system delay and rejection rate described in the previous section. The following assumptions are made in analyzing the queuing parameters.

    1. The surrogates are assumed to have at least DSL/Cable Web access.

    2. The minimum capacity of a DSL/Cable line is rated at 768 Kbps downstream and 128 Kbps upstream.

    Using a Web Page Analyzer¹ it was found that the average size of the Web pages of medium-sized content providers (example: www.rajagiritech.ac.in) is about 30 KB. However, the upstream capacity will not be the same for all surrogates in a real-world scenario. We can reasonably assume that there will also be some surrogates with a higher level of connectivity, which would be able to give better performance. To reflect this scenario, we have also analyzed the queuing delay for a doubled service rate of the DCDN surrogates. That is, the upstream capacity of the surrogates is raised from 128 Kbps to 256 Kbps.

    ¹ Available at: http://www.websiteoptimization.com/services/analyze/

    [Figure 4.1: Utilization v/s Total System Delay. Two panels, one with 128 Kbps capacity and one with 256 Kbps capacity, plot the average delay in a DCDN surrogate (sec) against utilization in percentage (access rate/service rate) for the M/M/1/1, M/M/1/2, M/M/1/3, M/M/2/2, M/M/2/3 and M/M/3/3 models.]
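As a worked example of the service-time formula from Section 4.2, take the 30 KB average page size and the two upstream capacities considered here. The conversion assumes that the prefix K stands for the same factor in KB and in Kbps, so that 30 KB corresponds to 240 Kbit; this unit convention is an assumption and is not stated in the text.

```latex
\begin{align*}
S_{128} &= \frac{30 \times 8\ \text{Kbit}}{128\ \text{Kbps}} = 1.875\ \text{s}, &
\mu_{128} &= \frac{1}{S_{128}} \approx 0.533\ \text{requests/s} \\
S_{256} &= \frac{30 \times 8\ \text{Kbit}}{256\ \text{Kbps}} = 0.9375\ \text{s}, &
\mu_{256} &= \frac{1}{S_{256}} \approx 1.067\ \text{requests/s}
\end{align*}
```

Under this assumption, doubling the upstream capacity halves the service time and doubles the service rate.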

    A surrogate originally intended to serve a single request at a time may actually end up serving two or three in a real-world scenario. This may happen if multiple requests arrive for content that is available only at a single surrogate. In that case, the surrogate is expected to serve those requests with reduced service rates, i.e., for M/M/1/1 the service rate is $\mu$; for M/M/2/2 it is $\mu/2$, and for M/M/3/3 it is $\mu/3$.
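For the M/M/c/c cases, the blocking probability is given by the Erlang B formula, which can be computed with a simple recursion. The sketch below is one reading of the reduced-rate scenario, in which each of the c daemons serves at rate mu / c; the function names are hypothetical and this is not the QtsPlus computation itself.

```python
def erlang_b(offered_load, c):
    """Blocking probability of an M/M/c/c queue (Erlang B recursion)."""
    if offered_load < 0 or c < 0:
        raise ValueError("offered_load and c must be non-negative")
    b = 1.0  # blocking probability with zero servers
    for n in range(1, c + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def mmcc_rejection_rate(lam, mu, c):
    """Requests lost per unit time when c daemons share one uplink.

    Assumption: each daemon serves at the reduced rate mu / c.
    """
    per_daemon_rate = mu / c
    offered_load = lam / per_daemon_rate
    return lam * erlang_b(offered_load, c)


if __name__ == "__main__":
    # Hypothetical inputs: full-rate mu of 0.5 requests/s, 50% utilization
    for c in (1, 2, 3):
        print(c, mmcc_rejection_rate(lam=0.25, mu=0.5, c=c))
```

The recursion avoids computing large factorials and stays numerically stable as c grows.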

    The values were computed using QtsPlus, a queuing theory analysis software package provided by D. Gross and C. M. Harris² [Gross and Harris, 1998]. The results of these analyses are compiled in Figure 4.1 for 128 Kbps and 256 Kbps upstream capacity. The rejection rates of the different queuing models are presented in Figure 4.2.

    4.4 Load Balancing Algorithm for DCDN Servers

    Many load-balancing algorithms have been proposed in the past to ensure scalable Web servers [Bryhni et al., 2000; Godfrey et al., 2004; Aweya et al., 2002; Wolf and Yu, 2001; Chen et al., 2005]. The stateless property of the HTTP protocol, by which requests can be routed separately to different servers, is widely used to achieve load sharing in a cluster of Web servers [Bryhni et al., 2000]. The canonical name (CNAME) associated with a Web link can be mapped to the IP addresses of a number of replicated servers that hold the same content. Bryhni et al. [2000] suggest that this mapping can be done in the network to achieve the best performance. The same techniques can be adopted for DCDN, but they need to be customized for its highly distributed nature.

    [Figure 4.2: Utilization v/s Rejection Rate. The plot shows the loss of requests per unit time against utilization in percentage (access rate/service rate) for the M/M/1/1, M/M/1/2, M/M/1/3, M/M/2/2, M/M/2/3 and M/M/3/3 models.]

    ² Available at: http://www.geocities.com/qtsplus/

    An algorithm for load balancing in highly heterogeneous and dynamic P2P environments is suggested by Godfrey et al. [2004]. They use the concept of a virtual server, where a physical node hosts one or more virtual servers. Load balancing is done by moving virtual servers from heavily loaded physical nodes to lightly loaded physical nodes. However, it is proposed on the assumption that the load balancer has very little control over where the objects are stored. In the DCDN environment, by contrast, the DCDN server has more control over the content within its surrogates. Moreover, load balancing algorithms in P2P systems generally do not consider the difference in capacity of their peers. In DCDN we cannot discard this difference, as we want to offer a service as efficient as that of a commercial CDN. It is therefore necessary to formulate a simple but efficient load balancing algorithm that ensures almost equal server load across the surrogates by making use of the information and control residing with the DCDN server.

    By carefully analyzing the Utilization v/s Total System Delay graphs (Figure 4.1) and the Utilization v/s Rejection Rate graph (Figure 4.2) in the previous section, the following inferences can be made.

    1. The best performance is expected from DCDN when the surrogates follow M/M/c/c queuing models.

    2. The reduction in average delay is almost directly proportional to the increase in upload capacity in all the scenarios.

    3. Loss of requests can be reduced by increasing the number of requests allowed in the whole system, that is, by increasing the value of k in the M/M/c/k queuing model.

    Based on the above inferences, we can suggest a load balancing algorithm for the DCDN servers. An algorithm based on the M/M/c/c model is expected to be more scalable and comparatively more efficient than the other models. This conclusion is reached by considering an optimum balance between total system delay and rejection rate. The real-world scenario also suggests that there may be cases where multiple content items have to be served to different clients simultaneously from a single surrogate. In the light of the above discussion, we assume that the surrogates will be designed to support the M/M/c/c queuing model of request streams, where the value of c will be proportional to the processing capacity of the surrogates. The following optimum server load algorithm is proposed to ensure reasonable load sharing between the surrogates.

    Load Balancing Algorithm for DCDN Server

    let the DCDN server know the service rate ($\mu$) of each of its surrogates;
    let the DCDN server know the rate of requests ($\lambda$) sent to each of its surrogates;
    let the DCDN surrogates support only M/M/c/c queuing models;
    let c be the maximum number of requests allowed in a particular surrogate;
    Web requests arrive at the Local DCDN server;
    if the requested content is available in the Local DCDN surrogate network then
        if there are surrogates with P0 (probability of no requests) equal to 100% (i.e., idle surrogates) then
            send the request to the idle surrogate with the highest c value (i.e., the surrogate with the highest service capacity);
        else
            while the search does not exceed the Max Trial Number do
                find the surrogate with the lowest server utilization ($\rho = \lambda/(c\mu)$);
            end while
            send the request to that surrogate;
        end if
    else
        redirect the request to another Local DCDN server that has the requested content;
    end if
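The selection step of the algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the simulations: the `Surrogate` class, its field names and the `pick_surrogate` function are hypothetical, an idle surrogate is taken to be one with no request currently in service, and the Max Trial Number is modelled as a cap on how many candidates are inspected.

```python
from dataclasses import dataclass


@dataclass
class Surrogate:
    name: str
    mu: float         # service rate of the surrogate (requests/sec)
    c: int            # maximum simultaneous requests (M/M/c/c)
    lam: float = 0.0  # request rate currently routed to this surrogate
    active: int = 0   # requests being served right now (assumed known)

    @property
    def utilization(self):
        # Server utilization rho = lambda / (c * mu)
        return self.lam / (self.c * self.mu)


def pick_surrogate(candidates, max_trials):
    """Choose a surrogate among those holding the requested content.

    Returns None when no local surrogate holds the content, in which
    case the caller redirects to another Local DCDN server.
    """
    if not candidates:
        return None
    # Prefer an idle surrogate with the highest capacity c.
    idle = [s for s in candidates if s.active == 0]
    if idle:
        return max(idle, key=lambda s: s.c)
    # Otherwise inspect at most max_trials surrogates and keep the
    # one with the lowest utilization.
    best = None
    for s in candidates[:max_trials]:
        if best is None or s.utilization < best.utilization:
            best = s
    return best
```

After a surrogate is chosen, the DCDN server would update its record of the request rate sent to that surrogate so that later decisions see the new utilization.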

    We expect that this algorithm will distribute the workload reasonably well between the surrogates. However, this can only be validated by conducting extensive simulations that reproduce the highly distributed DCDN environment. The next chapter presents those simulations and their results.


    Chapter 5

    Simulations and Results

    In the previous chapter, the two major metrics of interest, namely queuing delay and rejection rate at the DCDN surrogates, were discussed. The results were compiled in the form of graphs. On the basis of those results, a candidate load balancing algorithm for DCDN servers was suggested.

    Various scenarios are created using a simulation tool to replicate both the DCDN and the commercial client-server CDN architecture. Simulations are conducted to compare the performance of the DCDN architecture, using the optimum server load algorithm for load balancing, with that of the client-server CDN architecture.

    The simulations are conducted using the Opnet IT Guru network simulator. The main reason for using Opnet IT Guru is its user-friendliness in picking predefined models and objects using drag-and-drop functionality. The predefined Opnet models and objects are already validated and hence require no further validation. The devices, links and nodes in Opnet IT Guru are based on reasonable assumptions and enable us to carry out a sound data analysis.

    This chapter presents the goals, assumptions and setup of the simulations. It then discusses the performance comparison between the different DCDN scenarios, which use the optimum server load algorithm for load balancing, and the commercial CDN architecture.

    5.1 Goals

    The objective of the simulations is to evaluate the feasibility of the DCDN architecture, that is, to check its performance against that of the commercial client-server CDN architecture in terms of page response time, utilization of the DCDN server (the load balancer, in the case of a conventional CDN) and utilization of the DCDN surrogates (the CDN servers, in the case of a conventional CDN). The simulation scenarios were designed to achieve the following goals:

    - The simulations should provide reasonable data to show that the DCDN architecture can give better, or at least the same, performance as the commercial client-server CDN architecture.

    - The technologies and protocols used in the simulation environment should reproduce the standard protocols used in the industry.

    - The simulation should allow the addition, deletion and modification of the clients, DCDN servers (load balancers) and surrogates (servers) for easy comparison of the different parameters used for the evaluation.

    5.2 Assumptions

    The simulations are designed to first simulate a commercial client-server CDN environment and then the DCDN setup. The following assumptions are made in creating those environments:

    - The DCDN server lies within the IP cloud, unlike in a commercial CDN (where the corresponding element is the load balancer of the CDN server farm). An additional IP cloud between the DCDN server and the DCDN surrogates is used to represent this environment (Figure D.2).

    - The simulations are conducted on a standard PC, and the results are expected to be only indicative. However, we assume that similar scenarios of the commercial CDN and DCDN setups are comparable, since both are simulated in similar environments.

    - The data obtained from the simulations can be scaled by an appropriate value to obtain a reasonable approximation of the parameters assessed.

    5.3 Overview of Simulation Setup

    The experiment was conducted by choosing a standard commercial CDN setup serving 150 clients. The performance of the setup was measured in terms of page response time, load balancer utilization and server utilization. The environment was then reset to represent the DCDN architecture, and the same performance parameters were measured again. The experiment was repeated, altering the critical parameters of the simulation environment, until the DCDN setup could replicate the performance of the commercial CDN setup. The critical parameters that defined the different simulation scenarios were:

    - Number of Clients: The clients were all HTTP clients with requests of medium file size. The number of requests in the system is directly proportional to the number of clients.

    - Number of Surrogates (or Servers): The Ethernet servers in the commercial CDN setup were replaced with a larger number of Ethernet work-stations as surrogates.

    - Link Capacity: The link capacity of the DCDN surrogates is kept significantly lower than in the commercial CDN setup to reflect the DCDN architecture in the simulation.

                                        Commercial CDN   DCDN: Scenario 1   DCDN: Scenario 2   DCDN: Scenario 3
    Number of Clients                   150              150                75                 30
    Number of Surrogates (or Servers)   3                6                  6                  6
    Link Capacity (Mbps)                100              10                 10                 10
    Load Balancing Algorithm            round robin      server load        server load        server load

    Table 5.1: Simulation Setup

    The four scenarios created by altering the above parameters are described below:

    Commercial CDN: This is the standard CDN setup with 150 medium HTTP clients, served by a server farm of 3 CDN servers. The link capacity of the CDN servers was set to 100 Mbps to reflect the fact that a commercial CDN can afford more resources. The round robin algorithm was used for load balancing.

    DCDN - Scenario 1: The scenario was changed to a DCDN setup serving the same number (150) of medium HTTP clients. The three CDN servers of the previous setup were replaced with six DCDN surrogates (work-stations). The link capacities of the DCDN surrogates were reduced to 10 Mbps to reflect the fact that they will have lower link capacity than commercial CDN servers. The optimum server load algorithm was used for load balancing.

    DCDN - Scenario 2: In this DCDN setup, the number of clients was reduced to 75, keeping all other parameters the same as in the previous setup. The optimum server load algorithm was used for load balancing.

    DCDN - Scenario 3: The number of clients was further reduced to 30, keeping all other parameters unchanged. The optimum server load algorithm was used for load balancing.

    5.4 Simulation Results

    The simulations were conducted using the setups described in the previous section. The number of clients, the number of surrogates (or servers) and the link capacity of the surrogates used for the simulations are given in Table 5.1. The different numbers of clients in the different cases produced different numbers of requests, which were handled by the DCDN surrogates (or by the servers in the case of the commercial CDN).

    The simulations were run long enough to reach a steady state. The page response time, surrogate (or server) utilization and load balancer utilization were recorded for each case. The simulation results and their implications are explained in the following sections.

    5.4.1 Page Response Time

    Page response time is the interval between the instant at which an end-user at a terminal enters a request for a Website and the instant at which the Web page is received at the terminal. This parameter is critical in our comparison of DCDN with commercial CDN, because it is the most visible aspect of CDN performance from the end-user's point of view. The average page response times obtained during the simulations are compiled in Figure 5.1.

    The commercial CDN architecture produces an excellent result for 150 clients using a server farm of 3 servers connected through a hub. This confirms that a commercial CDN provides a better service for end-users.

    DCDN scenario 1, where the same number of clients were allowed to fetch content from six surrogate work-stations (double the number of servers in the previous case), produced an average page response time of around 15 seconds, compared to the average page response time of less than 2 seconds for the similar commercial CDN setup. It shows that a DCDN architecture with a setup similar to that of the commercial CDN is inefficient. Though the result seems discouraging, it was perfectly in line with our early assumptions. Since the powerful servers of the commercial CDN were replaced with work-stations and the link capacity of the surrogates was reduced to 1/10 of that of the CDN setup, the efficiency of the system was bound to be reduced considerably.

    [Figure 5.1: Page Response Time. The plot shows the average page response time (sec) against simulation time (sec) for the Commercial CDN and DCDN Scenarios 1, 2 and 3.]

    The aim of the experiment was to find out whether DCDN could replicate the performance of the commercial CDN in any scenario.

    Because of the limitations of the academic version of Opnet IT Guru, we could not increase the number of surrogates. Instead, we halved the number of clients to 75 in DCDN scenario 2. Decreasing the number of clients decreases the volume of requests. The result was promising: the page response time improved considerably and was reduced to nearly half. However, it was still above that of the commercial CDN.

    The number of clients was further reduced to 30 in DCDN scenario 3. The graph shows us

    that DC