
Next generation Cloud Computing Architecture
Enabling real-time dynamism for shared distributed physical infrastructure

Vijay Sarathy, Purnendu Narayan and Rao Mikkilineni, PhD
Kawa Objects, Inc.

    Los Altos, CA

    [email protected], [email protected], [email protected]

Abstract— Cloud computing is fundamentally altering the expectations for how and when computing, storage and networking resources should be allocated, managed and consumed. End-users are increasingly sensitive to the latency of services they consume. Service Developers want the Service Providers to ensure or provide the capability to dynamically allocate and manage resources in response to changing demand patterns in real-time. Ultimately, Service Providers are under pressure to architect their infrastructure to enable real-time end-to-end visibility and dynamic resource management with fine-grained control to reduce total cost of ownership while also improving agility.

The current approaches to enabling real-time, dynamic infrastructure are inadequate, expensive and not scalable to support consumer mass-market requirements. Over time, server-centric infrastructure management systems have evolved into a complex tangle of layered systems designed to automate systems-administration functions that are knowledge- and labor-intensive. This expensive, non-real-time paradigm is ill suited for a world where customers are demanding communication, collaboration and commerce at the speed of light. Thanks to hardware-assisted virtualization, and the resulting decoupling of infrastructure and application management, it is now possible to provide dynamic visibility and control of services management to meet the rapidly growing demand for cloud-based services.

What is needed is a rethinking of the underlying operating system and management infrastructure to accommodate the ongoing transformation of the datacenter from the traditional server-centric architecture model to a cloud or network-centric model. This paper proposes and describes a reference model for a network-centric datacenter infrastructure management stack that borrows and applies key concepts that have enabled dynamism, scalability, reliability and security in the telecom industry to the computing industry. Finally, the paper describes a proof-of-concept system that was implemented to demonstrate how dynamic resource management can enable real-time service assurance for a network-centric datacenter architecture.

Keywords— Cloud Computing, Distributed Computing, Virtualization, Data Center

I. INTRODUCTION

The unpredictable demands of the Web 2.0 era, in combination with the desire to better utilize IT resources, are driving the need for a more dynamic IT infrastructure that can respond to rapidly changing requirements in real-time. This need for real-time dynamism is about to fundamentally alter the datacenter landscape and transform the IT infrastructure as we know it [1].

    Figure 1: Transformation of the Traditional Datacenter

In the cloud computing era, the computer can no longer be thought of in terms of the physical enclosure, i.e. the server or box, which houses the processor, memory, storage and associated components that constitute the computer. Instead, the computer in the cloud ideally comprises a pool of physical compute resources, i.e. processors, memory, network bandwidth and storage, potentially distributed physically across server and geographical boundaries, which can be organized on demand into a dynamic logical entity, i.e. a cloud computer, that can grow or shrink in real-time in order to assure the desired levels of latency sensitivity, performance, scalability, reliability and security to any application that runs in it. What is truly enabling this transformation today is virtualization technology, more specifically hardware-assisted server virtualization.

At a fundamental level, virtualization technology enables the abstraction or decoupling of the application payload from the underlying physical resource [2]. What this typically means is that the physical resource can then be carved up into logical or virtual resources as needed; this is known as provisioning. By introducing a suitable management infrastructure on top of this virtualization functionality, the provisioning of these logical resources can be made dynamic, i.e. the logical resource can be made bigger or smaller in accordance with demand; this is known as dynamic provisioning. To enable a true cloud computer, every single computing element or resource should be capable of being dynamically provisioned and managed in real-time. Presently, there are many holes and areas for improvement in today's datacenter infrastructure before we can achieve the above vision of a cloud computer. Below we discuss these for each of the key datacenter infrastructure components.
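The distinction between provisioning and dynamic provisioning made above can be sketched in a few lines of Python. This is a toy model for illustration only; the class and method names are our own and are not drawn from any product or from the prototype described later.

```python
class PhysicalResource:
    """A pool of some physical resource (e.g. memory in GB) that can be carved up."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.allocated = 0

    def carve(self, amount):
        """Provisioning: carve a logical resource out of the physical pool."""
        if self.allocated + amount > self.capacity:
            raise RuntimeError("insufficient physical capacity")
        self.allocated += amount
        return LogicalResource(self, amount)


class LogicalResource:
    """A logical slice whose size can be dialed up or down on demand."""
    def __init__(self, pool, size):
        self.pool = pool
        self.size = size

    def resize(self, new_size):
        """Dynamic provisioning: grow or shrink in accordance with demand."""
        delta = new_size - self.size
        if self.pool.allocated + delta > self.pool.capacity:
            raise RuntimeError("insufficient physical capacity")
        self.pool.allocated += delta
        self.size = new_size


pool = PhysicalResource(capacity=64)   # e.g. 64 GB of physical memory
vm_mem = pool.carve(16)                # static provisioning
vm_mem.resize(24)                      # dial-up under load
vm_mem.resize(8)                       # dial-down when demand subsides
print(pool.allocated)                  # 8
```

In a true cloud computer, every resource type (CPU, memory, bandwidth, storage) would expose this kind of resize capability in real-time.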

    2010 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises

    978-0-7695-4063-4/10 $26.00 2010 IEEE

    DOI 10.1109/WETICE.2010.14


A. Server Operating Systems and Virtualization

Whereas networks and storage resources, thanks to advances in network services management and SANs, have already been capable of being virtualized for a while, only now with the wider adoption of server virtualization do we have the complete basic foundation for cloud computing, i.e. all computing resources can now be virtualized. Consequently, server virtualization is the spark that is now driving the transformation of the IT infrastructure from the traditional server-centric computing architecture to a network-centric, cloud computing architecture. With server virtualization, we now have the ability to create complete logical (virtual) servers that are independent of the underlying physical infrastructure or their physical location. We can specify the computing, network and storage resources for each logical server (virtual machine) and even move workloads from one virtual machine to another in real-time (live migration). All of this has helped to radically transform the cost structure and efficiency of the datacenter. Capacity utilization of servers can be increased and overall power consumption can be dramatically reduced by consolidating workloads. Additionally, thanks to server virtualization and live migration, High Availability (HA) and Disaster Recovery (DR) can be implemented much more efficiently [3]. Despite the numerous benefits that virtualization has enabled, we have yet to realize its full potential in terms of cloud computing. This is because:

• Traditional server-centric operating systems were not designed to manage shared distributed resources: The cloud computing paradigm is all about optimally sharing a set of distributed computing resources, whereas the server-centric computing paradigm is about dedicating resources to a particular application. The server-centric paradigm inherently ties the application to the server. The job of the server operating system is to dedicate and ensure availability of all available computing resources on the server to the application. If another application is installed on the same server, the operating system will once again manage all of the server resources to ensure that each application continues to be serviced as if it has access to all available resources on that server. This model was not designed to allow for the dial-up or dial-down of resources allocated to an application in response to changing workload demands or business priorities. This is why load balancing and clustering were introduced; however, they do not alter the association of an application to a server. They just use more instances of the application, each running on its own server, to try to share any increased burden.

What is required for cloud computing, where distributed resources are shared amongst applications, is a way to mediate between the applications and the resources by prioritizing the applications' needs based on relative business priorities. Our key observation here is that any sharing of resources will at some point inevitably result in contention for those resources, which can only be resolved through a system that performs mediation globally across all the distributed shared resources. Today's operating systems do not natively provide this type of capability; it is often relegated to management systems that are layered on top of, or orthogonal to, the operating systems. However, those management systems were also designed for a server-centric, configuration-based paradigm and have similar issues which make them ill suited as mediators that can enable real-time dynamism. The issues related to management systems are detailed in a separate section below. Finally, implementing simple server-level QoS within the local server operating systems is not the answer, as it does not help resolve contention amongst shared or distributed resources globally.

• Current hypervisors do not provide adequate separation between application management and physical resource management: Today's hypervisors have just interposed themselves one level down, below the operating system, to enable multiple virtual servers to be hosted on one physical server [4, 5]. While this is great for consolidation, once again there is no way for applications to manage how, what and when resources are allocated to them without having to worry about the management of physical resources. It is our observation that the current generation of hypervisors, which were also born in the era of server-centric computing, do not delineate hardware management from application management, much like the server operating systems themselves. It is our contention that management and allocation of a shared infrastructure require a different approach. In an environment where resources are being shared, the allocation of resources to specific applications must take into account each application's resource usage profile and business priority relative to the other applications also using resources. In the ideal situation, the profiles of resources are mediated to match the needs of applications, based on their usage profiles and business priorities, at run time.

• Server virtualization does not yet enable sharing of distributed resources: Server virtualization presently allows a single physical server to be organized into multiple logical servers. However, there is no way, for example, to create a logical or virtual server from resources that may be physically located in separate servers. It is true that, by virtue of the live migration capabilities that server virtualization technology enables, we are able to move application workloads from one physical server to another, potentially even geographically distant, physical server. However, moving is not the same as sharing. It is our contention that to enable a truly distributed cloud computer, we must be able to efficiently share resources no matter where they reside, purely based on the latency constraints of the applications or services that consume them. Present-day hypervisors do not exploit the potential of sharing distributed resources across physical and geographic boundaries, nor do they provide latency-based composition of logical servers to fully utilize the power of distributed resources.
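The global mediation across shared resources argued for above can be illustrated with a minimal sketch: when aggregate demand exceeds a shared pool's capacity, the mediator arbitrates by relative business priority rather than leaving each server's local OS to fight it out. The application names, priority values and function signature below are hypothetical, chosen only for this example.

```python
def mediate(capacity, demands):
    """Arbitrate a contended shared pool by business priority.

    demands maps an application name to (priority, requested-amount);
    higher-priority applications are granted first, and the total
    granted never exceeds the shared capacity."""
    allocations = {}
    remaining = capacity
    # Serve applications in descending business priority.
    for app, (priority, requested) in sorted(
            demands.items(), key=lambda kv: -kv[1][0]):
        grant = min(requested, remaining)
        allocations[app] = grant
        remaining -= grant
    return allocations


# 100 units of a shared resource, 130 units of aggregate demand:
alloc = mediate(100, {
    "billing":   (3, 50),   # highest business priority, fully served
    "web":       (2, 30),
    "analytics": (1, 50),   # lowest priority absorbs the shortfall
})
print(alloc)   # {'billing': 50, 'web': 30, 'analytics': 20}
```

The point of the sketch is only that contention is resolved in one place, with global knowledge of all demands, rather than independently on each server.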

B. Storage Networks & Virtualization

Before the proliferation of server virtualization, storage networking and storage virtualization enabled many improvements in the datacenter. The key driver was the introduction of the Fibre Channel (FC) protocol and Fibre Channel-based Storage Area Networks (SAN), which provided high-speed storage connectivity and specialized storage solutions enabling such benefits as server-less backup, point-to-point replication, HA/DR and performance optimization outside of the servers that run applications. However, these benefits have come with increased management complexity and costs. In fact, SAN administrator costs are often cited as the single most critical factor affecting the successful deployment and management of virtual server infrastructure [6].

C. Network Virtualization

The virtual networks now implemented inside the physical server to switch between all the virtual servers provide an alternative to the multiplexed, multi-pathed network channels by trunking them directly to the WAN transport, thereby simplifying the physical network infrastructure. With the proliferation of multi-core, multi-CPU commodity servers, it has almost become necessary to replace the mess of cables otherwise needed to interface multiple HBAs and NICs for each application with a single high-speed Ethernet connection and a virtual switch. It is our contention that the resultant architectural simplicity will significantly reduce the associated management burden and costs.

D. Systems Management Infrastructure

Present-day management systems are not cut out to enable the real-time dynamic infrastructure needed for cloud computing [7]. Here are the reasons why:

• Human system administrators do not lend themselves to enabling real-time dynamism: Once again, the evolution of present-day management systems can be traced back to their origin as tools designed to help human system administrators who managed servers and other IT infrastructure. Even today, these systems have to be configured manually by a human with expert-level knowledge of the various infrastructure pieces, and they provide minimal automation. The systems often require human intervention to respond to alerts or events, resulting in a significant amount of human latency which just does not lend itself to the real-time dynamism required by cloud computing.

• Policy-based management is not really automation: There are a lot of management systems today that provide policy-based management capabilities. The trouble is that the policies have to be programmed by expert system administrators who make judgments based on their experience. This is neither an optimal nor a scalable solution for cloud computing environments where the workload demands are unprecedented and vary wildly.

• Virtualization compounds management complexity: Every practitioner of server virtualization is aware of how virtualization can result in Virtual Machine (VM) sprawl and the associated management burden it creates. VM sprawl is a result of the ease with which new VMs can be created and proliferated on virtualized servers. This is, however, not the only factor affecting management complexity. Since VMs are cheaper to set up and run than a physical server, load balancers, routers and other applications that required physical servers are all now being run as VMs within a physical server. Consequently, we now have to manage and route the network traffic among all these VMs within a server as well as the network traffic being routed across servers. Adding to this confusion are the various OS vendors, each offering the other vendors' OSes as guests without providing the same level of integration services. This makes real-life implementations in a heterogeneous environment very management-intensive, cumbersome and error-prone. It is our contention that the cost of this additional management may yet offset any cost benefits that virtualization has enabled through consolidation. A traditional management system with human dependency is just an untenable solution for cloud computing.

E. Application Creation and Packaging

The current method of using Virtual Machine images that include the application, OS and storage disk images is once again born of a server-centric computing paradigm and does not lend itself to distribution across shared resources. In a cloud computing paradigm, applications should ideally be constructed as a collection of services which can be composed, decomposed and distributed on the fly. Each of the services could be considered an individual process of a larger workflow that constitutes the application. In this way, individual services can be orchestrated and provisioned to optimize the overall performance and latency requirements for the application.
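The composition and decomposition of services described above can be sketched as follows. This is a deliberately simplified illustration; the service names and the small workflow API are invented for this example, and in a real deployment each stage could be provisioned on a different logical server.

```python
class Service:
    """One distributable process in a larger application workflow."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, payload):
        return self.fn(payload)


class Workflow:
    """An application as an ordered composition of services; services
    can be added, removed or redistributed on the fly."""
    def __init__(self, services):
        self.services = list(services)

    def decompose(self, name):
        """Remove a service from the application at run time."""
        self.services = [s for s in self.services if s.name != name]

    def run(self, payload):
        for service in self.services:   # each stage could run on a different host
            payload = service.run(payload)
        return payload


app = Workflow([
    Service("validate",  lambda x: x.strip()),
    Service("transform", lambda x: x.upper()),
    Service("audit",     lambda x: x + "!"),
])
print(app.run("  order-42 "))    # ORDER-42!
app.decompose("audit")           # re-compose the application on the fly
print(app.run("  order-42 "))    # ORDER-42
```

Contrast this with a monolithic VM image: here the unit of provisioning is the service, not the whole machine.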

II. PROPOSED REFERENCE ARCHITECTURE MODEL

If we distill the observations from the previous section, a couple of key themes emerge. The next generation architecture for cloud computing must:

• completely decouple physical resource management from virtual resource management; and

• provide the capability to mediate between applications and resources in real-time.

As we highlighted in the previous section, we have yet to achieve perfect decoupling of physical resource management from virtual resource management, but the introduction and increasing adoption of hardware-assisted virtualization (HAV) is an important and necessary step towards this goal. Thanks to HAV, a next generation hypervisor will be able to manage and truly ensure the same level of access to the underlying physical resources. Additionally, this hypervisor should be capable of managing both the resources located locally within a server as well as any resources in other servers that may be located elsewhere physically and connected by a network.

Once the management of physical resources is decoupled from the virtual resource management, the need for a mediation layer that arbitrates the allocation of resources between multiple applications and the shared distributed physical resources becomes apparent.

Figure 2: Reference Architecture Model for Next Generation Cloud Computing Infrastructure

• Infrastructure Service Fabric: This layer comprises two components. Together they enable a computing resource dial-tone that provides the basis for provisioning resources equitably to all applications in the cloud:

1. Distributed Services Mediation: an FCAPS-based (Fault, Configuration, Accounting, Performance and Security) abstraction layer that enables autonomous self-management of every individual resource in a network of resources that may be distributed geographically; and

2. Virtual Resource Mediation Layer: provides the ability to compose logical virtual servers with a level of service assurance that guarantees resources such as number of CPUs, memory, bandwidth, latency, IOPS (I/O operations per second), storage throughput and capacity.

• Distributed Services Assurance Platform: This layer allows for the creation of FCAPS-managed virtual servers that load and host the desired choice of OS to allow the loading and execution of applications. Since the virtual servers implement FCAPS management, they can provide automated mediation services to natively ensure fault management and reliability (HA/DR), performance optimization, accounting and security. This defines the management dial-tone in our reference architecture model. We envision that service providers will offer these virtual servers with an appropriate management API (the management dial-tone) to service developers, who can then create self-configuring, self-healing, self-optimizing services that can be composed into self-managed business workflows independent of the physical infrastructure.

• Distributed Services Delivery Platform: This is essentially a workflow engine that executes the application which, as we described in the previous section, is ideally composed as a business workflow that orchestrates a number of distributable workflow elements. This defines the services dial-tone in our reference architecture model.

• Distributed Services Creation Platform: This layer provides the tools that developers will use to create applications defined as collections of services which can be composed, decomposed and distributed on the fly to virtual servers that are automatically created and managed by the Distributed Services Assurance Platform.

• Legacy Integration Services Mediation: This layer provides integration and support for existing or legacy applications in our reference architecture model.
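As an illustration of the Virtual Resource Mediation Layer's role, the sketch below composes a logical server from resources spread across physical hosts against a requested specification. The host names, capacities and the simple greedy placement strategy are all assumptions made for this example, not part of the reference model itself.

```python
# The distributed physical pool the mediation layer has discovered
# (illustrative hosts and capacities).
servers = {
    "host-a": {"cpus": 8,  "memory_gb": 32},
    "host-b": {"cpus": 16, "memory_gb": 64},
}

def compose_logical_server(spec, pool):
    """Greedily gather resources across hosts until the spec is satisfied.

    Returns a placement plan: a list of (host, resources-taken) pairs.
    Raises if the distributed pool cannot assure the requested resources."""
    granted = {k: 0 for k in spec}
    placement = []
    for host, free in pool.items():
        take = {k: min(spec[k] - granted[k], free[k]) for k in spec}
        if any(take.values()):
            placement.append((host, take))
            for k in spec:
                granted[k] += take[k]
                free[k] -= take[k]
        if all(granted[k] >= spec[k] for k in spec):
            return placement
    raise RuntimeError("cannot assure requested resources")


# A logical server larger than any single host can provide:
plan = compose_logical_server({"cpus": 20, "memory_gb": 80}, servers)
print(plan)
```

A production mediation layer would of course also weigh latency constraints and business priority when choosing hosts; this sketch shows only the composition step that present-day hypervisors lack.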

III. DEPLOYMENT OF THE REFERENCE MODEL

Any generic cloud service platform must address the needs of four categories of stakeholders: (1) Infrastructure Providers, (2) Service Providers, (3) Service Developers, and (4) End Users. Below we describe how the reference model will affect, benefit and be deployed by each of these stakeholders.

Infrastructure Providers: These are vendors who provide the underlying computing, network and storage resources that can be carved up into logical cloud computers, which will be dynamically controlled to deliver a massively scalable and globally interoperable service network infrastructure. The infrastructure will be used both by the service creators who develop the services and by the end users who utilize them. This is very similar to the switching, transmission and access equipment vendors in the telecom world, who incorporate service-enabling features and management interfaces in their equipment. Current storage and computing server infrastructure has neither the ability to dynamically dial up and dial down resources nor the capability for dynamic management, which would help eliminate the numerous layers of present-day management systems and the human latency they contribute. The new reference architecture provides an opportunity for infrastructure vendors to eliminate the current systems-administration-oriented management paradigm and enable next generation real-time, on-demand, FCAPS-based management, so that applications can dynamically request the dial-up and dial-down of allocated resources.

Service Providers: With the deployment of our new reference architecture, service providers will be able to assure both service developers and service users that resources will be available on demand. They will be able to effectively measure and meter end-to-end resource usage, enabling a dial-tone for computing service, while managing service levels to meet the availability, performance and security requirements of each service. The service provider will now manage the application's connection to computing, network and storage resources with appropriate SLAs. This is different from most current cloud computing solutions, which are nothing more than hosted infrastructure or applications accessed over the Internet.


This will also enable a new distributed virtual services operating system that provides distributed FCAPS-based resource management on demand.

Service Developers: They will be able to develop cloud-based services using the management services API to configure, monitor and manage service resource allocation, availability, utilization, performance and security of their applications in real-time. Service management and service delivery will now be integrated into application development, allowing application developers to specify run-time SLAs.

End Users: Their demand for choice, mobility and interactivity with intuitive user interfaces will continue to grow. The managed resources in our reference architecture will not only allow service developers to create and deliver services using logical servers that end users can dynamically provision in real-time to respond to changing demands, but also provide service providers the capability to charge the end user by metering exact resource usage for the desired SLA.

IV. PROOF OF CONCEPT

In order to demonstrate some of the key concepts introduced in the reference architecture model described above, we implemented a proof-of-concept prototype purely in software. The key objectives of the proof of concept were to demonstrate (1) resource provisioning based on an application profile, and (2) FCAPS-based dynamic service mediation, specifically the ability to configure resources and to dial up or dial down resources based on business priorities and changing workload demand.

Figure 3: Diagram showing a logical view of the infrastructure simulated in the proof-of-concept

To facilitate the demonstration of the objectives above, the entire system was implemented in software, without deploying or controlling actual hardware resources. For example, all the computing resources, i.e. server, network and storage resources, used in the prototype were not real but simulated in software as objects with the appropriate attributes. The proof-of-concept design allowed the resource objects to be distributed across multiple computers that were part of the same Local Area Network (LAN). This simulated a private cloud of physical resources distributed across servers. Each simulated physical resource object was managed independently by a local management agent that wrapped around it to provide autonomous FCAPS-based management of the resource. Another computer was used to run our prototype implementation of the global FCAPS-based mediation layer.
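A local management agent of the kind used in the prototype might look like the sketch below. This is a much-simplified illustration covering only the fault and performance aspects of FCAPS; the class names and the utilization threshold are invented for this example and are not taken from the actual C++ implementation.

```python
class SimulatedResource:
    """A software object standing in for a physical resource, as in the PoC."""
    def __init__(self, name, capacity):
        self.name, self.capacity, self.used = name, capacity, 0


class ManagementAgent:
    """Wraps one resource and manages it autonomously (local FCAPS agent)."""
    def __init__(self, resource, utilization_threshold=0.9):
        self.resource = resource
        self.threshold = utilization_threshold
        self.alarms = []

    def configure(self, used):
        """Configuration: apply a new allocation, then re-check health."""
        self.resource.used = used
        self.monitor()

    def monitor(self):
        """Fault/Performance: raise a local alarm when utilization is too high."""
        utilization = self.resource.used / self.resource.capacity
        if utilization > self.threshold:
            self.alarms.append((self.resource.name, round(utilization, 2)))


agent = ManagementAgent(SimulatedResource("disk-1", capacity=100))
agent.configure(50)    # healthy, no alarm
agent.configure(95)    # crosses the threshold, raises a local alarm
print(agent.alarms)    # [('disk-1', 0.95)]
```

The key property mirrored here is autonomy: each agent watches its own resource, and only aggregated state needs to flow up to the global mediation layer.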

This mediation layer would discover all the physical resources available on the network and become aware of the global pool of resources it had available to allocate to applications. Next, the mediation layer took as input the application profiles, i.e. CPU, memory, storage IOPS, capacity and throughput, as well as the relative business priority and latency tolerance of the various applications that were run in the cloud. The logical representation of the entire proof-of-concept is displayed in Figure 3.

Figure 4: Diagram showing the physical representation of the proof-of-concept

All the simulated software resource objects and mediation layer objects were written using Microsoft Visual C++. The mediation layer provided two interface options: a telnet-accessible shell prompt for command-line control, as well as a browser-based console application developed using Adobe Flex UI components and served up by an Apache Web Server. The physical architecture of the proof-of-concept is shown in Figure 4.

With the above setup in place, we were able to simulate the running of applications and observe dynamic mediation take place in real-time. The first step was to demonstrate the provisioning of a virtual server by allocating logical CPU, memory, network bandwidth and storage resources. Figure 5 is one of the many screens from the browser-based management UI console; it shows the allocation and utilization of storage resources for one of the running applications.

Next, we wanted to demonstrate FCAPS-based service mediation, i.e. how logical resources could be dynamically provisioned in response to changing workload patterns. Figure 6 shows a screen from the browser-based management UI console which allowed us to simulate changes in application workload demand patterns by moving the sliders on the left of the screen, which represent parameters such as storage network bandwidth, IOPS, throughput and capacity. The charts on the right of Figure 6 show how the mediation layer responded by dialing up or dialing down logical resources in response to those changes.
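The dial-up/dial-down behavior just described can be illustrated with a toy mediation function that recomputes allocations whenever simulated demand changes. Proportional sharing under contention is an assumption made purely for this illustration; the prototype's actual policy also weighed business priority and latency tolerance.

```python
def allocate(capacity, demands):
    """Share a fixed pool in proportion to demand, capped at physical capacity."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)            # no contention: grant every request
    scale = capacity / total            # contention: scale everyone down
    return {app: round(d * scale) for app, d in demands.items()}


demands = {"app-1": 40, "app-2": 40}
print(allocate(100, demands))           # {'app-1': 40, 'app-2': 40}

demands["app-1"] = 120                  # a "slider" moves: app-1's demand spikes
print(allocate(100, demands))           # {'app-1': 75, 'app-2': 25}
```

Re-running the allocation on every demand change is what produces the dial-up for app-1 and the corresponding dial-down for app-2 within the fixed pool.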


Figure 5: UI screen from the proof-of-concept showing resource allocation to an application in the storage dashboard

Figure 6: UI screen from the proof-of-concept showing dynamic mediation of resources in response to a change in demand for an application

V. CONCLUSION

In this paper, we have described the requirements for implementing a truly dynamic cloud computing infrastructure. Such an infrastructure comprises a pool of physical computing resources, i.e. processors, memory, network bandwidth and storage, potentially distributed physically across server and geographical boundaries, which can be organized on demand into a dynamic logical entity, i.e. a cloud computer, that can grow or shrink in real-time in order to assure the desired levels of latency sensitivity, performance, scalability, reliability and security to any application that runs in it.

We identified some key areas of deficiency in current virtualization and management technologies. In particular, we detailed the importance of separating physical resource management from virtual resource management, and why current operating systems and hypervisors, which were born of the server-computing era, are not designed for, and hence are ill suited to provide, this capability for the distributed shared resources typical of cloud deployments. We also highlighted the need for FCAPS-based (Fault, Configuration, Accounting, Performance and Security) service mediation to provide global management functionality for all the networked physical resources that comprise a cloud, irrespective of their distribution across many physical servers in different geographical locations.

We then proposed a reference architecture model for a distributed cloud computing mediation (management) platform which will form the basis for enabling next generation cloud computing infrastructure. We showed how this infrastructure will affect as well as benefit key stakeholders such as infrastructure providers, service providers, service developers and end users.

Finally, we detailed a proof-of-concept which we implemented in software to demonstrate some of the key concepts, such as application resource provisioning based on application profile and business priorities, as well as dynamic service mediation. The next step is to implement this architecture using hardware-assisted virtualization.

We believe that what this paper has described is significantly different from most current cloud computing solutions, which are nothing more than hosted infrastructure or applications accessed over the Internet. The proposed architecture will dramatically change the current landscape by enabling cloud computing service providers to offer a next generation infrastructure platform which will give service developers and end users unprecedented real-time control and dynamism to help assure SLAs for service latency, availability, performance and security.

REFERENCES

[1] Rao Mikkilineni and Vijay Sarathy, "Cloud Computing and Lessons from the Past", Proceedings of IEEE WETICE 2009, First International Workshop on Collaboration & Cloud Computing, June 2009.

[2] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg and Ivona Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility", Future Generation Computer Systems, Volume 25, Issue 6, June 2009, pp. 599-616.

[3] Adriana Barnoschi, "Backup and Disaster Recovery for Modern Enterprise", 5th International Scientific Conference, Business and Management 2008, Vilnius, Lithuania. http://www.vgtu.lt/leidiniai/leidykla/BUS_AND_MANA_2008/inf-communication/630-635-G-Art-Barnoschi00.pdf

[4] Jason A. Kappel, Anthony T. Velte and Toby J. Velte, Microsoft Virtualization with Hyper-V, McGraw-Hill, New York, 2009.

[5] David Chisnall, The Definitive Guide to the Xen Hypervisor, First edition, Prentice Hall Press, NJ, 2009.

[6] Gartner's 2008 Data Center Conference Instant Polling Results: Virtualization Summary, March 2, 2009.

[7] Graham Chen, Qinzheng Kong, Jason Etheridge and Paul Foster, "Integrated TMN Service Management", Journal of Network and Systems Management, Springer New York, Volume 7, 1999, pp. 469-493.
