7

Click here to load reader

Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

  • Upload
    hatruc

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

Storage CloudSimA Simulation Environment for Cloud Object Storage Infrastructures

Tobias Sturm, Foued Jrad and Achim StreitSteinbuch Centre for Computing (SCC)

Department of Informatics, Karlsruhe Institute of Technology, Karlsruhe, [email protected],{foued.jrad, achim.streit}@kit.edu

Keywords: Cloud Computing, Storage as a Service, STaaS, Cloud Broker,CloudSim.

Abstract: Since Cloud services are billed by the pay-as-you-go principle, organizations can save huge investment costs.Hence, they want to know, what costs will arise by the usage ofthose services. On the other hand, Cloudproviders want to provide the best-matching hardware configurations for different use-cases. Therefore,CloudSim a popular event-based framework by Calheiros et al., was developed to model and simulate theusage of IaaS (Infrastructure-as-a-Service) Clouds. Metrics like usage costs, resource utilization and energyconsumption can be also investigated using CloudSim. But this favored simulation framework does not pro-vide any mechanisms to simulate todays object storage-based Cloud services (STaaS, Storage-as-a-Service).In this paper, we propose a storage extension for CloudSim toenable the simulations of STaaS-components.Interactions between users and the modeled STaaS Clouds areinspired by the CDMI (Cloud Data Manage-ment Interface) standard. In order to validate our extension, we evaluatedthe resource utilization and coststhat arise by the usage of STaaS Clouds based on different simulation scenarios.

1 INTRODUCTION

Cloud computing is one of the most emerging tech-nologies of the past few years. Start-ups, big compa-nies and the scientific community explore the benefitsof this kind of resource utilization. These Cloud usersexploit the advantages of less configuration overhead,less investments, less operational costs, highly flexi-ble systems that scale out to large systems, or scale into a minimal resource provisioning, depending on thecurrent needs.

The main benefit from using Cloud services is toreduce the cost for computation and storage by sourc-ing these services out to a third-party provider. Thecrucial questions are about the costs and the perfor-mance of the usage. Users want to know, how muchmoney they are going to spend, if they switch to aCloud solution. On the other side, providers haveto know, what is the best hardware constellation andconfiguration for them. These questions can be an-swered by simulating Cloud environments.

CloudSim (Calheiros et al., 2009) is a popularsimulation environment, which offers capabilities tosimulate computing Clouds that provision computinginfrastructures as a service. The focus ofCloudSimis on the modeling of so-calledCloudletswhich are

jobs, that can be scheduled on VMs (Virtual Ma-chines). Models for hard disks, SAN and files arealready included inCloudSim. However, capable in-terfaces to simulate object STaaS are missing, as wellas a fine-grained model for the size of files.

Object storage systems pool multiple physical de-vices together and provide one logical medium tostore and retrieve many different pieces of informa-tion called objects. Besides user data, an object con-tains so calledmetadata, like timestamps or num-ber of replicas. Objects can be accessed by theirserver-wide unique ID and can be created, updated,read (complete or partially) by all authorized clients(Azagury et al., 2003). Containers are virtual collec-tions of objects and may be hierarchical like folderson conventional file systems (CDMI, 2012).

Considering that there is no known simulation en-vironment for object STaaS, which includes a propermodeling of the interfaces, this work extends the ex-isting frameworkCloudSimby adding componentsfor an accurate modeling of object storage.

The rest of this paper is organized as follows: Inthe next section we present the existing simulationframework CloudSim and review the fundamentals ofobject storage, before we discuss prior works relatedto STaaS Cloud simulation. In Section 3 we propose

186

Page 2: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

the extension to enable STaaS Cloud simulation, fol-lowed by Section 4, in which we describe a set of ex-periments we investigated on the extended simulationframework. Section 5 concludes the paper with a briefsummary and describes our future research directions.

2 RELATED WORK

In this section we describe the existingCloudSimframework, explain the needed changes to simu-late STaaS Clouds and then present related works toSTaaS simulation.

2.1 CloudSim

CloudSim is a time discrete simulation frame-work for Cloud computing. The framework con-sists of three layers as described by the authors(Calheiros et al., 2011) (from bottom to top):

1. Core Simulation Engine:Queuing and processingof events, management of Cloud system entitiessuch as host, VMs, brokers, etc.

2. CloudSim: Representation of network topology,message delays, VM provisioning, CPU, storageand memory allocation, etc.

3. User code:General configuration such as Cloudscenarios and user requirements, User Broker, etc.

Users of the framework can either modify the toplayer to model simulation scenarios, or extend thesecond layer, to test different allocation policies in aCloud system. Theuser codelayer defines so-calledcloudletsthat define a specific amount of computa-tion requirements similar to a Cloud job. DatacenterBrokers forward the cloudlets to Cloud providers andmanage and monitor their execution. These jobs aredispatched on available VMs by theCloudSimlayer.Communication between the entities is done via mes-sage events which are sent to thecore simulation en-gine, which handles all events in the correct orderand manages the simulated time. Events between tworemote entities are automatically delayed, if the net-work topology is predefined (Garg and Buyya, 2011).

CloudSim can simulate SAN storage, hard drivesand files, that are stored on hard drives directly or viaSAN storage. But their modeling lacks support forobject storage as follows:

• File size magnitude:CloudSim models the filesize in megabyte, but 90% of all web objectsfit within 16KB (Dukkipati et al., 2010), whichmeans that a content delivery scenario scenariocan not be modeled accurately.

• Hard drive models:The hard drive models do notprovide all metrics that are required to model theread and write durations accurately.

• No storage controller:CloudSim does not offer acontroller that determines the storage location ofobjects.

• No appropriate object storage interface:Nomodel for any STaaS interface, as for exampleCDMI.

2.2 STaaS Simulation

The only work known to us that deals with storagemodeling in CloudSim was performed by Zhao andLong (Long and Zhao, 2012). Even after intensivesearch we have not found any other simulation en-vironment that suits STaaS. Zhao and Long proposean extension of the existing Datacenter model by in-troducing aReplica CatalogandBlock Catalog(Onepiece of information is striped and distributed on dif-ferent physical locations to gain a parallelized andtherefore faster access, catalogs can provides a queryinterface to retrieve those locations) to benchmark dif-ferent configurations. All operations are started fromwithin a Cloudlet, which means that the VM is the re-questing entity. Overall their work presents a mixtureof datacenters that offer computing power and storagecapacities. This work in contrast focuses on the strictseparation of these two capabilities, as currently donein real-world IaaS providers.

In a previous paper (Jrad et al., 2012), we inves-tigated the use of multiple IaaS Compute providersinterfaced by the Open Cloud Computing Interface1

(OCCI). One user assigns aMeta Brokerto choosebetween multiple available Clouds based onServiceLevel Agreements(SLA). We provide similar SLAmechanisms for STaaS providers like: Available ca-pacity, bandwidth of Cloud interfaces, restrictions onfile sizes, etc.

3 STORAGE CLOUDSIM

We extended the existing simulation frameworkCloudSimto enable the modeling of STaaS Clouds.Therefore we provide additional models in the usercode layer and the layer beneath as we can see in Fig-ure 1. CloudSimconsists of three layers. Our addi-tions to the framework build on top of theCloudSimCore Engine, which is responsible for the time dis-crete message passing between simulation entities.We provide new models for hard disks, servers and

1http://occi-wg.org

Storage�CloudSim�-�A�Simulation�Environment�for�Cloud�Object�Storage�Infrastructures

187

Page 3: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

Timeaware Resource Utilization

CloudSim Core Simulation Engine

Cloudlet

Execution

VM

ManagementReplica Object Storage

Storage

Policy Enforcement

VM

AccountingStorage

Accounting

CloudSim/StorageCloudSimUser

Interface

Structures

Services

ResourcesNetwork

TopologyHost SAN StorageStorage Server Harddisk

Simulation

SpecificationCloud Scenario

Data Center

BrokerMeta Storage

Broker

Storage

Broker

CloudletVirtual

Machine

CDMI

Request / Response

StorageCloudSimCloudSimUser Code

Usage-

Sequence

Sequence-

Generator

Figure 1: Storage CloudSim architecture Overview.

a more accurate model resource utilization over time.Beside data replication and accounting, we imple-mented different storage policies enforcements. Usersof the simulation framework can generate sequencesof CRUD (Create, Read, Update, Delete) operationson objects and containers and implement differentmulti-Cloud broker policies.

3.1 User Code Extension

The usage of STaaS components is modeled asusagesequence- a sequence of CRUD operations on bothcontainers and objects. The order of these operationsis fixed (an object can only be put into a container ifthe container was created before), but the executionof two sequences never interfere or depend on eachother. We provide tools for automatically generatingsuch sequences in a way that they are random, definedby statistical distribution constraints2 and are repeat-able by storing them in XML files.

Each sequence is attached to a SLA requirement,where hard constraints and soft requirements can bedefined. The hard constraints serve to exclude STaaSClouds that do not fulfill required features (e.g. capa-bility to store user metadata, available capacity, etc.),

2size of objects, delay between requests, balance be-tween read and write operations etc.

whereas the soft requirements lead to a scoring ofproviders whenever there is more than one providerthat fulfills all hard requirements in order to choosethe one with the highest score (e.g. prefer providerswith the lowest storage cost).

The sequences and their requirements are for-warded to theMeta Storage Broker, which query char-acteristics and capabilities of each available STaaSprovider and chooses the best suitable provider forevery sequence and dispatches each one to aStorageBroker. The former is acting as a proxy for a singleuser Cloud interaction by dispatching the requests ofthe sequence and storing all responses from the Cloudfor later analysis.

The entire communication between broker andCloud is wrapped in CDMI calls, which is anindustry-wide standard for accessing STaaS Clouds.

3.2 STaaS Provider Models

Every Cloud is described by some characteristics asfor example networking capabilities, supported op-erations, billing information and hardware specifica-tions or limits as for example maximum size of anentity or maximum number of objects in one con-tainer. Each instance features a CDMI endpoint, aset of servers and a sets of hard disks. On top of the

CLOSER�2014�-�4th�International�Conference�on�Cloud�Computing�and�Services�Science

188

Page 4: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

Table 1: Defined Usage Sequence Patterns.

Name Pattern A Pattern BTraffic / Sequence 1GB .. 100GB (normal distributed) 1MB .. 16GB (gamma distribution 3,2)

Size / Object 1MB..100GB (normal distributed) 1MB .. 1GB (normal distributed)Read Write Ratio 0:1 5:11

Hard Constraintsno container size limitsufficient boundary for object size limitenough storage capacity

enough storage capacity

Soft Constraints low upload and storage costs all costs low

hardware models is the representation of the logicalorganization such as each object might have multi-ple replica, calledblobs. Communication betweenbrokers and STaaS Clouds is accomplished throughCMDI messages, which organize data asCDMI Ob-jectswhithin Containers (both described as entity inthe following). Every object must be stored in onecontainer - containers can not be nested. Each userof the Cloud (represented by a broker) can only seehis entities. They can be accessed via their Cloud-wide uniqueCDMI ID or via a human readable namethat was chosen by the user. Entities feature so-calledMetadata, which is a key-value storage, which holdsfor example the size of an entity, date of creation,number of desired replica or date of last-access. De-pending on the capabilities of a Cloud, a user canmodify or extend these metadata (e.g. the number ofdesired replicas).

The storage controller matches certain policies,such as the physical distribution of blobs that belongto one object (replica). A blob can only be stored ona disk, when the disk holds no other blob of the sameobject, so that the failure of a single disk does notcause the loss of data if replication is used.

Since read, write and update operations involvethe transport of data through the internal network, wemodeled the payload size of these operations on themost fine-grained level (bytes). As we model the stor-

Server

sda1

sda2

sda3

Cloud internal

network

System Bus

Cloud internal

network interfaceDisk Interface

Cloud-internet

connection

Figure 2: IO limitations on the data path.

age of data to the level of disks, we have to modelalso the data transport between hard disks. There-fore, we assigned a bandwidth to network and diskconnections (see Figure 2) inside the STaaS CLoud.The complete data path begins with the broker, whichsends a CDMI request to the STaaS provider. Thedata is then sent to the storage server and then to thehard disk. Every segment of the data path has a cer-tain bandwidth3 to introduce a certain delay. In ad-dition we model the current utilization of these pathsections.

The duration of each operation depends on thebandwidth of the path section that has the lowestavailable bandwidth. Since concurrent operationsmight interleave or overlap, the available bandwidthmight change during one operation. Thus, the totalduration of each operation depends on the least avail-able bandwidth on the path over time and thus the fol-lowing terms must be fulfilled:

payloadSize=∫ tend

tstart

argmini

availableBandwidthi (t)dt.

(1)and ∧i∈pathSections,t

availableBandwidthi (t)< bandwidthLimiti

(2)Read and write operations are dispatched by first-

scheduled-first-served strategy.

4 SIMULATION EXPERIMENTS

4.1 Meta Broker Evaluation

We conducted two sets of experiments, one with onlyone StaaS Cloud provider (Amazon S34) and anotherset with three providers, each with a mixture of pat-terns ofusage sequence(see Table 1). Pattern A fea-tures only uploads of large data chunks. Therefore thecosts for downloads from the Cloud have no impact.The other pattern features up- and downloads.

3uni-directional bandwidth specification is possible.4http://aws.amazon.com/de/s3

Storage�CloudSim�-�A�Simulation�Environment�for�Cloud�Object�Storage�Infrastructures

189

Page 5: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

Table 2: Modeled Cloud Specifications.

S3 Private Swiftmax.Object size unlimited 16GB unlimitedwrite rateread ratecapacityread latencywrite latency

156MB/s156MB/s2TB8.5ms9.5ms

64MB/s156MB/s1TB9ms11ms

156MB/s156MB/s2TB8.5ms9.5ms

capacity 72TB 3TB 32TB# replica 3 3 1$/GB stored$/GB up$/GB down

0.05$0.0002$0.01$

0.04$0.0002$0.0002$

0.01$0.0002$0.1$

We modeled three different Cloud providersAma-zon S3, a private Cloud and aSwift Cloud5, see Table2), each with different pricing, storage capacity andthroughput. The private Cloud has a hard 16GB limitfor objects, but offers the lowest prices.

The meta broker checks all hard constraintsagainst every available Cloud and then calculates ascore for each potential Cloud. The score in ourexample is calculated by the inverse of the relevantcosts, e.g.

score=1

costsperstoredGB+

1costsperuploadedGB

for sequences of pattern A. One hard constraint thatis shared among both patterns is the requirement forsufficient storage capacity. Since users in most usecases do not know the exact volume of data they wantto store or the size of every single object, we added a25% uncertainty to these values. A sequence is de-clined, if no Cloud provider can fulfill all hard re-quirements.

We can verify the correct behavior of the metabroker by plotting the used storage of each Cloudprovider over time, as in Figure 3. Sequences thatdo not contain objects that are bigger than 16GB canbe dispatched on the private Cloud, because it is thecheapest STaaS provider. Sequences that contain ob-jects that are bigger than 16GB will be dispatchedon theSwift Cloudbecause it is cheaper thanAma-zon S3, which is never chosen for any sequence. Wecan see that theSwift Cloudusage rises faster sinceall big objects (greater than 16GB) are stored there.We conducted the above described experiments mul-tiple times and made sure that the randomly generatedinput sequences did not contain any artifacts. A mix-ture of both sequence types was used (2486 sequencesof type A and 2514 of type B). First, we observedthat both experiments (single and multi-Cloud) with50 and 500 sequences succeeded, but the experiment

5http://swift.openstack.org

with 5000 mixed sequences can not be executed onthe single-Cloud environment completely (4150 se-quences are declined due to capacity exhaustion). Ifwe observe the number of succeeded and failed re-quests as well as the declined sequences, we observethat at one point of this experiment, the number ofsucceeded requests stops to rise, the failed requestsspike and the number of declined sequences starts torise. These are declined because the hard requirementfor sufficient storage can not be fulfilled. In total 15requests fail in the single Cloud experiment with 5000mixed sequences. Three reasons (or any combinationof them) explain this behavior:

• Inaccurate SLA: UsageSequencerequests less ca-pacity than it will consume. Cloud storage maybe used by 99.99%, but it accepts the request, be-cause the requested size fits into the remainingstorage.

• Unsatisfiable policies: One reason could be thepolicy, that forces the storage controller to lo-cateStorageBlobs of one object not to be on thesame disk. Maybe there is enough physical stor-age available, but it is not distributed as required.

• Concurrent access: Clouds use the currentlyavailable storage, when responding to a Cloud dis-covery request. Requests do not allocate any stor-age space. Two storage capacity requests can bemet, but one of them fills all of the remaining stor-age, so the other request can not be fulfilled.

One of the most interesting points are the coststhat are billed when the sequences are run on the dif-ferent constellations. We see that multi-Cloud experi-ments generate less costs than the single Cloud exper-iment for the runs with 50 and 500 sequences (60%less costs in the 500 sequence experiment, see Table3). This can be explained by the capacity of the pri-vate Cloud, which provides resources for a fraction

0 20 40 60 80 1000

500

1,000

1,500

Time in minutes

Use

dst

ora

ge

cap

acity

inG

B SwiftprivateAWS

Figure 3: Storage usage for the modeled STaaS Clouds.

CLOSER�2014�-�4th�International�Conference�on�Cloud�Computing�and�Services�Science

190

Page 6: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

Table 3: Average Price per accepted Sequence.

# of input Seq. Single Cloud Multi-Cloud50 0.0150$ 0.0070$500 0.0149$ 0.0058$5000 0.0149$ 0.0128$

Table 4: Total Costs for 250 Input Sequences.

Single Cloud Multi-CloudPattern A 2.62$ 2.62$Pattern B 4.82$ 0.30$

of the costs of theAmazon S3Cloud. But the totalcosts for the 5000 sequence experiment shows that themulti-Cloud constellation can generate more costs,which is in the interest of the user, because in thiscase more sequences are processed, compared to thesingle Cloud experiment. However, the multi-Cloudsetup provides lower prices per accepted sequence ineach experiment, as seen in Table 3.

4.2 Effect of the Used Sequence Type

In order to study the impact of the chosen sequencetype, we conducted four experiments each of themwith a different sequence type and on a single Cloudor a multi-Cloud environment. As we submitted 250sequences, no sequence was rejected in any experi-ment. Table 4 shows that the multi-Cloud variant gen-erates less or the same costs as the single Cloud vari-ant. The costs for single and multi-Cloud for the se-quences of type A is the same, whereas the differencefor sequences of type B is 4.52$ (saving of factor 16).This can be explained by the fact, that the sequencesof type A have tighter restriction on the SLA: Theminimal accepted value for the maximum object sizedepends on the largest object of the sequence, whichcan be between 1 and 100GB. Thus, the majority of Asequences can not be scheduled on the private Cloud,which is deciding for the costs for B sequences

5 CONCLUSIONS

We extended the popular Cloud simulation frameworkCloudSimto enable the simulation of STaaS Clouds.The concurrent use of resources is modeled accuratelyin order to gain realistic simulation results. Detailedmodels for storage disks, servers and storage con-troller are provided and can be easily extended. Mon-itoring allows users to investigate a broad spectrum ofmetrics from each simulated STaaS Cloud as well asfrom the user’s perspective.

Different access patterns can be modeled and usedto generate sequences of requests. These sequences

can be stored and read from file to provide random,but repeatable experiments. In addition, differentSLAs can be modeled, such as certain capabilities ofa Cloud or some restrictions. AMeta Brokerenablessimulations of multiple Clouds, where differentus-age sequencescan be dispatched on the best-matchingClouds. Compared to single Cloud usage users cansave huge costs if the SLAs of the request sequencesare not very strict.

In a future work, we will investigate more realisticpricing models (no linear price functions) and morecomplex SLA mechanisms. The framework will beextended to enable the simulation of the storage allo-cation policies and mechanisms used in current STaaSClouds, for example mechanisms to prevent bit rot-ting of failover reactions to outages of hardware. Wewill also introduce more heuristic parameters to makethe simulation more realistic. Also brokers could notonly compare requirements against the static SLA butcould also calculate metrics based on previous band-width usage and then decide which Cloud to use forfurther requests.

REFERENCES

Azagury, A., Dreizin, V., Factor, M., Henis, E., Naor, D.,Rinetzky, N., Rodeh, O., Satran, J., Tavory, A., andYerushalmi, L. (2003). Towards an object store. InIn Proceedings of the 20th IEEE Conference on MassStorage Systems and Technologies, pages 165–176.

Calheiros, R. N., Ranjan, R., Beloglazov, A., De Rose,C. A., and Buyya, R. (2011). Cloudsim: a toolkit formodeling and simulation of cloud computing environ-ments and evaluation of resource provisioning algo-rithms.Software: Practice and Experience, 41(1):23–50.

Calheiros, R. N., Ranjan, R., De Rose, C. A., and Buyya, R.(2009). Cloudsim: A novel framework for modelingand simulation of cloud computing infrastructures andservices.arXiv preprint arXiv:0903.2525.

CDMI (2012). Cloud Data Management Interface (CDMI)Version 1.0.2.

Dukkipati, N., Refice, T., Cheng, Y., Chu, J., Herbert, T.,Agarwal, A., Jain, A., and Sutin, N. (2010). An argu-ment for increasing tcp’s initial congestion window.

Garg, S. K. and Buyya, R. (2011). Networkcloudsim: Mod-elling parallel applications in cloud simulations. InProceedings of the 2011 Fourth IEEE InternationalConference on Utility and Cloud Computing, UCC’11, pages 105–113, Washington, DC, USA. IEEEComputer Society.

Jrad, F., Tao, J., and Streit, A. (2012). Sla based servicebrokering in intercloud environments. InProceed-ings of the International Conference on Cloud Com-puting and Services Science CLOSER2012, pages 76–81, Porto, Portugal.

Storage�CloudSim�-�A�Simulation�Environment�for�Cloud�Object�Storage�Infrastructures

191

Page 7: Storage CloudSim A Simulation Environment for Cloud …downloads.tobiassturm.de/projects/storagecloudsim/paper.pdf · Storage CloudSim A Simulation Environment for Cloud Object Storage

Long, S. and Zhao, Y. (2012). A toolkit for modelingand simulating cloud data storage: An extension tocloudsim. InControl Engineering and Communica-tion Technology (ICCECT), 2012 International Con-ference on, pages 597–600.

CLOSER�2014�-�4th�International�Conference�on�Cloud�Computing�and�Services�Science

192