
J. Parallel Distrib. Comput. 72 (2012) 1373–1385


A cost-effective cloud computing framework for accelerating multimedia communication simulations

Daniele Angeli, Enrico Masala ∗

Control and Computer Engineering Department, Politecnico di Torino, corso Duca degli Abruzzi, 24, I-10129 Torino, Italy

Article info

Article history: Received 30 November 2011; Received in revised form 16 April 2012; Accepted 12 June 2012; Available online 21 June 2012

Keywords: Multimedia simulations; Cloud computing; Video communication; Amazon EC2; Cloud cost comparison

Abstract

Multimedia communication research and development often requires computationally intensive simulations in order to develop and investigate the performance of new optimization algorithms. Depending on the simulations, they may require even a few days to test an adequate set of conditions due to the complexity of the algorithms. The traditional approach to speed up this type of relatively small simulation, which requires several develop–simulate–reconfigure cycles, is to run the simulations in parallel on a few computers and leave them idle when developing the technique for the next simulation cycle. This work proposes a new cost-effective framework based on cloud computing for accelerating the development process, in which resources are obtained on demand and paid only for their actual usage. Issues are addressed both analytically and practically, running actual test cases, i.e., simulations of video communications on a packet lossy network, using a commercial cloud computing service. A software framework has also been developed to simplify the management of the virtual machines in the cloud. Results show that it is economically convenient to use the considered cloud computing service, especially in terms of reduced development time and costs, with respect to a solution using dedicated computers, when the development time is longer than one hour. If more development time is needed between simulations, the economic advantage progressively reduces as the computational complexity of the simulation increases.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

Nearly all works that propose new algorithms and techniques in the multimedia communication field include simulation results in order to test the performance of the proposed systems. Validation of new algorithms and ideas through simulation is indeed a fundamental part of this type of research due to the complexity of multimedia telecommunication systems.

However, the performance of the proposed systems has to be evaluated in many different network scenarios and for several values of all the key parameters (e.g., available bandwidth, channel noise). Moreover, to achieve statistically significant results, simulations are often repeated many times and then results are averaged. In addition, for research purposes, software and simulators are usually developed only as a prototype, i.e., not optimized for speed. For instance, the video test model software available to researchers is usually one or two orders of magnitude slower than commercial software which, however, might not be suitable for research, since it does not come with the source code needed to experiment with new techniques.

∗ Corresponding author. E-mail address: [email protected] (E. Masala).

0743-7315/$ – see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2012.06.005

As a consequence, the time spent in running such types of simulations and getting results might be significant, sometimes even a few days. Interpreting such results leads to performance improvements and bug fixes that need to be tested again with other simulations. Focusing on the development stage of these simulations and on relatively small size simulations implies that there is the potential for several development–simulation–reconfiguration cycles in a single day, and computational resources are idle between simulation runs. Therefore, the time needed to get simulation results plays a key role in trying to speed up research activities.

A commonly used approach to speed up simulations is to run them in parallel on several computers. However, this approach strongly depends on several variables, e.g., computer power and availability. Buying new, dedicated computers might not be affordable in the case of small research groups since the number of computers should be high and their usage ratio would be low due to the dead times between the various simulation runs needed to improve the algorithms and fix bugs. Accessing large computer resources could be difficult as well since currently it is not easy to acquire resources to spend on computation costs when the hardware is not owned. Indeed, in typical research projects which include funding for multimedia communication research, high-performance computing is not seen as one of the primary goals of the project, and costs are usually dominated by items such as staff and development of testbeds. Therefore, the cost and risk of acquiring a significant number of computers entirely rest on the research group.

An effective technique to speed up simulations could be to rent computing resources in the cloud and run the computations in parallel; however, there is a significant lack of work in the literature that quantifies the advantages or disadvantages of such a solution, especially in terms of economic costs in a practical case. This paper addresses this issue by providing quantitative results, including cost comparisons, that can help in taking the most effective decisions.

Note that the type of scientific tasks considered in this work does not fit well into the class of high performance computing (HPC) problems, since in that case requirements are different: the problem is well known, and algorithms to solve it are well tested. The requirement is generally limited to running those types of algorithms in the most cost-efficient way, which typically implies they are batch-scheduled. In the considered scenario, instead, researchers want to run the simulation code as soon as possible to speed up further improvements.

The types of simulations addressed in this work are better described by the many task computing (MTC) definition, which denotes high-performance computations comprising multiple distinct activities, coupled via file system operations [19]. The multimedia communication simulations considered here are fully parallelizable by nature, making them perfectly suitable for a cloud computing environment. The possibility to parallelize simulations stems from the fact that results are usually averaged over a number of different simulation runs that do not have dependencies among them. Moreover, using several values for the parameters given as input to the algorithms adds another dimension to the problem, which again increases the possibility to further parallelize the simulation.

The contribution of this paper is twofold. First, it provides a simple software framework that can be used to automate all the operations involved in setting up and managing the cloud environment for the specific simulation to be run as well as to efficiently and quickly collect the simulation results. Second, it analyzes the cost–performance tradeoff using several actual simulation examples taken from the video communication research area, i.e., H.264/AVC video communications on a packet lossy channel. An analytical approach is employed in order to investigate both the economic costs and the performance of the proposed approach. Moreover, actual prices of a major commercial cloud computing provider are used to quantify, in a practical way, the suitability, economic profitability and development speedup of employing the cloud computing approach for the relatively short scientific simulations, such as the ones faced by many researchers, in a realistic scenario where the typical work pattern of researchers is also considered, i.e., they do not work 24 h while computers do.

The paper is organized as follows. Section 2 analyzes the related work in the field. Then, Section 3 investigates the requirements of typical simulations in the multimedia communication field and their suitability for cloud computing. Section 4 describes the developed software framework to automate running simulations in the cloud. In Section 5, a brief performance analysis of the various instances in the Amazon cloud computing infrastructure is presented. Section 6 describes the case studies used in this work, followed by Section 7 which analytically investigates the cost–performance tradeoffs with practical examples in the case of both a single simulation and a whole research activity comprising several simulations. Conclusions are drawn in Section 8.

2. Related work

Some works have been presented in recent years on the profitability of using cloud computing services in order to improve the performance of running large scientific applications. Cloud-based services indeed claim that they can achieve significant cost savings over owned computational resources, due to the pay-per-use approach and reduced costs in maintenance and administration which are spread over a large user basis [3].

Until recently, most of the scientific tasks were run on clusters and grids, and many works explored how to optimize the performance of scientific applications in such specific contexts. A taxonomy of scientific workflow systems for grid computing is presented in, e.g., [27]. However, the cloud is not a completely new concept with respect to grids; it indeed has intricate connections to the grid computing paradigm and other technologies such as utility and cluster computing, as well as with distributed systems in general [8].

Several works investigated different aspects involved in running scientific workflows in the cloud, for instance focusing on optimal data placement inside the cloud [14], the overall experience and main issues faced when the cloud is used [23] and the suitability of cloud storage systems such as Amazon S3 for the scientific community [18].

Other works addressed the costs of using cloud computing to perform tasks traditionally addressed by means of an HPC approach with dedicated computational resources. Findings indicate that in this scenario profitability is somehow limited, at least with current commercially available cloud computing platforms [3]. Indeed, the performance of general purpose cloud computing systems, such as the virtual machines provided by Amazon [2], is generally up to an order of magnitude lower than that of conventional HPC clusters [24] and is comparable to low-performance clusters [7]. Nevertheless, due to the savings achieved by means of the large scale of these cloud computing systems, they seem to be a good solution for scientific computing workloads that require resources in an instant and temporary way [10], although capacity planning can be quite difficult since traditional capacity planning models do not work well [15]. Many works employ benchmarks aimed at predicting the performance of complex scientific applications. Often, these benchmarks focus on testing the efficiency of the communication between the various computing nodes, which is an important factor in some types of applications. For instance, [6] focuses on establishing theoretical performance bounds for the case of a large number of highly parallel tasks competing for CPU and network resources.

The types of simulations addressed in this work fit into the many task computing (MTC) definition [19]. MTC has been investigated, for instance, in [20], where a data diffusion approach is presented to enable data intensive MTC, in particular dealing with issues such as acquiring computing and storage resources dynamically, replicating data in response to demand, and scheduling computations close to data both under static and dynamic resource provisioning scenarios. Frameworks for task dispatch in such scenarios have also been proposed recently, such as Falkon [21], which simplify the rapid execution of many tasks on architectures such as computer clusters by means of a dispatcher and a multi-level scheduling system that separates resource acquisition from task dispatch. These frameworks are complemented by languages suitable for scalable parallel scripting of scientific computing tasks, such as Swift [25].

Some works have been presented to compare the performance achieved by means of the cloud with other approaches based on desktop workstations, local clusters, and HPC shared resources with reference to sample scientific workloads. For instance, in [22] a comparison is performed among all these approaches, mainly focusing on getting reliable estimates of the performance of the various architectures depending on the workflows. However, no economic cost comparisons between the different platforms are shown. Another work [12] considers a practical scientific task traditionally run on a local cluster. The authors study a cloud alternative based on the Amazon infrastructure, first developing a method to create a virtual cluster using EC2 instances to make portability easier, then investigating how the different data storage methods provided by Amazon impact on the performance. While costs of the Amazon cloud are considered in detail for the proposed architectures, no cost comparisons with the previous cluster-based architecture are presented.

This work helps in quantifying the economic advantage and potential drawbacks in replacing computers dedicated to simulation in a small research lab with a cloud computing solution. To the best of our knowledge, no works have addressed this issue so far with reference to the relatively small size simulations presented in this paper, apart from our short preliminary study presented in [16]. Even though this might seem quite a peculiar simulation scenario, many researchers, at least in the multimedia communication field, share the need to perform simulations of the size discussed here. Note also that the computational requirements of these simulations are constantly increasing due to the tendency to move towards high quality, high resolution images and video, urging researchers to find cost effective ways to deal with these types of simulations.

3. Analysis of simulation requirements

Typical simulations in the multimedia communication field involve running the same set of algorithms many times with different random seeds at each iteration. The objective is to evaluate the performance of the system under test by averaging the results achieved in various conditions, e.g., different realizations of a packet lossy channel, so that confidence intervals are minimized. Clearly, such a setup allows many simulations to run in parallel, since no interaction among them is required except when merging the results at the end of the simulation. Despite the conceptual simplicity, the actual computational load can be high since multimedia codecs might be prototypes only, not optimized for speed, and channel models or other robustness techniques might be complex to simulate. Moreover, consider that to achieve statistically significant results many different test signals, e.g., video sequences, should be used in the experiments so that techniques are validated across a range of different input conditions.

Other similarly heavy tasks might include extensive precomputations in order to optimize the performance of algorithms that are supposed to run in real time once deployed in an actual system. As an example, consider a system optimizing the transmission policy of packets in a streaming scenario. Some precomputed values regarding the characteristics of the content, such as the distortion that would be caused by the loss of some parts of the data, can be useful to the optimization algorithms, but values need to be computed in advance (see, e.g., [17,5]).

Therefore, due to the typical peculiarities of multimedia communication simulations, very little effort is needed to parallelize them. Often, no interaction is needed until the collection of results (or not at all when considering different input signals). Even in more complex cases, such as precomputation, usually multimedia signals can be easily split into different independent segments, for instance groups of pictures (GOP) in video sequences, and processed almost independently.

Therefore, the simulation types described in this section can take full advantage of the availability of multiple computing units, as in a cloud environment. Parallelism can be exploited both at the CPU level, using more CPUs, and within the CPU by taking advantage of multiple cores. As with modern computers, cloud environments offer multicore CPUs in the highest performance tiers, which indeed require parallelism for a cost effective exploitation of the resources.
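As an illustration of this embarrassingly parallel structure, the following minimal Java sketch (not part of the framework described in the next section) runs independent channel realizations on a thread pool sized to the available cores and merges the per-realization quality values only at the end; runOneRealization() is a hypothetical placeholder for the encoder/channel/decoder/metric chain. The same structure extends to the other dimensions of the parameter space (test sequences, channel and algorithm parameters), since each combination is an independent task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRealizations {
    // Hypothetical placeholder for one complete simulation run
    // (encode, simulate channel, decode, compute quality metric).
    static double runOneRealization(int seed) {
        return 30.0 + (seed % 5) * 0.1; // dummy PSNR-like value
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Double>> futures = new ArrayList<>();
        for (int seed = 0; seed < 50; seed++) {          // e.g., 50 channel realizations
            final int s = seed;
            futures.add(pool.submit(() -> runOneRealization(s)));
        }
        double sum = 0;
        for (Future<Double> f : futures) {
            sum += f.get();                              // results are merged only here
        }
        pool.shutdown();
        System.out.printf("Average over %d realizations: %.2f%n",
                futures.size(), sum / futures.size());
    }
}
```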

4. The cloud simulation software framework

In this work we focus on the Amazon AWS offer as of October 2011, which provides the ‘‘Elastic Compute Cloud’’ service, in brief ‘‘Amazon EC2’’, that includes a number of instance types with different characteristics in terms of CPU power, RAM size and I/O performance. The Amazon AWS platform allows one to control the deployment of resources in different ways, for instance by using a web interface or by means of an API, available for different languages. In all cases (web or API), the deployment of virtual systems in a remote environment and their monitoring requires several operations, although conceptually simple. While for simple operations and management of virtual servers this task can be easily accomplished by a human operator through, e.g., the web interface, a more time efficient system is needed to manage at the same time the activation, configuration and deactivation of a number of instances in order to automatically create the requested virtual environment needed by the simulations. A simulation could, in fact, require one to create, for instance, ten virtual machines, each one fed with different input parameters so that it operates on the correct set of data, then to check that every one of them is correctly running, and finally to collect the resulting data in a single central point.

Since the simulations considered in this work are quite short when carried out in the cloud computing environment, the time spent in setting up the appropriate simulation environment (e.g., activating instances, feeding them with the correct startup files, etc.) must be minimized, otherwise the advantage of cloud computing in terms of speed is reduced. To minimize the setup time, an automatic system is needed.

A number of frameworks have been proposed in the literature to address the issue of dispatching tasks to a computer system (e.g., a cluster or a grid), where they are usually received by batch schedulers. However, their dispatching time can be high [21] because they usually support rich functionalities such as multiple queues, flexible dispatch policies and accounting. Lightweight approaches have also been proposed, for instance the Falkon framework [21], which reduces the task dispatch time by eliminating the support for some of these features.

However, in a relatively small size simulation with almost no dependencies between the tasks, the use of such frameworks, aimed at large scale scientific simulations, provides many more features than the ones effectively needed. For these reasons, we designed and implemented our own software framework, named the Cloud Simulation System (CSS), with the aim of creating a very lightweight support for the execution of our simulations. The main design criteria were to automate the execution in the cloud of simulations of the type considered in this work, and to automatically take care of all the aspects of cloud configuration, e.g., starting and terminating instances, uploading the initial data for each instance and downloading the results. Note that these aspects must be adapted to the specific cloud technology used regardless of which framework is employed; therefore, even if more complex frameworks had been used, the time reduction in setting up the framework would have been limited, also considering the time needed to learn and adapt the features of a new framework for our aims.

Fig. 1 shows the general architecture of the CSS. The software has been developed in the Java language in order to be portable to different platforms. First, an offline step is needed, that is, the preparation of a virtual machine image, named AMI in the Amazon terminology, containing the tools needed for the simulation and a few parameterized commands, usually scripts, that can both run a set of simulations (controlled by an input file) and save the simulation results to a storage system, for instance the S3 provided by Amazon.


Fig. 1. General architecture of the proposed cloud simulation software framework.

One of the key components of the architecture is the controller computer, which is initially fed with a simulation description, in XML format, of the activities to carry out, including the specific set of input parameters for each single EC2 instance. The controller automatically performs a number of activities needed to ensure the successful execution of the set of simulations specified in the XML file. The main activities include:

1. activating new instances;
2. running the simulation software;
3. periodically monitoring the status of the instances to get early warnings in case some software included in the simulation fails or crashes;
4. checking for the end of the simulation;
5. downloading the results from the remote storage systems.

In more detail, the system creates instances using the API provided by the Amazon Java SDK. The software is packaged in a runnable JAR archive and the main options and operations that will be performed are specified in the XML configuration file. Several parameters can be specified, such as the number and type of Amazon instances to use, in which region they will be launched and which commands they will execute at startup. It is also possible to specify a user defined tag to run more than one simulation set at the same time in the cloud. A sample file is shown in Fig. 2. A separate file contains the access credentials to the Amazon AWS platform.
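As a sketch of the instance-activation step, the snippet below uses the AWS SDK for Java (v1, the API generation available in 2011) to start a batch of identical instances from a prepared AMI and pass a startup command through the user-data field. The AMI id, region endpoint, instance count and script path are illustrative placeholders, not the actual values used by the CSS, which reads them from its XML simulation description.

```java
import javax.xml.bind.DatatypeConverter;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class LaunchInstances {
    public static void main(String[] args) {
        AmazonEC2Client ec2 = new AmazonEC2Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        ec2.setEndpoint("ec2.eu-west-1.amazonaws.com");     // EU (Ireland) region

        // Startup command executed by the instance at boot (user data must be base64).
        String startup = "#!/bin/bash\n/opt/sim/run_simulation.sh /opt/sim/params_01.txt\n";

        RunInstancesRequest req = new RunInstancesRequest()
                .withImageId("ami-xxxxxxxx")                 // prepared simulation AMI (placeholder)
                .withInstanceType("c1.medium")               // EC2 API name of the high-CPU medium type
                .withMinCount(10).withMaxCount(10)           // K = 10 instances
                .withUserData(DatatypeConverter.printBase64Binary(startup.getBytes()));

        RunInstancesResult res = ec2.runInstances(req);
        for (Instance inst : res.getReservation().getInstances()) {
            System.out.println("Started instance " + inst.getInstanceId());
        }
    }
}
```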

The monitoring of the simulation is performed through a set of scripts that periodically connect to all the instances involved in the simulation and check their memory and CPU utilization. If any of these metrics shows anomalous values, an automatic email alert is sent to a predefined address, including the details of the instances that are having issues.
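The following simplified stand-in illustrates such a polling loop; instead of logging into each instance to read CPU and memory figures as the actual scripts do, it only checks through the EC2 API that every instance carrying the simulation tag is still in the running state, and prints a warning otherwise (the e-mail notification is omitted). The tag name and value are hypothetical.

```java
import java.util.Arrays;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.Filter;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.Reservation;

public class MonitorInstances {
    public static void main(String[] args) throws InterruptedException {
        AmazonEC2Client ec2 = new AmazonEC2Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        ec2.setEndpoint("ec2.eu-west-1.amazonaws.com");
        // Select only the instances belonging to the current simulation set (user-defined tag).
        DescribeInstancesRequest req = new DescribeInstancesRequest()
                .withFilters(new Filter("tag:simulation", Arrays.asList("sim1-run42")));
        while (true) {
            for (Reservation r : ec2.describeInstances(req).getReservations()) {
                for (Instance i : r.getInstances()) {
                    if (!"running".equals(i.getState().getName())) {
                        System.err.println("WARNING: instance " + i.getInstanceId()
                                + " is in state " + i.getState().getName());
                    }
                }
            }
            Thread.sleep(5 * 60 * 1000L); // poll every five minutes
        }
    }
}
```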

When all simulations end, files are downloaded from the Amazon S3 storage system, used by all the instances to save their results. The application developed in this framework automatically detects the end of the simulation and then downloads the resulting data to the local system.
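A corresponding collection step might look as follows; the bucket name and key prefix are again illustrative, and for simplicity the sketch ignores paginated listings with more than 1000 result objects.

```java
import java.io.File;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class CollectResults {
    public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        String bucket = "my-simulation-results";   // placeholder bucket
        String prefix = "sim1-run42/";              // per-simulation prefix written by the instances
        File outDir = new File("results");
        outDir.mkdirs();
        for (S3ObjectSummary obj : s3.listObjects(bucket, prefix).getObjectSummaries()) {
            File dest = new File(outDir, obj.getKey().replace('/', '_'));
            s3.getObject(new GetObjectRequest(bucket, obj.getKey()), dest);
            System.out.println("Downloaded " + obj.getKey() + " (" + obj.getSize() + " bytes)");
        }
    }
}
```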

The described framework can efficiently run simulations in the Amazon AWS platform. In order to optimize the cost–performance tradeoff, suitable options must be chosen in the configuration file, for instance the most efficient type of EC2 instance for the given simulation. The next sections will investigate how to configure the developed framework in order to maximize the performance and minimize the cost of running the simulation in the cloud.

Fig. 2. Sample simulation description file for the developed cloud simulation software framework.

5. Performance analysis of EC2 instances

The computational power unit used by Amazon is the EC2 compute unit (ECU), defined as the equivalent of the CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor. Table 1 summarizes the characteristics of the Amazon EC2 offer in the EU region as of October 2011 [1]. Note that there is a particular type of instance, named micro, whose characteristics cannot be defined as easily as those of the other ones. More details about the micro instance are presented later in this work. The key quantities peculiar to each instance are represented using the following symbols: ϕ_i is the cost/h/ECU for instance type i, ν_i is the number of cores and E_i the number of nominal ECU per core. The meaning of all the symbols used throughout the paper is reported in Table 2.

5.1. Raw computing performance

First, the CPU performance of the different instances has been assessed by using a simple CPU-intensive program, described next.


Table 1
Characteristics of the available EC2 instances and costs in the EU region (October 2011).

Name (symbol)  | ECU/core (E_i) | # Cores (ν_i) | RAM (GB) | I/O perf. | Cost/h ($) | Cost/h/ECU ($) (ϕ_i)
std.small      | 1              | 1             | 1.7      | Moderate  | 0.095      | 0.095
std.large      | 2              | 2             | 7.5      | High      | 0.38       | 0.095
std.xlarge     | 2              | 4             | 15       | High      | 0.76       | 0.095
hi-cpu.medium  | 2.5            | 2             | 1.7      | Moderate  | 0.19       | 0.038
hi-cpu.xlarge  | 2.5            | 8             | 7        | High      | 0.76       | 0.038
hi-mem.xlarge  | 3.25           | 2             | 17.1     | Moderate  | 0.57       | 0.088
hi-mem.dxlarge | 3.25           | 4             | 34.2     | High      | 1.14       | 0.088
hi-mem.qxlarge | 3.25           | 8             | 68.4     | High      | 2.28       | 0.088
micro          | Up to 2        | 1             | 0.613    | Low       | 0.025      | –

Table 2
Symbols used throughout the paper.

i        Instance of type i
E_i      ECU/core for instance type i
ν_i      Number of cores of instance type i
ϕ_i      Cost ($)/h/ECU of instance type i
P_i^(N)  Effective computing power of instance type i using N cores
p_i^(N)  Effective computing power parameter, i.e., P_i^(N) normalized by E_i
C_i^(N)  Cost ($) of one hour of instance type i considering its p_i^(N)
n        Number of processes running in parallel
S        Computing energy (ECU·h) needed to run a simulation
K        Number of instances used to run the simulation in the cloud
T_S      Time (h) required to run a simulation, requiring energy S, in the cloud
C_S      Cost ($) of running a simulation, requiring energy S, in the cloud
η        Instance usage efficiency in the cloud
T_D      Time (h) spent to study and modify the algorithm in each cycle
T_C      Total time (h) of one cycle
n_PC     Number of PCs needed to achieve the same time performance of the cloud
L_PC     Lifetime (h) of a PC
C_PC     Cost ($) of a PC including running costs for L_PC
N_cy     Number of cycles used for the development of a given technique
C_cloud  Total cost ($) of the cloud solution
C_nPC    Total cost ($) of the solution based on n PCs
f        Time increase factor due to operators working during daytime only
C_ratio  Ratio of the cost of the cloud solution to the cost of the nPC-based solution

Table 3
Experimental measurements of CPU computing performance of EC2 instances, using only one core. Time refers to the MD5 task.

Name          | Nominal ECU/core | Time (s) | Effective comp. power P (std.small = 1.00)
std.small     | 1                | 100      | 1.00
std.large     | 2                | 36       | 2.78
std.xlarge    | 2                | 33       | 3.03
hi-cpu.medium | 2.5              | 42       | 2.38
hi-cpu.xlarge | 2.5              | 32       | 3.13
micro         | Up to 2          | 152      | 0.66

The CPU-intensive task consists of computing the MD5 hash of a randomly-generated 100 MB file. The experiment is repeated 100 times so that the file is cached into the RAM and the performance of the storage system does not affect the measurements. Results are reported in Table 3.
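For reference, a benchmark of this kind can be reproduced with a few lines of Java; the sketch below hashes a 100 MB random buffer held in memory (which approximates the cached-file case of the measurements above) and prints the elapsed time for 100 passes.

```java
import java.security.MessageDigest;
import java.util.Random;

public class Md5Benchmark {
    public static void main(String[] args) throws Exception {
        byte[] data = new byte[100 * 1024 * 1024];        // 100 MB of random data (kept in RAM)
        new Random(42).nextBytes(data);
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            md5.reset();
            md5.digest(data);                              // one full MD5 pass over the buffer
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("100 MD5 passes: %.1f s%n", seconds);
    }
}
```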

The effective computing power P_i^(1) of instance i is computed as

$$P^{(1)}_i = \frac{t^{(1)}_{\mathrm{std.small}} \cdot P^{(1)}_{\mathrm{std.small}}}{t^{(1)}_i} \qquad (1)$$

where t_i^(1) is the time, as seen by the user, needed by the instance to compute the MD5 value 100 times using a single process. As a reference, the P^(1) value for the std.small instance has been set equal to 1.00, so that values can be directly compared with the nominal speed in ECU as declared by Amazon AWS. The superscript (1) indicates that the performance is achieved using only one core.
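For example, plugging the std.large measurements of Table 3 into Eq. (1) gives

$$P^{(1)}_{\mathrm{std.large}} = \frac{100\ \mathrm{s} \cdot 1.00}{36\ \mathrm{s}} \approx 2.78,$$

noticeably above the nominal 2 ECU per core declared for that instance type.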

Note also that the micro instance differs from the others since it is not suitable for a continuous computing load. It provides a good alternative for instances that are idle most of the time but must sometimes deal with short bursts of load, such as low-traffic web servers. Moreover, there are no guarantees that a minimum amount of processing power will be available at any time even when the instance is running, making it a sort of ‘‘best effort’’ offer. For the micro instance, Table 3 shows the ECU/core declared by Amazon, while time and effective computational power are averaged over several cycles of burst and slow-down periods. For completeness, note that when the micro instance performed at maximum computational speed it reached P^(1) = 3.38 in our experiments, while it provided only P^(1) = 0.09 during the slow phase. Due to this behavior, this type of instance will not be considered further in this work.

When multiple cores are available on a given instance, processes can be run in parallel. Fig. 3 shows the effective computing power of the various instances while performing the same CPU-intensive task (MD5 hash) using a different number of processes in parallel. The effective computing power parameter p^(N) is given by Eq. (2), where the time interval t_i^(N) refers to the time elapsed between the start of the first process and the end of the last running process:

$$p^{(N)}_i = \frac{t^{(1)}_{\mathrm{std.small}} \cdot P^{(1)}_{\mathrm{std.small}}}{t^{(N)}_i \cdot E_i}. \qquad (2)$$


Table 4
Experimental measurements of I/O performance of the various instances, normalized values where hi-cpu.medium = 1.00 (last row shows absolute values in kB/s, using record size = 32 kB).

Instance type        | I/O classif. | Read  | Write | Random read | Random write
std.small            | M            | 1.41  | 2.31  | 1.11        | 1.19
std.large            | H            | 1.52  | 2.32  | 1.64        | 1.58
std.xlarge           | H            | 1.46  | 2.10  | 1.91        | 1.52
hi-cpu.medium        | M            | 1.00  | 1.00  | 1.00        | 1.00
hi-cpu.xlarge        | H            | 1.47  | 1.29  | 1.50        | 0.92
hi-cpu.medium (kB/s) | M            | 71317 | 14454 | 4545        | 8511

Fig. 3. Effective computing power of parallel CPU-intensive tasks on different instances (normalized by the nominal ECU/core).

Fig. 4. Cost per p units, for each process, depending on the instance type.

Note that, differently from Eq. (1), the p value is normalized by the nominal ECU/core value E_i, so that values can be easily compared among them. As expected, performance tends to slightly decrease when the number of processes increases. This result confirms that each instance can be loaded with CPU-intensive parallel processes up to the number of cores without incurring an unreasonable performance reduction.

Fig. 4 shows the cost, per process, for one hour of each instance type for each effective computing power unit p as previously defined. The cost is defined as

$$C^{(N)}_i = \frac{p^{(N)}_i}{\nu_i E_i \varphi_i \cdot n} \qquad (3)$$

where ν_i E_i ϕ_i is the cost of one hour of instance i and n is the number of processes running in parallel. Since the cost of the instance is constant regardless of the number of processes running on it, clearly the cost per process decreases when the number of processes run in parallel on the instance increases, up to the number of available cores. Note that each single point in Fig. 4 considers the effective computing power units p that can be achieved with that specific number of processes. Considering the MD5 hash task, the best performance–cost ratio is provided by the hi-cpu.xlarge instance type, followed by the std.xlarge, std.large and hi-cpu.medium.
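As a sanity check of the terms appearing in Eq. (3), the hourly price of an instance can be recovered from Table 1 as ν_i E_i ϕ_i; for instance, for the hi-cpu.xlarge type

$$\nu_i E_i \varphi_i = 8 \cdot 2.5 \cdot 0.038\ \$/\mathrm{h} = 0.76\ \$/\mathrm{h},$$

which matches the cost/h column of Table 1.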

5.2. I/O performance

In addition to CPU-intensive tasks, the I/O performance of the various instances has also been measured. This is important since simulations often require I/O activity, especially when dealing with uncompressed video sequences, as is generally the case with video quality simulations. Table 4 reports the performance of the I/O subsystem, as measured by the iozone tool [11], for all the instances considered in this work. The tool has been run with record size equal to 32 kB and file size larger than the maximum amount of available memory to reduce as much as possible the influence of the operating system disk cache. For convenience, the column named ‘‘I/O classification’’ reports the Amazon classification of the I/O performance, where ‘‘M’’ means Medium and ‘‘H’’ means High. The performance is mostly aligned with the classification, with higher values for the std instance types, especially for write operations, compared to the hi-cpu instances. However, note that, as stated by Amazon [1], due to the shared nature of the I/O subsystem across multiple instances, performance is highly variable depending on the time the instance is run.

6. Case study: sample simulations

6.1. Simulation characteristics

In order to assess the performance in realistic cases, we describe the typical requirements of some multimedia communication simulations typically used in research activities. The most important part of the dataset in multimedia experiments is the video sequences, typically stored in uncompressed format. Generally, sequence length ranges from 6 to 10 s at 30 frames per second, i.e., 180–300 frames. The corresponding size in bytes ranges from about 26 MB up to 890 MB depending on video resolution, from CIF (352 × 288) to FullHD (1920 × 1080). Typically, four or five video sequences are enough to represent a range of video contents suitable to draw reliable conclusions about the presented techniques. Since results are computed as the average performance over different channel realizations, to achieve statistically significant results from 30 to 50 different random channel realizations are needed. Moreover, often a couple of parameters can be varied, e.g., one in the channel model and one in the algorithms to be tested, thus three (the minimum to plot a curve) to five values have to be tested for each parameter. Once each transmission simulation has been performed, the video decoder is run, e.g., the H.264 standard test model software [13] as done in this work, thus obtaining a distorted video sequence whose size in bytes is the same as the original uncompressed video sequence size. Finally, performance can be measured using various video quality metrics, ranging from the simple mean squared error (MSE), which can be immediately mapped into PSNR values [26], the most commonly used measure in the literature, to much more complex algorithms that try to account for the characteristics of the human visual system, e.g., SSIM and PVQM [4,9]. Note that once the previous performance metrics have been computed (typically, a single floating point number for each frame of the decoded sequence), the decoded video sequence can be discarded; therefore the maximum temporary storage occupancy is limited to the size of one video sequence. Table 5 provides actual values for three representative simulations, respectively a low, moderate and high complexity simulation, that will be used as examples in the remaining part of the paper. Note that once the number of combinations of parameters has been decided, it is also possible to compute an estimate of the size of the results produced by the simulation, as shown in the last row of the table.

Table 5
Parameters of some representative communication simulations.

Parameter                                                       | Typical values        | Sim1      | Sim2      | Sim3
Resolution                                                      | 352 × 288–1920 × 1080 | 352 × 288 | 704 × 576 | 1280 × 720
Pixels per frame                                                | 100,000–2,000,000     | 101,376   | 405,504   | 921,600
Sequence length (frames)                                        | 180–300               | 300       | 300       | 300
Uncompressed video sequence (MB)                                | 26–890                | 43.5      | 174       | 395.5
Input files, scripts and executables (MB)                       | 0.5–1                 | 0.61      | 0.60      | 0.64
Channel realizations                                            | 30–50                 | 50        | 40        | 30
# of values for channel parameter (e.g., SNR)                   | 3–5                   | 5         | 4         | 4
# of values for algorithm parameter (e.g., maximum packet size) | 3–5                   | 5         | 4         | 4
# of sequences                                                  | 4–5                   | 4         | 4         | 4
Algorithm for quality measure                                   | PSNR, SSIM, PVQM      | PSNR      | SSIM      | PVQM
Total size of results (compressed) (MB)                         | 200–600               | 595       | 305       | 228

6.2. Simulation complexity

The time needed to run Sim1 sequentially on, e.g., an Intel i5 M560 processor at 2.67 GHz with 4 GB RAM is about 29,500 s, i.e., more than 8 h. It is clear that such a long time might significantly slow down the development of transmission optimization techniques, since every time modifications of the algorithm are made, for any reason, simulations should be run again. Therefore, speeding up simulations is definitely interesting.

Using all the computing power available on the i5 computer requires 17,250 s to run Sim1. By means of a CPU-intensive task such as the MD5 hash described in Section 5.1, it can be seen that the computer performance is equal to about 2.75 ECU per core, i.e., 5.5 ECU in total (assuming the performance achieved by the std.small instance as the reference, equal to 1 ECU). Using the same symbols introduced at the beginning of Section 5, ν_PC = 2 and E_PC = 2.75.

Thus, we conclude that Sim1 would require about 26.38 h on a 1 ECU processor, i.e., the computing energy S needed to run it is 26.38 ECU·h. Considering the nominal computational power stated by Amazon for the various instance types, the total cost would be 2.51 $ when using the std family of instances or 1.00 $ using the cheaper hi-cpu family of instances. The value is obtained by multiplying the computing energy S by ϕ_i, i.e., the cost/h/ECU shown in Table 1, which is the same for all instances belonging to the same family. Note that the cost obtained in this way is independent of the number of instances used to perform the simulation, since the code can be significantly parallelized as discussed in Section 3.
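These figures follow directly from the measured running time and the estimated ECU rating of the PC:

$$S_{\mathrm{Sim1}} = \frac{17\,250\ \mathrm{s}}{3600\ \mathrm{s/h}} \cdot 5.5\ \mathrm{ECU} \approx 26.4\ \mathrm{ECU\cdot h}, \qquad S \cdot \varphi_{\mathrm{hi\text{-}cpu}} = 26.38 \cdot 0.038\ \$ \approx 1.00\ \$, $$

matching, within rounding, the 26.38 ECU·h of Table 6; analogously, 26.38 · 0.095 ≈ 2.51 $ for the std family.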

Varying the size of the video frame or the algorithm used to measure the video quality performance of the communication changes the computing energy needed for the simulation. Fig. 5 shows that the amount of computation increases linearly with the frame size in pixels. Moreover, the computational cost of running more complex quality evaluation algorithms is approximately constant (in percentage) when compared with the PSNR algorithm. The SSIM algorithm incurs about a 50% increase with respect to PSNR, while PVQM requires about 180% additional computation time.

Fig. 5. Relative time performance for various frame sizes and quality evaluation algorithms. Time is assumed to be equal to one for CIF frame size (about 100 kpixel), evaluated with PSNR.

Table 6 reports the complexity in terms of ECU·h of each sample simulation described in Table 5, as well as the time needed to run them on the i5 PC using all CPU cores and the cost of running them in the cloud using the cheapest family of instance types. For a high-complexity simulation such as Sim3, the time required by the PC is more than two days.

6.3. Simulation setup, I/O and memory requirements

As highlighted in Section 4, during the preparation phase of the AMI, sequences could be preloaded in the AMI itself since it is likely that simulations have to be repeated many times during the development of multimedia optimization techniques, as described in Section 7.2. Storing data in the Amazon cloud (e.g., in the AMI) has a very limited cost, 0.15 $ to store 1 GB for one month (fractions of size and time are charged on an hourly pro rata basis), which would be enough to hold nearly 6 sequences of the type employed in Sim2. Data transfer costs to the cloud are zero, while transferring from the cloud costs 0.15 $ per GB. Transfer speed is not an issue since researchers can typically use high speed university network connections. For instance, from the authors' university in Italy the typical transfer speed to and from an active instance in the AWS region in EU (Ireland) is about 11 MB/s and 6 MB/s respectively. In our tests, the most critical part has been transferring from an active instance to the S3 storage within the Amazon infrastructure, at the end of simulations when files have to be stored in S3. This could be done at an average of 3.3 MB/s from each instance. Downloading large files from S3 to a local PC in the university can be done at about 6.3 MB/s.
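As a rough consistency check, uploading the largest sequence of Table 5 at the measured inbound speed takes about

$$\frac{395.5\ \mathrm{MB}}{11\ \mathrm{MB/s}} \approx 36\ \mathrm{s},$$

in line with the 39.6 s reported for Sim3 in Table 7.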

Table 7 shows the transfer times needed to load uncompressed sequences, transfer the scripts, executables and control files needed to run the simulation, and collect the results, including the time to store data in S3 (for each instance).


Table 6
Computing energy (in brackets the additional energy for compression of final results), execution time and cloud costs on the cheapest family of instances depending on the simulation example.

Sim. ID | Computing energy (ECU·h) | Time on i5 PC (h) | Cloud instance cost ($) | Download of results ($)
Sim1    | 26.38 (0.028)            | 4.80              | 1.00                    | 0.06
Sim2    | 101.27 (0.014)           | 18.41             | 3.85                    | 0.03
Sim3    | 316.94 (0.011)           | 57.63             | 12.04                   | 0.02

Table 7
I/O performance depending on the simulation example.

Sim. ID | Upload of sequences (only first time) (s) | Upload script, exe and setup data (s) | Store in S3 (s) | Download of results (s)
Sim1    | 4.4                                       | <1                                    | 30.0            | 94.4
Sim2    | 17.4                                      | <1                                    | 4.4             | 48.3
Sim3    | 39.6                                      | <1                                    | 1.1             | 36.3

Table 8
Improvement of execution time using a RAM disk instead of the default Amazon storage (EBS) for the instance. Hi-cpu.medium instance type.

Sim. ID | Time reduction (%)
Sim1    | 4.0
Sim2    | 0.9
Sim3    | 0.1

Note that in the provided examples, for simplicity, results are stored in text files that are then compressed and downloaded. The data size is determined by the number of combinations of parameters, which is the highest in Sim1, thus implying more data to download. However, the time is quite limited (about one minute and a half in the worst case) compared with the duration of the simulation, and it could be further reduced by storing data in a more optimized way, e.g., using a binary format to represent numbers instead of text. The computing energy needed to compress result files has already been accounted for in Table 6. In our experiments instance activation times have always been less than about 45 s, with typical values around 30 s; therefore they are comparable to or sometimes less than the time needed to download the results, and much less than the time needed to run simulations at the most cost-effective tradeoff point, i.e., about one hour, as will be determined in Section 7.

The types of computations involved in multimedia communications are typically CPU intensive. However, it is often necessary to move large amounts of data within the computer system (e.g., reading and writing large files, that is, the video sequences). The operating system disk cache almost always helps in this regard by using a large amount of memory for this purpose, thus effectively keeping most of the data in memory. An alternative approach to ensure that RAM is used to access these files could be to create, for instance, a RAM disk and keep the most critical files there, such as the video sequence currently tested. We experimented with this solution, moving all files to a RAM disk whose size is about 80% of the available memory, and measuring the values shown in Table 8. Some modest improvements can be achieved in the case of Sim1 (4%) while they are negligible for Sim2 and Sim3. An additional experiment, not shown in the table, which uses the PSNR quality measure and the video frame resolution employed in Sim3, shows a 9.2% time improvement. We attribute this behavior to the fact that when the PSNR quality measure is used, it is much more important to have fast access to the video sequence files (both the original and the decoded one for the current simulation experiment) since the PSNR only performs very simple computations. For more CPU-intensive algorithms such as SSIM and PVQM there is almost no improvement from faster access to the video files. However, even a 9.2% time improvement is not sufficient to justify the use of instances with much more memory, which cost more than twice as much per unit of computational power.

7. Analytical analysis

7.1. Single simulation

In order to mathematically characterize the time and cost needed to perform a given simulation, the following notation is introduced. The amount of workload associated with the given simulation is denoted by S, as already mentioned. Let K be the number of instances used to perform the simulation, and i be the type of the instances, assuming that only one type is used to run the whole simulation. The other variables, corresponding to the characteristics and costs of each instance type, have already been defined in Section 5.

The time T_S needed to perform a given simulation that requires S computing energy using K instances of type i is given by

$$T_S(i, K) = \frac{S}{\nu_i E_i K}. \qquad (4)$$

The value is inversely proportional to the number of instances K, i.e., the computation speed can be increased as desired, provided that the simulation task can be split into a sufficiently high number of parallel processes, by just increasing the number of instances K.

However, note that Amazon charges every partial hour as one hour, therefore the exact cost of running the given simulation is obtained by rounding up the time used on each instance to the nearest integer hour. The cost is given by

$$C_S(i, K) = \nu_i E_i \varphi_i K \,\lceil T_S(i, K) \rceil = \nu_i E_i \varphi_i K \left\lceil \frac{S}{\nu_i E_i K} \right\rceil \qquad (5)$$

where the ⌈·⌉ function represents the smallest integer greater than or equal to the argument.

We introduce an efficiency value η which represents the ratio of the time interval in which each instance is performing computations to the time interval the instance is paid for, that is,

$$\eta = \frac{T_S(i, K)}{\lceil T_S(i, K) \rceil}. \qquad (6)$$

Eq. (6) presents the behavior shown in Fig. 6. The function has periodic local maxima (value equal to 1) when T_S is an integer number of hours, and it can be lower bounded as

$$\eta > \frac{T_S}{1 + T_S} \qquad (7)$$

implying that efficiency tends to 1 for T_S values much greater than 1 h.

Eq. (5) can be rewritten as

$$C_S(i, K) = \frac{\nu_i E_i K}{S}\, S \varphi_i \left\lceil \frac{S}{\nu_i E_i K} \right\rceil = S \varphi_i \,\frac{\left\lceil \frac{S}{\nu_i E_i K} \right\rceil}{\frac{S}{\nu_i E_i K}} = S \varphi_i \,\frac{\lceil T_S(i, K) \rceil}{T_S(i, K)} = \frac{S \varphi_i}{\eta}. \qquad (8)$$


Fig. 6. Efficiency as a function of the actual time used to perform the simulation.

Fig. 7. Total cost of Sim1 as a function of the time that would be needed to complete all the tasks according to the nominal ECU values. Labels within the graph show the number of instances (K) corresponding to the point (with the name of the instance type in case of ambiguity).

It is clear that there is a lower bound to the cost of simulation S, equal to Sϕ_i, achieved when the efficiency is one. This happens when the simulation time is exactly a multiple of one hour. In all other cases, η < 1 and the cost is higher than the minimum value Sϕ_i. Efficiency tends to one when simulation time increases. Now consider a simulation time shorter than one hour: ⌈T_S⌉ = 1. Substituting ⌈T_S⌉ in Eq. (5) and using Eq. (4), the cost for the specific case T_S < 1 can be written as

$$C_S = \frac{S \varphi_i}{T_S}. \qquad (9)$$

The previous equation shows that in such a condition, i.e., maximum simulation speed up, due to the Amazon pricing policy on partial hours, the cost is inversely proportional to the simulation time.
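The following Java sketch evaluates Eqs. (4)–(6) for Sim1 (S = 26.38 ECU·h) on the hi-cpu.medium instance type using the nominal parameters of Table 1; it reproduces the idealized nominal-ECU behavior discussed with reference to Fig. 7, not the measured performance of Fig. 8. Varying K shows the oscillating efficiency: whenever T_S falls just above an integer number of hours the billed cost jumps, which is the origin of the interrupted hyperbolic trend described next.

```java
// Cost model of Eqs. (4)-(6): time, billed cost and efficiency of a simulation of
// S ECU·h on K instances of a given type, using nominal ECU values from Table 1.
public class CloudCostModel {
    static double timeHours(double S, double ecuPerCore, int cores, int K) {
        return S / (ecuPerCore * cores * K);                                // Eq. (4)
    }
    static double costDollars(double S, double ecuPerCore, int cores, double costPerEcuHour, int K) {
        double ts = timeHours(S, ecuPerCore, cores, K);
        return ecuPerCore * cores * costPerEcuHour * K * Math.ceil(ts);     // Eq. (5)
    }
    static double efficiency(double S, double ecuPerCore, int cores, int K) {
        double ts = timeHours(S, ecuPerCore, cores, K);
        return ts / Math.ceil(ts);                                          // Eq. (6)
    }
    public static void main(String[] args) {
        double S = 26.38;                 // Sim1 computing energy (ECU·h)
        double E = 2.5, phi = 0.038;      // hi-cpu.medium: ECU/core, $/h/ECU
        int cores = 2;
        for (int K : new int[]{1, 2, 3, 5, 6, 10}) {
            System.out.printf("K=%2d  T_S=%5.2f h  cost=%5.2f $  eta=%.2f%n",
                    K, timeHours(S, E, cores, K),
                    costDollars(S, E, cores, phi, K),
                    efficiency(S, E, cores, K));
        }
    }
}
```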

For the Sim1 test case, the cost that would be needed to complete all the tasks according to the nominal ECU values is shown in Fig. 7. Each line represents a different instance type. Each point on the line corresponds to a different number of instances K. For high values of simulation time, curves are approximately flat since the efficiency η is high, therefore cost is almost constant. When the simulation time is decreased by increasing the number of instances K, curves tend to follow a hyperbolic trend, which is due to the trend of the lower bound on the efficiency. However, efficiency oscillates between the lower bound and one, therefore the hyperbolic trend is often interrupted. When time is lower than one hour, the trend becomes hyperbolic for all the instance types, as stated by Eq. (9). Moreover, only two hyperbolas are present, which indeed correspond to the two possible ϕ_i values, set by Amazon for the two considered families of instances, that is, std and hi-cpu.

Note that the previous analysis does not consider the cost of disk I/O operations on the Amazon cloud computing platform.

Fig. 8. Total cost of Sim1 as a function of the time needed to complete all the tasks in actual simulations. Different points on the same line correspond to different numbers of deployed instances K. Labels within the graph show the K value corresponding to the point.

However, in all our experiments, this has always been negligible compared to the instance costs, as detailed in Section 6.3.

The previous analysis assumes that the nominal computing power, in terms of ECU stated by Amazon, allows us to compute the running time of a given task on any type of instance. Actually, this is not true, as already shown in Table 3 for the case of a CPU-intensive task. Actual tests on EC2 have been performed by running, using the developed framework, a small set of each of the simulations included in Sim1, Sim2 and Sim3. The proposed framework has been configured, each time, to use different types of instances, so that we computed the processing power that can be achieved by using every type of instance. For every run, only one type of instance was tested, i.e., we did not mix instances of different types, to make performance comparison easier. Results are reported in Table 9 for Sim1, with reference to the performance provided by the std.small instance type. Note that these results only apply to the considered simulation, since they are influenced by both CPU and I/O activity. For each instance, a number of processes equal to the number of available cores has been used to maximize the exploitation of the resources of each instance. It is clear that the best performance is provided by the hi-cpu.medium instance type. We attribute this behavior to the low number of virtual cores, which seems to perform better in the EC2 infrastructure, and to the fact that the hi-cpu family of instances is probably better suited for mainly CPU-bound tasks such as the ones of our simulations.

To get a clear overview of the actual performance that can be achieved through the Amazon infrastructure, Fig. 8 shows the actual times and costs that can be achieved by using our proposed framework in order to run all the tasks included in Sim1, with various tradeoffs between time and cost, depending on the number of instances and the instance types. A comparison with Fig. 7 shows the same general trend, but there exist some discrepancies with respect to the theoretical behavior expected from the nominal ECU values. From Fig. 8 it is clear that the most convenient instance type, in terms of both time and cost, for this particular type of simulation is the hi-cpu.medium one. The same applies to Sim2 and Sim3.

As a final remark, note that Fig. 8 does not show the point corresponding to running the simulations on the dedicated computer, since the price would be more than two orders of magnitude higher than the one shown for the same set of simulations in the cloud, while the time is fixed at 17,250 s, that is, nearly 5 h. The next section includes a comparison of the time and cost, in a realistic usage case, for the development of a new transmission algorithm based on simulation results obtained by means of the dedicated computer or the cloud system.


Table 9
Experimental measurements of computing performance of EC2 instances using as many processes as the number of available cores (Sim1).

Name            Nominal ECU/core   Number of cores   Effective comp. power P (std.small = 1.00)
std.small       1                  1                 1.00
std.large       2                  2                 1.54
std.xlarge      2                  4                 1.32
hi-cpu.medium   2.5                2                 2.37
hi-cpu.xlarge   2.5                8                 1.36

Fig. 9. Diagram of the basic structure of a simulation cycle.

7.2. Multiple simulations

This section aims at quantifying the costs and the speedup of the cloud computing solution with respect to the dedicated computer solution when simulations are part of an actual research activity. In such a scenario, simulation experiments such as Sim1, Sim2 and Sim3 are run many different times to investigate, improve and refine the performance of the algorithms. In order to investigate the cost and time required by using a cloud simulation system rather than a PC, we assume that the research activity that leads to the development of a new algorithm is composed of a number of consecutive simulation cycles. A simulation cycle is the basic unit of the development process. One simulation cycle includes the time needed for investigation and a few modifications of the algorithm as well as the time needed to run the simulations once, i.e.,

TC = TD + TS (10)

where TC is the duration of the simulation cycle, TD is the time spent to study and modify the algorithm, and TS is the time spent to perform the simulation, either using the cloud computing or the dedicated computer solution. Fig. 9 illustrates the situation.

Moreover, in order to better model reality, we consider that operators (e.g., researchers) work during the daytime only. Therefore, if TD implies that the operators' work cannot be completed within the day, the remaining part of the work is carried out at the beginning of the next day. When the work is completed, a new simulation cycle can start. Clearly, if simulations extend past the end of the working day, they can continue since, of course, computers can work at night. Fig. 10 illustrates the situation. In the following, we assume a working day of 11 h and TD values ranging from one hour up to the duration of the working day.

To better understand the implications of working during the daytime only, the total time needed to perform all the simulation cycles in the Sim1 scenario with PC-based simulations using the i5 computer is shown in Fig. 11. When operators can work at any time, the total time linearly increases with TD, as expected. The behavior is more irregular (but still monotonic as a function of TD) when operators work during daytime only. The irregular behavior is due to the fact that the total time strongly depends on when the simulation in each cycle ends. If the ending time of the simulation does not allow the operators to prepare and start the simulation of the next cycle before the end of the working day, the whole night elapses without any activity being performed, thus increasing the total time. In the remainder of the paper we always compute the total time in the more realistic situation in which operators work during daytime only.
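The daytime-only behavior described above can be approximated with a small simulation of the clock. The sketch below assumes an 11 h working window per 24 h day and that leftover operator work simply resumes the next morning; it reproduces the spirit, though not necessarily every detail, of the model of Figs. 10 and 11.

def total_time(t_d, t_s, n_cycles, workday=11.0, day=24.0):
    """Elapsed hours to complete n_cycles develop-simulate cycles when
    operators work only 'workday' hours per day while simulations may
    also run at night (a sketch of the model of Fig. 10)."""
    t = 0.0  # hours elapsed since the start of the first working day
    for _ in range(n_cycles):
        remaining = t_d  # operator work progresses only in the working window
        while remaining > 0:
            hour = t % day
            if hour < workday:
                step = min(remaining, workday - hour)
                t += step
                remaining -= step
            else:
                t += day - hour  # night: wait until the next morning
        t += t_s  # the simulation runs unattended, day or night
    return t

With this model, the speedup of Fig. 13 can be estimated as, e.g., total_time(t_d, 4.8, 20) / total_time(t_d, 1.0, 20) for Sim1 run on the i5 PC versus the cloud.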

In the following, different values of TD as well as different simulations (i.e., different TS values) are considered to identify the conditions that lead to the highest advantage when using the cloud solution instead of the PC-based solution. As shown in Fig. 12, the total time required to perform all the simulation cycles (20 in the figure) increases almost linearly with TD when using the cloud, which corresponds to TS = 1 h for any simulation set, since this is the best cost–performance tradeoff point, as explained in Section 7.1. However, many simulations such as Sim1, Sim2 and Sim3 have a much higher TS when carried out using the i5 PC. In such conditions, the influence of TD on the total time is more limited in certain intervals. The difference between the curve for a given TS and the bottom curve corresponding to TS = 1 h can be seen as a measure of the convenience (in terms of speedup) of moving the simulation to the cloud. The speedup itself is shown in Fig. 13. For simulations such as Sim1 a significant speedup can be achieved, especially when the TD value is low, while it reduces when TD approaches a whole working day, that is, 11 h. For simulations with higher TS values, very large speedups can be achieved for low TD values, while the speedup is more limited when TD is close to 11 h.

7.3. Cost comparison

The best performance in terms of time is always achieved by using the cloud, since the TS value can always be reduced to approximately one hour while the cost remains the same. To perform computations at least as fast as the cloud solution does, nPC PCs are needed, where

nPC = S / (νPC EPC). (11)
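As a numerical illustration consistent with the figures (taking S in ECU hours and νPC EPC as the computing rate of the i5 PC): for Sim1 a single PC needs about 4.8 h (Fig. 11) while the cloud completes the same tasks in roughly one hour, so S / (νPC EPC) ≈ 4.8 and, rounded up to an integer, five PCs are required, which is the value shown in Fig. 14 for a 0% time increase.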

Fig. 10. Diagram of the simulation cycle considering that operators do not work at night.

Fig. 11. Comparison of the total time needed to perform all the simulation cycles when it is assumed that operators also work during the night as soon as the simulation finishes, or during daytime only. Sim1, PC-based simulations using the i5 computer (TS = 4.8 h).

Fig. 12. Comparison of the total time needed to perform 20 simulation cycles, as a function of the time TD needed by the operators to work between simulations and the time needed to run the simulation itself TS.

Fig. 13. Speedup that can be achieved using the cloud considering different simulations, i.e., TS values (20 cycles, cloud vs. one i5 PC).

However, the number of PCs can be reduced if the performance constraint is relaxed. Fig. 14 shows that if a moderate increase of the total time is allowed, e.g., 30%, the number of PCs reduces, for instance, from 5 to 3 for the Sim1 case when TD is equal to 3 h. In general, higher TD values lead to a faster decrease of the number of PCs, since the simulation time becomes less important compared to the total time, and simulations can run at night, using time that is not needed by the operators. For instance, for TD = 10 h only one PC is needed if a modest 10% total time increase is allowed, because with both the cloud and the PC solution it is necessary to wait for the next day to continue the work (a 1 h simulation using the cloud has the same effect, in terms of total time, as a 4.8 h simulation using the PC). Fig. 15 shows similar results for the case of Sim2.
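The curves of Figs. 14 and 15 can be approximated by reusing the total_time() function sketched earlier: for a given allowed time increase, search for the smallest number of PCs whose total time stays within the corresponding bound. The assumption that splitting a simulation over n PCs divides its running time by n is ours, made only for illustration.

def min_pcs(t_d, pc_time_single, n_cycles, allowed_increase):
    # Target: at most (1 + allowed_increase) times the cloud total time,
    # where the cloud simulation time is about 1 h per cycle.
    target = (1.0 + allowed_increase) * total_time(t_d, 1.0, n_cycles)
    n = 1
    while total_time(t_d, pc_time_single / n, n_cycles) > target:
        n += 1
    return n

For example, min_pcs(3.0, 4.8, 20, 0.3) estimates the number of PCs needed for Sim1 when a 30% total time increase is tolerated.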

Fig. 14. Number of PCs needed to achieve the same performance, or performance proportional to that of the cloud solution, in the Sim1 case (20 simulation cycles).

Fig. 15. Number of PCs needed to achieve the same performance, or performance proportional to that of the cloud solution, in the Sim2 case (20 simulation cycles).

Once the number of PCs is known, it is possible to compare the cost of the cloud-based solution with the amount of money that would be needed to purchase and run the PCs. In the following we assume that the lifetime LPC of a PC is 3 years and its purchase cost is about 1000 $. Some costs are difficult to quantify, such as management costs, hardware failures, renting rooms and the necessity of advance planning for buying and placing the computers. Others are easy to estimate, such as electricity: in Italy, for instance, the price of a 100 W electricity load used continuously, 24 h a day, is about 10 $/month (assuming 1 € = 1.35 $). Thus the cost CPC of purchasing and maintaining a PC for its lifetime is about 1360 $ (1000 $ + 36 months × 10 $/month). Clearly, all the costs mentioned here, including electricity, do not apply if the cloud computing solution is used, since they are already included in the price set by the cloud provider.

Fig. 16 shows the cost of the cloud solution as a percentage of the cost of the same-performance or a lower-performance nPC-based solution (depending on the "time increase" value) for the case of Sim1. When the cost value is lower than 100% the cloud solution is cheaper than the nPC-based one. This always happens for TD greater than 1 h if the same performance of the cloud solution is sought. If some increase in the total time is allowed (up to 40% "time increase"), the cost rises but it is still well below the cost of the nPC-based solution for TD values greater than 2.5 h. Curves tend to increase while moving towards TD equal to 11 h, since in that case simulations run for most or all of the time at night. In this condition, running times up to 13 h do not affect the total time, thus the advantage of using the cloud to run computations as fast as possible decreases. Fig. 17 shows the same result for the case of Sim2. Comparing it with the previous case, it can be seen that when a lower performance in terms of time is considered for the nPC-based solution, the cost of the cloud increases faster if TS is higher, since higher TD values imply that an increasing portion of the simulation can run during the night without affecting the total time.


Fig. 16. Cloud cost as a percentage of the cost of a same-performance (time increase 0%) or lower-performance (time increase greater than 0%) PC-based solution. When the y-value is lower than 100% the cloud solution is cheaper than the PC-based one. Sim1 case.

Fig. 17. Cloud cost as a percentage of the cost of a same-performance (time increase 0%) or lower-performance (time increase greater than 0%) PC-based solution. When the y-value is lower than 100% the cloud solution is cheaper than the PC-based one. Sim2 case.

Fig. 18. Time increase due to the operators working during daytime only compared to continuous working (20 cycles, cloud-based simulations).

Note that, when the time needed by the nPC-based solution equals the time needed by the cloud solution, the cost curve is the same regardless of the complexity of the considered simulation. In this condition, the time taken by the simulations in both the cloud and the nPC-based solution is 1 h. Since we consider that operators work during daytime only, the actual time taken by Ncy simulation cycles is Ncy (1 + TD) f(TD), where f(TD) is a factor that takes into account the total time increase due to working during daytime only. For TD values ranging from 1 to 10 h it can be approximated as 2, as shown in Fig. 18.

However, the cloud is paid only for the actual usage, therefore the respective costs of each solution are:

Ccloud = LPC / [Ncy (1 + TD) f(TD)] · S ϕi Ncy = LPC ϕi S / [(1 + TD) f(TD)] (12)

and

CnPC = CPC S / (νPC EPC) (13)

which is a lower bound since the number of PCs should be an integer value. The upper bound on the cloud/nPC cost ratio is:

Cratio = [LPC ϕi νPC EPC / CPC] · 1 / [(1 + TD) f(TD)] (14)

which matches the behavior shown in the figures. The first fraction does not depend on TD and is equal to 4.0386. The condition yielding an economic advantage for the cloud solution is:

TD > [LPC ϕi νPC EPC / CPC] · 1 / f(TD) − 1. (15)

Assuming the f factor equal to 2, the cloud solution is cheaper if TD is greater than approximately 1 h, and the economic advantage increases with greater TD values.
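Equations (12)–(15) can be checked numerically. The sketch below uses the value 4.0386 reported above for the constant LPC ϕi νPC EPC / CPC and the approximation f(TD) ≈ 2; it only reproduces the upper bound of Eq. (14), not the full figures.

def cost_ratio(t_d, const=4.0386, f=2.0):
    # Upper bound on the cloud/nPC cost ratio, Eq. (14).
    return const / ((1.0 + t_d) * f)

# The cloud becomes cheaper when the ratio drops below 1, i.e. for
# t_d > const / f - 1, which is about 1 h, as stated by Eq. (15).
for t_d in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(t_d, round(cost_ratio(t_d), 3))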

8. Conclusions and future work

This work focused on investigating the cost–performance tradeoff of a cloud computing approach to run simulations frequently encountered during the research and development phase of multimedia communication techniques, characterized by several develop–simulate–reconfigure cycles. The traditional approach to speed up this type of relatively small simulation, i.e., running them in parallel on several computers and leaving them idle when developing for the next simulation cycle, has been compared to a cloud computing solution where resources are obtained on demand and paid only for their actual usage. The comparison has been performed from both an analytical and a practical point of view, with reference to an actual test case, i.e., video communications over a packet lossy network. Performance limits have been established in terms of running time and costs. Moreover, a cloud simulation software framework has been presented to simplify the virtual machine management in the cloud. Actual performance measurements and costs have been reported for a few sample simulations. Results showed that currently, with reference to the commercial offer of Amazon cloud computing services, it is economically convenient to use a cloud computing approach, especially in terms of reduced development time and costs, with respect to a solution using dedicated computers, when the development time is longer than one hour. However, if a high development time is needed between simulations, the economic advantage progressively reduces as the computational complexity of the simulation increases.

Future work includes improving the framework in many aspects. For instance, data could be directly downloaded, eliminating the need to store them in the S3 system. Data distribution inside the cloud could be improved so that, for example, some instances work on a subset of data (i.e., only some sequences), reducing the need to use the I/O subsystem for re-reading sequences. Associating weights with each task included in the simulation would allow us to take into account the different execution times when allocating tasks to the various instances; this would be particularly useful to run simulations in which the sequences have mixed frame sizes and lengths. A possible allocation strategy is sketched below.
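A possible starting point for the weighted allocation mentioned above is a longest-processing-time-first greedy assignment. This is only a sketch of one standard heuristic, not a feature of the current framework, and the task weights are assumed to be rough execution-time estimates.

import heapq

def allocate(tasks, n_instances):
    """Assign (name, estimated_time) tasks to instances, always giving the
    next longest task to the currently least-loaded instance."""
    heap = [(0.0, i, []) for i in range(n_instances)]  # (load, index, tasks)
    heapq.heapify(heap)
    for name, weight in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, idx, assigned = heapq.heappop(heap)
        assigned.append(name)
        heapq.heappush(heap, (load + weight, idx, assigned))
    return sorted(heap, key=lambda entry: entry[1])

For example, allocate([("foreman", 60), ("news", 30), ("paris", 45)], 2) groups the three hypothetical tasks so that the two instances finish at approximately the same time.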

Acknowledgments

The authors are grateful to the anonymous reviewers for their insightful comments, which particularly helped in improving the cost analysis part of this work.


Daniele Angeli obtained his bachelor degree from the University of Rome "Tor Vergata" and his master degree in Computer Engineering from the Politecnico di Torino, Turin, Italy, in November 2011. During his last year at the university he also worked at Telecom Italia Lab on a new, innovative platform that will supply urban mobility services through an embedded black box integrated inside the vehicle. His research interests include cloud computing, service oriented architectures and video streaming.

Enrico Masala received the Ph.D. degree in computer engineering from the Politecnico di Torino, Turin, Italy, in 2004. In 2003, he was a visiting researcher at the Signal Compression Laboratory, University of California, Santa Barbara, where he worked on joint source channel coding algorithms for video transmission. He is currently Assistant Professor in computer engineering at the Politecnico di Torino. His main research interests include simulation and performance optimization of multimedia communications (especially video) over wireline and wireless packet networks.