
CHAPTER 2 SYSTEMATIC LITERATURE REVIEW - …shodhganga.inflibnet.ac.in/bitstream/10603/34201/7/07... ·  · 2015-02-10arrangement is the on-interest senator combined into the Linux



CHAPTER 2

SYSTEMATIC LITERATURE REVIEW

2.1 INTRODUCTION

The cloud computing data center is our topic of interest. In this chapter we focus on addressing the power-performance trade-off. The trade-off is vital because the available processing elements consume a great deal of energy, as discussed in Chapter 1. The inherent unreliability of distributed systems poses a major challenge in this research. To begin with, in today's data centers a power control strategy might come straight from a server vendor (e.g., IBM) and be implemented in the service-processor firmware, without any knowledge of the application software running on the server (Wang et al 2011b). In a cloud environment the server is virtualized, and the virtualized instances have to be handled with care so that the processing machines (the hosts) are used efficiently. Hence proper VM consolidation is necessary to resolve the trade-off.

2.1.1 Eco-efficient Data Center Management

Eco-efficiency can be directly linked to being environmentally friendly. It concerns managing cloud data centers in ways that reduce both the energy consumed and carbon dioxide (CO2) emissions. Therefore, mechanisms and policies should be put in place to help understand how green a cloud's data center is. The dramatic increase in greenhouse gas emissions is having a detrimental effect on the global climate,


with effects such as increasing temperatures, droughts, and floods. The ICT industry has been contributing to this growth: Smarr (2010) states that carbon emissions from this sector are expected to triple between 2002 and 2020. Most of the literature reviewed on energy-efficient data centers focuses on proper VM consolidation and on the status of the machines in a data center.

2.1.2 VM Allocation and Effect on Energy Consumption

Firstly, a study by Corradi et al (2012) states that Virtual Machine (VM) consolidation can be used as a means of reducing the power consumption of cloud data centers. This technique tries to allocate as many VMs as possible on as few physical machines as possible, to maximize the utilization of the running physical machines. For instance, given two VMs, instead of allocating each one to a physical server that is not fully loaded, the technique places both VMs on one physical server and switches the other server off to save energy. Using this technique in a data center can therefore reduce operational costs and increase the efficiency of energy usage. However, the number of VMs on one physical machine should not be so high that it degrades the performance of the VMs. VM allocation involves both VM selection and VM placement. Most of the literature conducts experiments assuming that the processing elements available in a data center are reliable, whereas a distributed system is inherently unreliable in its computing machines.
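The consolidation idea described above can be sketched as a simple first-fit-decreasing bin-packing heuristic. The 80% utilization cap and all names below are illustrative assumptions of ours, not values from the cited study:

```python
# Illustrative sketch: pack VMs onto as few hosts as possible, subject to
# a utilization cap so that no host is overloaded. Hosts that receive no
# VMs would be switched off to save energy.

def consolidate(vm_loads, host_capacity, cap=0.8):
    """Place each VM (by CPU demand) on the first host with room left.

    Returns a list of hosts, each a list of VM loads.
    """
    hosts = []
    limit = cap * host_capacity
    for load in sorted(vm_loads, reverse=True):   # largest VMs first
        for host in hosts:
            if sum(host) + load <= limit:         # fits under the cap
                host.append(load)
                break
        else:
            hosts.append([load])                  # power on a new host
    return hosts

# Two small VMs fit on one host instead of two: the second host stays off.
placement = consolidate([30, 25], host_capacity=100)
```

The cap illustrates the caveat in the text: packing is bounded so that a host is never loaded to the point of degrading its VMs' performance.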


2.1.3 Machine-Learning Algorithms

The status of the available machines is determined using machine-learning techniques: CPU utilization is predicted, and machines are labelled either under-utilized or over-utilized. The machine-learning approach views resource provisioning as a demand-prediction function and applies various machine-learning algorithms to estimate how the workflow changes over time, where a workflow refers to the set of services and jobs that an application is composed of (Nathani et al 2012). The literature uses multiple linear regression and feed-forward neural networks to predict resource demand for a cloud. In the context of IaaS, these enable a hosted application to make autonomic scaling decisions using intelligent resource-prediction techniques. All of these techniques, however, require historical data to learn effectively. To generate such data, the authors run a standard client-server benchmark application, TPC-W, on Amazon's EC2 cloud; this data is then divided into training and validation sets using a variety of statistical techniques. We discuss some papers on reliability in Section 2.4.

Linear regression attempts to fit a curve to the given data points while minimising the error between the curve and the observed data. When successful, it yields a function that approximates the real process which gave rise to the observed data. Neural networks are another machine-learning technique that approximates a real-world process. A neural network consists of an input layer, an output layer, and one or more hidden layers. Each layer consists of neurons that hold values and are connected to the next layer's neurons by synapses. The synapses start with random weights, which are adjusted as learning proceeds. The network is trained by presenting a known input to the input layer and observing the output at the output layer. The difference between the output produced by the network and the actual output is the error; this error is fed backwards into the network to allow the synapses to update their weights. This is called training the network. The sliding-window technique is a sampling technique that allows the learning algorithm to view the same dataset from different sample perspectives.
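As a concrete illustration of the regression-based prediction described above, the following sketch fits an ordinary-least-squares line to a sliding window of recent CPU-utilization samples and extrapolates one step ahead. The window size and sample values are our own assumptions, not figures from the surveyed work:

```python
# Sliding-window linear regression for one-step-ahead demand prediction.

def predict_next(samples, window=4):
    """Fit y = a*x + b over the last `window` samples, return forecast."""
    ys = samples[-window:]
    xs = range(len(ys))
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares slope and intercept.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    a = num / den
    b = mean_y - a * mean_x
    return a * n + b        # extrapolate to the next time step

# A steadily rising load: the forecast continues the trend.
forecast = predict_next([10, 20, 30, 40])   # -> 50.0
```

Sliding the window forward as new samples arrive lets the same algorithm re-view the dataset from successive sample perspectives, as the text describes.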

After training, both linear regression and neural networks were evaluated on unseen data and compared on several statistical measures. Neural networks were found to generalise well, with the predicted resource usage closely matching the actual data. However, the only parameter used by the authors was the resource load placed on the cloud provider. In reality, each resource has several attributes that are each decision parameters in their own right, and in such a scenario it is difficult to train these machine-learning methods. Increasing the number of hidden layers in a neural network does not necessarily increase its predictive power.

2.1.4 Resource Allocation in the Cloud

Byun et al (2011) proposed a cost-optimization method for task scheduling. They attempt to find the minimum number of resources for a given set of tasks with deadlines. Techniques like swapping and backfilling attempt to move tasks that are not on the critical path of the application workflow, so that the total cost of the resources used for the workflow is minimized. Their algorithm, called Partitioned Balanced Time Scheduling (PBTS), assumes that tasks are non-preemptible and execute on a single resource or set of resources. Based on the minimum time charge unit of the provisioning system (say, 1 hour on Amazon's EC2), the algorithm divides the application's estimated work time into time partitions. It then iterates over all tasks yet to be executed and estimates the next set of tasks that can be fully scheduled in the next time partition, together with their required resources. Having done this, it schedules these tasks for execution and repeats the cycle for the remaining tasks until all tasks are completed. While this minimises the cost of resources, it does not take into account other qualities of resources that a task might require. For instance, a task might request one hour of processing on a node with a certain kind of graphics chip or level-1 cache. Since PBTS assumes that all resource units are homogeneous, such requests cannot be accommodated.
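The partitioning step of PBTS can be illustrated with a much-simplified sketch: given tasks with work amounts and deadline partitions, it computes the minimum number of homogeneous resources each charge-unit partition needs. This is our own simplification (it ignores non-preemptibility and task dependencies), not the published algorithm:

```python
import math

def resources_per_partition(tasks, partition_len, n_partitions):
    """tasks: list of (work, due_partition) pairs.

    Returns, for each charge-unit partition, the minimum number of
    homogeneous resources needed so all work due in that partition
    finishes inside it.
    """
    needed = []
    for p in range(n_partitions):
        # Total work whose deadline falls in partition p.
        due = sum(w for w, d in tasks if d == p)
        needed.append(math.ceil(due / partition_len))
    return needed

# Two tasks due in partition 0 and one in partition 1, 2-hour partitions.
plan = resources_per_partition([(2.0, 0), (1.5, 0), (3.0, 1)], 2, 2)
```

Because every resource unit is interchangeable here, the sketch also exhibits PBTS's limitation noted above: a request for a resource with special hardware cannot be expressed.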

As a result, cluster-level control solutions are needed to allow shifting of power and workload for optimized system performance. Virtual power management, as discussed by Nathuji & Schwan (2007), was one of the first experiments in this field; they proposed an architecture for a data center's resource management system in which resource management is divided into local and global policies. At the local level the system leverages the guest OS's power management strategies. The global manager gets information on the current resource allocation from the local managers and applies its policy to decide whether the Virtual Machine (VM) placement needs to be adapted (Beloglazov et al 2011). Dynamic resource management at the global level was not addressed.

Cluster-level performance is very important when a large number of virtualized servers are in play; a coordinated energy-efficient approach is then essential. As many data centers adopt virtualization technology for resource sharing, the application performance of each virtual machine (rather than of the entire server) needs to be controlled effectively. Beloglazov & Buyya (2012) have discussed host over-subscription detection and VM selection algorithms extensively, and have experimented with competitive analysis of the single VM migration and dynamic VM consolidation problems. Energy-efficient algorithms have been discussed elaborately, extending the available algorithms, and minimum-energy heuristics have been deployed for cyber-physical systems where energy from a battery is a major constraint (Wu et al 2011). The literature surveys for the work proposed in the objectives are addressed in the rest of the chapters.

Nathuji & Schwan (2007) were the first to investigate power management techniques in the context of virtualized systems, as shown in Table 2.1, where several other papers from the data center literature are also tabulated. They studied the problem of power-efficient resource management in large-scale virtualized data centers. The authors described several closely related approaches aimed at minimizing power consumption under QoS constraints, and at power capping. The global policies are responsible for managing multiple physical machines, using knowledge of rack- or enclosure-level server characteristics and requirements. These policies consolidate VMs using migration in order to free lightly loaded servers and put them into power-saving states. The authors proposed a split into local and global policies. At the local level, the system coordinates and leverages the power management policies of the guest VMs on each physical machine; an example of such a policy is the on-demand governor integrated into the Linux kernel. At this level, application-level QoS is maintained, since decisions about changes in power states are issued by the guest OS. The experiments conducted by the authors showed that the proposed approach enables efficient coordination of VM- and application-specific power management policies, and reduces power consumption by up to 34% with little or no performance degradation.


Table 2.1 Literature consolidated for IaaS

Author                    | Idea                                                             | Technique                            | Harnessing Component     | Virtualization Implementation
(Nathuji & Schwan 2007)   | To reach minimum energy consumption with performance constraints | DVFS, power switching, soft scaling  | CPU, VM                  | Yes
(Garg et al 2011)         | To achieve minimum energy and CO2, maximum profit                | DVFS                                 | CPU                      | No
(Pinheiro et al 2001)     | To achieve minimum power under performance constraints           | Switching server power               | CPU, drive, network      | No
(Kumar et al 2009)        | Performance and power budget constraints, minimum energy         | DVFS, VM consolidation               | CPU, RAM, network        | Yes
(Buyya et al 2010)        | Minimum energy under performance constraints                     | DVFS                                 | CPU                      | Yes
(Chase et al 2001)        | Minimum power under performance constraints                      | Workload consolidation               | CPU                      | No
(Beloglazov & Buyya 2012) | Energy and VM consolidation                                      | LR-MMT                               | CPU and VM consolidation | Yes

Keeping servers under-utilized is the principal cause of energy waste in a data center, alongside other issues such as cooling, individual server leveraging, provisioning, and bandwidth availability constraints. Graph theory has been studied across a wide area of distributed systems, and cloud computing is picking up pace in the distributed-systems arena (Kwok & Ahmad 1999; Henricksen & Indulska 2006; Ghazale Hosseinabadi 2009; Fakhar et al 2012). Lundqvist et al (2012) have discussed service program mobility, which helps CDNs efficiently provide QoS to mobile applications. Chen et al (2012) have discussed machine-to-machine transactions in network communication, where various network domains were highlighted and efficient results were published; however, server-side configurations in these communication issues were not discussed. Virtualization is the key to handling variable loads in machine-to-machine transactions, together with the storage issues concerning virtual hosted instances, which are discussed in this thesis. Simultaneous power and scheduling control faces several major challenges.

We have concentrated on the creation of a workflow model to better understand the cost function and the power expense of hosts in a data center. The data center consists of hosts containing VMs, and the data centers are distributed globally to address Infrastructure as a Service (IaaS), which comprises the machines and support modules that make up an infrastructure. The literature has been partitioned into three parts, corresponding to the three proposed works and their consequences for the proposal. The first part of the literature survey concentrates on the Cloud Graphical Workflow model. In the second, we concentrate on the Energy Curve model and address the VM migration and SLA violation metrics to harness the power-performance trade-off. Next, we concentrate on machine-learning techniques and their effect on predicting the behaviour of the hosts available in the data center. The over-loading or under-loading of servers was predicted by a different machine-learning technique in order to study the behaviour of reliable nodes based on their alive status, which is a binary property. Finally, we extend the literature to analyse reliability as a statistical property.


Based on the literature survey, we understand that energy consumption at the cluster level and trust-based QoS analysis have not been considered together. In this thesis we attempt to address these issues. Learning environments for cloud computing and related applications are available; the applications involved and the environment needed to create a cloud stack are very important. But the real analysis and the environment can be studied through simulation tools, and learning by simulation is vital where research is concerned. The emotional aspects of bringing cloud computing into learning have been discussed (Rizzardini & Amado 2012). Cloud-based learning activities were discussed, but research-based activities in the cloud can be realised when simulated learning is brought in to study the cloud itself.

Mikroyannidis (2011) has concentrated on the Personal Learning Environment and the role it plays. ROLE (Responsive Open Learning Environment) was established, a survey of 19 students was conducted, and the results were discussed (Rizzardini et al 2012). Our work, on the other hand, concentrates on the learning experience of a data center on a simulation basis. The three important layers, viz. Software, Platform and Infrastructure, the learning concepts regarding these layers, and their effect on the cloud computing stack are worthwhile research areas. Cooperative energy-aware techniques were discussed in our previous paper (Park & Pai 2006).

Lundqvist et al (2012) have discussed service program mobility, which helps CDNs (Content Delivery Networks) efficiently provide QoS to mobile applications. Chen et al (2012) have discussed machine-to-machine transactions on network communication, where various network domains were highlighted and efficient results were published, but server-side configurations in the communication issues were not discussed. In our thesis, we discuss virtualization as the key to handling variable workloads in server-to-server transactions, together with the storage issues of virtual hosted instances. Chu & Chen (2011), addressing a green cloud, modelled both communication and computation; but the energy consumption in a data center is mainly due to under-used servers, so addressing dynamic VM consolidation on a host is of utmost importance. Hence we have come up with the Minimum Energy Heuristics, which cater to the dynamic consolidation of VMs by linear approximation between two objectives, i.e., VM migration and SLA violation. Mobile virtualization and green mobile networks have been discussed in many other technical papers (Mijat & Nightingale 2011; Wang et al 2012; Seo 2010; Weinberg & Pundit 2009); those authors discuss mobile virtualization and processor handling from a mobile perspective, which is vital, but server VM consolidation must be addressed when it comes to the data center energy perspective.
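The linear approximation between the two objectives can be sketched as a weighted score over candidate consolidation plans. The weight `alpha` and the plan figures below are hypothetical illustrations of ours, not values from the thesis:

```python
# Score candidate consolidation plans by a linear combination of a
# normalized VM-migration count and an SLA-violation rate; lower is better.

def tradeoff_score(migrations, slav, max_migrations, alpha=0.5):
    """alpha weights migrations against SLA violations (slav in [0, 1])."""
    return alpha * (migrations / max_migrations) + (1 - alpha) * slav

def best_plan(plans, alpha=0.5):
    """plans: list of (name, migrations, slav) tuples; returns the best."""
    worst = max(m for _, m, _ in plans) or 1   # normalizer, avoid /0
    return min(plans, key=lambda p: tradeoff_score(p[1], p[2], worst, alpha))

# A migration-heavy plan vs. a conservative one with more SLA violations.
plans = [("aggressive", 40, 0.02), ("conservative", 5, 0.10)]
chosen = best_plan(plans)
```

Sweeping `alpha` from 0 to 1 traces the linear trade-off between the two objectives: at the extremes only SLA violations or only migrations matter.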

The VM consolidation technique involves VM selection before VM migration. The energy consumed by a VM migration involves two hosts, which matters because twice the energy of a single host is depleted for a single VM migration. Our energy model leverages this energy-performance trade-off perspective. For the keywords 'VM consolidation' and 'QoS' searched in IEEE Transactions, we selected 14 papers, from a set of 200 published between 2008 and 2013, that closely resemble our goal. VM-selection-related literature is concentrated in this section, and as can be seen, most of the literature has not considered VM migration when optimising energy-related parameters.
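The doubled cost noted above can be made concrete with a back-of-the-envelope estimate, assuming (hypothetically) that both source and destination hosts draw full power for the duration of the RAM transfer:

```python
# Rough energy cost of one VM migration: two hosts stay active while the
# VM's RAM is transferred. All figures below are hypothetical examples.

def migration_energy_wh(vm_ram_mb, bandwidth_mbps, host_power_w):
    """Energy in watt-hours consumed by source and destination hosts
    together for the duration of a single VM migration."""
    seconds = (vm_ram_mb * 8) / bandwidth_mbps   # MB -> Mb transfer time
    return 2 * host_power_w * seconds / 3600     # two hosts, joules -> Wh

# Migrating a 4 GB VM over a 1 Gbps link between 200 W hosts:
energy = migration_energy_wh(4096, 1000, 200)
```

Even this crude model shows why needless migrations are penalised: every migration charges the energy bill of two hosts at once.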

2.2 DATA CENTER IN DISTRIBUTED SYSTEMS

Within distributed systems we concentrate on the cloud computing paradigm, and in particular on IaaS. Infrastructure as a Service (IaaS) involves computing machines and handles VMs for the execution of HPC or web workloads, also known as applications. We especially consider VM consolidation techniques that reduce energy consumption and performance degradation in order to efficiently harness the machines available in a data center. Resource sharing is an important aspect of distributed system processing, and communication protocols are concentrated on here. The resources are a stack of processor, memory and operating system created as virtual instances, and these instances execute the cloud applications. The scheduling of these resources has to be addressed when it comes to IaaS, and the carbon footprint due to IaaS has been taken into account in our literature.

The cloud workload has various dependencies due to its virtualized execution. The virtualized instances handle the workload efficiently, and care has to be taken to understand the mechanics behind an application: an application needs either its computation or its communication aspects to be concentrated on in order to achieve the desired energy-related results. The virtual machines contribute to the provisioning of resources for the cloud environment to handle jobs with mixed workloads. Computing and communication efficiency can be achieved if a better system is evolved to address the power-performance trade-off in a data center. The data center handles the workloads; this involves time for the execution of jobs, which in turn depletes energy through the operating frequency of a running processor.

Much recent research on clouds deals with the virtual machine monitor and global-level aspects of the data center. But the local-level managers which handle VMs must be noted carefully, for they incur a power expense due to VM migrations; a VM migration causes performance degradation, which in turn costs power for the time an application takes to execute. The hosts, or computing machines, are either over-loaded or under-loaded, and both involve a power expense: an idle machine still consumes about 70% of its peak energy. Over-loaded machines handle VMs beyond their capacity, which leads to unbounded processing and degrades response time and throughput. This power-performance trade-off prevails and has been the focus of research in cloud computing. The cloud arena, and particularly IaaS with its stack of resources providing computing and communication capabilities, is of major importance in solving the trade-off. The host status changes dynamically, so forecasting the host status before VMs are provisioned and executed has to be addressed. Cloud nodes are always assumed to be reliable, and this assumption is based merely on the host being alive.
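A linear host power model consistent with the 70% idle figure quoted above can be sketched as follows; the 250 W peak power is an assumed example value, not a figure from the surveyed literature:

```python
# Linear host power model: an idle host draws 70% of peak power, and
# power grows linearly with CPU utilization up to the peak.

def host_power(utilization, p_max=250.0, idle_fraction=0.7):
    """Power draw in watts for a CPU utilization in [0, 1]."""
    p_idle = idle_fraction * p_max
    return p_idle + (p_max - p_idle) * utilization

# An idle host still burns most of the peak power, which is why
# consolidating VMs and switching idle hosts off saves energy.
idle_draw = host_power(0.0)    # 70% of peak
busy_draw = host_power(1.0)    # full peak
```

Under this model, running one host at 100% costs far less than running two hosts at 50% each, which is the quantitative core of the consolidation argument.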

Li & Huang (2010) address current virtualized cloud platforms, where the resource provisioning strategy remains a major challenge. Provisioning determined by peak workload will probably yield low resource utilization, while under-provisioning the job loads will probably sacrifice the potential profit of cloud customers because of bad user experiences. VM-based performance isolation further restrains resources from flowing on demand; with regard to memory, this ultimately ends in under-loaded and over-loaded memory inside the same data center. Their paper proposes a VM-oblivious dynamic memory optimisation scheme; their case study of server consolidation shows that TMemCanal can improve the performance of memory-intensive services by up to 400%. Server consolidation is exhaustively evaluated and better results are reported. Their paper concentrates on memory optimisation; resource sharing should be useful, but should also consider the proper utility or efficient selection of VMs to provide server relief. Memory-related graph theory was concentrated on, but the VM and the host were also in a graph model, so there was a need to model the job and VM complexity.

Wang et al (2011a) focus on how increasing web business and processing footprints stimulate server consolidation in data centers. Through virtualization technology, server consolidation can lessen the burden on physical hosts and supply scalable services. However, ineffective memory usage among multiple Virtual Machines (VMs) becomes the bottleneck in a server consolidation environment. Because of inaccurate RAM usage estimates and the lack of memory reference management, there is much service performance degradation in data centers, even when a substantial amount of memory has been occupied. To improve this scenario, they first introduce the VM's memory division view and the VM's free memory section view. Memory usage of the VM is considered in this particular paper, although VM consolidation based on cloud reputation would be necessary for a better energy viewpoint.

Mei et al (2011) concentrate on server and software consolidation through virtualization. They argue that it is important for both cloud consumers and cloud providers to be aware of the various factors that may significantly impact the performance of applications running in a virtualized cloud. Their paper presents an extensive performance study of network I/O workloads in a virtualized cloud environment. They study a couple of representative workloads in cloud-based data centers, which compete for either computation or communication I/O resources, and present a precise analysis of the different factors that affect throughput performance and resource-sharing utility. Finally, they review the impact of different CPU resource scheduling strategies and various workload rates on the performance of applications migrating across different VMs hosted on the same physical machine. The performance-energy trade-off with response-time-based VM selection parameters is not addressed: resource sharing and throughput were concentrated on, whereas energy consumption is our important criterion. Hence carbon footprint and energy consumption were important to look into.


Moghaddam et al (2011) present a calculation of the carbon footprint and energy usage of a WAN of data centers. The formulation is used to measure the footprint of a simulation platform comprising 13 data centers in seven cities at distinct geographical locations around the world. A heuristic algorithm (a modified GA) is used to optimise the performance as well as the footprint of the network. To tune the optimisation, different optimisation intervals are proposed in order to extract the best interval. The network was tried under distinct loads, and their results show a significant carbon footprint reduction from VPC data center consolidation compared with LAN server consolidation. Low carbon and VPC are their goals, and a GA has been implemented and analysed from an evolutionary-algorithm perspective; we, however, give full attention to the power perspective, which was not concentrated on by the authors.

2.3 VM CONSOLIDATION IN VIRTUALIZED RESOURCE INSTANCES

VM consolidation in terms of SLA violation is an important criterion. The virtualized instances, and the migration of those instances, depend on the machines in a data center; the electricity consumed by hosts kept alive so that VM migration can take place is an important aspect to be considered, and much of the literature has concentrated on this area. Gao et al (2013) state that one of the most important objectives of data center management is to maximise profit by minimising both electric power consumption and service-level agreement violations of the hosted software. They propose a management solution that takes advantage of both virtual machine resizing and server consolidation to attain energy efficiency and quality of service in virtualized data centers. A novelty of the solution is the integration of linear programming, ant colony optimisation, and control theory techniques. An improved energy aspect was dealt with in their paper, but VM over-subscription and placement were not discussed in detail. We try to deal with host over-subscription by means of machine-learning techniques applied to the workload considered. Execution time and energy consumption can only be reduced if proper consolidation is done.

Viswanathan et al (2011) presented and evaluated a novel application-centric energy-aware technique for VM allocation that aims at maximising resource utilization and energy efficiency through VM consolidation while honouring QoS guarantees. To do this, they formulated an empirical model of the average power consumption and execution time based on measurements from extensive executions of typical HPC workload benchmarks (all possible allocations according to the number and type of VMs), and designed an algorithm to determine the best VM allocation for an optimisation goal such as minimising energy consumption and/or execution time. Their latest research aims at using machine-learning processes to extract, on the fly, a model from the sub-system operation data compiled from offline benchmark experiments and from actual applications running in VMs, and to compare the proposed solution against many real-time traces.

Wang et al (2011a) focus on how growing online business and its computing footprint encourage server consolidation in data centers. By means of virtualization technology, server consolidation can decrease the number of physical hosts and still provide scalable services. However, inefficient memory usage among multiple virtual machines (VMs) becomes the bottleneck in a server consolidation environment. Their benchmark results show that their implementation can save about 30% of physical memory with roughly 1% RAM overhead and 5% overall performance degradation. Based on the Xen virtualization platform, the work brings dramatic gains to a commercial cloud data center providing over 2,000 VMs.

Feller & Morin (2012) presented and evaluated the energy management mechanisms of a unique, holistic energy-aware VM management framework named Snooze. Snooze has a dual purpose: it can either be used to efficiently manage production data centers or serve as a test-bed for state-of-the-art energy-aware VM scheduling algorithms. Moreover, they plan to integrate their previously proposed nature-inspired VM consolidation algorithms and compare their scalability with the existing greedy algorithm and alternative consolidation approaches (e.g. based on linear programming). In this thesis we apply machine learning techniques to predict VM resource utilization peaks and trigger pro-active actions.
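As a rough illustration of such pro-active prediction (a simple stand-in, not the actual technique developed in this thesis), a least-squares trend over a sliding window can extrapolate a VM's next utilization sample; the window size and trace values below are assumed.

```python
# Minimal trend-based forecaster: fit a line to the last few utilization
# samples and extrapolate one step ahead to anticipate a peak.

def predict_next(history, window=5):
    """Least-squares fit over the last `window` samples, extrapolated one step."""
    recent = history[-window:]
    n = len(recent)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(recent) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, recent))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den if den else 0.0
    return mean_y + slope * (n - mean_x)  # predicted value at time step n

cpu_trace = [0.30, 0.35, 0.42, 0.48, 0.55]  # rising CPU utilization samples
forecast = predict_next(cpu_trace)           # ~0.61
overload_expected = forecast > 0.8           # threshold for pro-active action
```

A forecast crossing the overload threshold would trigger migration before the peak actually occurs, rather than after.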

Moses et al (2011) have showcased why contention for the shared cache is a critical issue in virtualized cloud computing data centers. Future work would involve detailed profiling of VMs to steer scheduling decisions. Once enforcement capability is available, highly sophisticated techniques that combine the advantages of monitoring and enforcement for cache, memory bandwidth and power can be used very efficiently in future cloud computing data centers. The discussed literature has been tabulated in Table 2.2 and compared with respect to key parameters.


Table 2.2 Literature review of VM consolidation in IaaS

Author | VM selection considered? | VM response time considered? | Energy consumption due to VM migration considered? | Graph theory based distributed cloud analysis?
(Li & Huang 2010) | No | Yes | No | No
(Mei et al 2011) | Yes | Yes | Yes | No
(Moghaddam et al 2011) | Yes | No | Yes | No
(Gao et al 2013) | Yes | Yes | Yes | No
(Viswanathan et al 2011) | Yes | Yes | Yes | No
(Wang et al 2011a) | Yes | No | No | No
(Feller et al 2012) | Yes | No | No | No
(Moses et al 2011) | Yes | No | Yes | Yes

2.4 RELIABLE DATA CENTER FOR ENERGY EFFICIENCY

The reliability of cloud nodes is important to consider. But instead of a binary property perspective based on alive hosts, we find a new dimension of analysis: a statistical property for reliability gives us a new dimension in a real world scenario. Imada et al (2009) investigate the power and QoS (Quality of Service) performance characteristics of hosts virtualized with virtual machine technology. Currently, one of the vital problems at data centers with plenty of servers is the increased power consumption. Virtual machines (VMs) are used by Internet companies for efficient server operation and provisioning. While virtualized servers running multiple VMs are expected to save power, new issues arise in virtualized servers compared to conventional physical servers: migration of the load between servers and processor core assignment of the server's workload, from the viewpoints of QoS performance and energy consumption. Their experimental results show that server consolidation using VM migration contributes to power reduction with no or only slight QoS performance degradation, and that assignment of VMs to multiple processor cores running at a lower frequency can achieve additional power reduction in a server node. QoS related server consolidation was addressed, but VM consolidation was not considered.

Li & Huang (2010) address current virtualized cloud platforms, where the resource provisioning strategy is a difficult task. Provisioning based on peak workload achieves low resource utilization, while provisioning based on average workload sacrifices plenty of the potential revenue from cloud customers. The customer's cloud experience is determined by how the cloud workload is handled and how process scheduling is done. Resource provisioning for reliable nodes was considered. Server consolidation and performance optimisation have to be considered for reliable nodes.

Mei et al (2011) have focused on server consolidation and application consolidation through virtualization, which are key performance optimisations in the cloud-based service delivery industry. They argue it is important for both cloud consumers and cloud providers to be aware of the various factors which have a significant effect on the performance of applications running in a virtualized cloud. Lastly, they analyze the impact of various CPU resource scheduling strategies and various workload rates on the performance of applications running on various VMs hosted on the same physical machine. QoS is an essential parameter to handle when a large customer base is present. Customer workload and applications ought to be processed within a valid response time. SLA agreements bind the cloud provider, which has to meet them through a proper system in place. The under-loaded processors were not considered, even though under-loaded processors are of importance in a data center when considering the VM scheduling strategy.

Marzolla et al (2011) concentrate on the novel opportunities for achieving energy savings in the cloud: cloud systems make use of virtualization techniques to allocate computing resources on demand, and modern Virtual Machine (VM) monitors allow live migration of running VMs. Therefore, energy conservation is possible through live server consolidation, moving VM instances away from lightly loaded computing nodes so that they become empty and can be switched to a low-power mode. QoS according to the cloud host character was not concentrated on, despite the consolidation of VMs and live migration. The energy conservation that is achieved helps to run more applications and processes in a data center. The data center becomes environmentally sustainable, and a better approach for application handling with respect to energy was analyzed. The reliable nodes considered are alive hosts, which are nevertheless inherently unreliable since they are distributed in nature.

Gao et al (2013) proposed an integrated management solution which takes attributes of both virtual machine resizing and server consolidation to obtain energy efficiency and quality of service in virtualized data centers. Virtual machine resizing for the hosts that are alive is an important criterion to be looked into. A novelty of the solution is the integration of linear programming, ant colony optimisation, and control theory techniques. The authors propose data center management techniques and show how SLA violations (SLAV) may be reduced while establishing a profit in energy savings. The QoS according to the SLAV metrics supplied by the cloud provider needs to be analyzed. The reliability constraints were examined in another paper.

Viswanathan et al (2011) offered and evaluated a fresh application-centric energy-aware strategy for VM allocation that aims at maximising resource use and power efficiency by way of VM consolidation while satisfying reliability guarantees. To make this happen, they designed an empirical model for the average power consumption and execution time based on measurements from extensive execution of normal HPC workload benchmarks (all possible allocations based on number and type of VMs), and created an algorithm to find the best VM allocation that achieves an optimisation goal such as minimisation of energy consumption and/or execution time. QoS was considered based on applications and their types. Their future research attempts are aimed at i) employing machine learning techniques to extract on-the-fly a model out of the sub-system usage data gathered from off-line experiments employing benchmarks together with true applications running on VMs, and ii) comparing their suggested solution against some real time workloads by implementing them on data centers. Our planned future exploration efforts include i) extending the solution to recognise and service heterogeneous server hardware, which is necessary for evaluation on a real test bed, and ii) integrating the suggested solution with schemes for autonomic thermal management in instrumented data centers.

Shi & Hong (2011), motivated by the limits on the Power Usage Effectiveness (PUE) of data centers, the potential benefit of consolidation, and the impetus of achieving maximum return on investment (ROI) in the cloud computing market, looked into VM placement in the data center, formulated the multi-level generalised assignment problem (MGAP) for maximising profit under service level agreement and power budget constraints using a model of a virtualized data center, and solved it with a first-fit heuristic. SLA and power budget were the focus of that paper, while in this chapter we wished to address the SLA depending on reliability limitations. We evolved a statistical, distributed real world scenario for studying this with the proposed algorithms. Power utilization was dealt with there, but VM administration in a reliable cloud framework is also necessary, which is the focus of Feller et al (2012). The service level agreement for reliable nodes has to be concentrated on.
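A first-fit heuristic of the kind mentioned can be sketched in a few lines. This is a toy version under assumed capacities and demands; the real MGAP formulation also carries SLA and power budget constraints, which are omitted here.

```python
# Toy first-fit VM placement: each VM goes to the first host that still has
# enough free capacity; a VM that fits nowhere is left unplaced (None).

def first_fit(vm_demands, host_capacity, host_count):
    free = [host_capacity] * host_count
    placement = {}
    for vm, demand in vm_demands.items():
        for host in range(host_count):
            if free[host] >= demand:
                free[host] -= demand
                placement[vm] = host
                break
        else:
            placement[vm] = None
    return placement

vms = {"vm1": 0.5, "vm2": 0.4, "vm3": 0.3, "vm4": 0.6}  # CPU shares (assumed)
plan = first_fit(vms, host_capacity=1.0, host_count=2)
# vm1 and vm2 pack onto host 0; vm3 and vm4 go to host 1
```

First-fit is fast but greedy; it trades optimality of the packing for linear scan time per VM, which is why it is attractive at data center scale.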

Feller et al (2012) evaluated the energy management mechanisms of a unique, holistic energy-aware VM management framework called Snooze. Specifically, Snooze ships with integrated VM monitoring and live migration support. Moreover, it utilizes a resource (CPU, memory and network) utilization estimation engine, detects overload and under-load situations, and performs event-based VM relocation and periodic consolidation. Snooze is the first system implementing this server consolidation algorithm, which had previously been studied only in simulation. Finally, once energy savings are enabled, idle servers are automatically transitioned into a lower power state (e.g. suspend) and woken up on demand. Machine learning techniques are also implemented in chapter 5 of this thesis. In this chapter we plan to develop a cloud character model to evaluate the reliability of a real time cloud environment. The cloud environment largely decides the VM consolidation and how it helps to reduce electricity consumption.

Hu et al (2012) focused on how traditional Infrastructure-as-a-Service offerings provide customers with fixed-size virtual machine (VM) instances whose resource allocations are intended to meet application demand. VM sizing depending on the application workload was analyzed in their paper. The host quality was predicted using machine learning techniques, since the cloud characteristics were viewed from the provider's energy perspective. Idle nodes are to be brought into a sleep mode, and the reliable nodes are used for better VM consolidation.

Feller & Morin (2012) introduced a scalable, autonomic, and energy-aware VM management framework called Snooze. Unlike the existing cloud management frameworks, Snooze utilizes a self-organising hierarchical architecture and distributes the VM management tasks across multiple group managers (GMs), with each manager handling a subset of nodes (local controllers (LCs)). Also, fault tolerance is provided at all levels of the hierarchy. Consequently, the system is able to self-heal and continue its operation irrespective of system component failures. Snooze was highlighted in this paper. But the important challenge of power and QoS analysis depends on the cloud provider's character and its capacity. So it is significant to study the cloud character in order to analyze the VM consolidation strategies to be used.

Qi-yi & Ting-lei (2010) conducted research on scheduling models based on customer needs. They note that the term cloud computing is promoted by business rather than academia, which determines its focus on user software. Different customers have unique QoS needs. So, according to a given deadline and budget, their article conducts research on a scheduling model from the user's perspective. SLA violation is an important metric to be handled in a data center. This was addressed in their paper from a trust perspective, and how a reliable cloud environment responded was also taken into account.

Liu et al (2011) observe that with the prosperity of cluster computing, cloud computing, grid computing, and other distributed high performance computing systems, Internet service requests have become increasingly diverse. The large variety of services and the different Quality of Service (QoS) considerations, such as provisioning and monitoring, make it challenging to design effective algorithms that fulfil the entire service demands, especially for distributed systems. In addition, the energy consumption issue attracts increasing concern. In their paper, they study a new energy efficient, profit and penalty aware allocation and scheduling strategy for distributed data centers in a multi-electricity-market environment. Their strategy efficiently manages computing resources to reduce the running and transferring energy cost in an environment of varying electricity prices. Their extensive experimental results show that the new approach can significantly reduce the energy consumption cost and achieve a larger sustained system profit. The discussed literature has been tabulated in Table 2.3 and compared with respect to key parameters.

Table 2.3 Literature review of Reliability and Virtualized Cloud Environment in IaaS

Author | Is reliability taken into account? | If QoS taken into account, what was the basic idea behind it? | QoS analysis in data center?
(Imada et al 2009) | Yes | Statistical analysis of host characteristics | Yes
(Li & Huang 2010) | No | Cloud environment | No
(Mei et al 2011) | No | VM characteristics | Yes
(Marzolla et al 2011) | No | Host characteristics | No
(Gao et al 2013) | Yes | VM characteristics | Yes
(Viswanathan et al 2011) | No | Workload analysis | No
(Wang et al 2011a) | Yes | Host characteristics | Yes
(Shi & Hong 2011) | Yes | Cloud environment | Yes
(Feller et al 2012) | No | Cloud environment | No
(Hu et al 2012) | Yes | Cloud environment | Yes
(Feller & Morin 2012) | No | Cloud environment | No
(Qi-yi & Ting-lei 2010) | No | Cloud environment | Yes
(Liu et al 2011) | No | Cloud environment | Yes
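The multi-electricity-market idea in Liu et al (2011) can be illustrated with a deliberately simplified sketch: route a job's energy demand to the data center where the current electricity price is lowest. The site names, prices, and energy estimate below are invented for illustration and are not from their paper.

```python
# Price-aware site selection: pick the data center with the lowest current
# electricity price for a job with a known energy estimate.

def cheapest_site(prices_per_kwh, job_energy_kwh):
    """Return (site, cost) minimising the electricity cost of one job."""
    site = min(prices_per_kwh, key=prices_per_kwh.get)
    return site, prices_per_kwh[site] * job_energy_kwh

prices = {"dc_east": 0.12, "dc_west": 0.09, "dc_north": 0.15}  # $/kWh (assumed)
site, cost = cheapest_site(prices, job_energy_kwh=50.0)
# dc_west wins at 0.09 * 50 = 4.5 dollars
```

The full problem additionally weighs migration cost, profit, and SLA penalties, but the price-arbitrage core is this selection step repeated as prices change.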

The host character is studied based on reliability for the VM component and its consolidation. We found we needed to address three things:

1. Since energy consumption was the main consideration, we analyzed the various host over-subscription and VM selection algorithms based on the legacy algorithms, and proposed two algorithms which were efficient.

2. From the available and proposed algorithms above, we found that VM migration and SLA violation were important characteristics that drive energy consumption. We have proposed an Energy Curve model based on the time relationship between SLA violation and VM migration. Host analysis similar to VM analysis, and VM selection algorithms, were not found in the literature.

3. The analysis of host characteristics was not dealt with in the literature, so we proposed a Statistical Modeling of the Real World Cloud Environment for Reliability, where host characteristics were studied based on the proposed energy model.
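To illustrate what a statistical (rather than binary) reliability property looks like, the sketch below scores each host by its probability of surviving a planning window. The exponential failure model, failure rates, and threshold are assumptions for illustration, not the model developed in this thesis.

```python
import math

def survival_probability(failure_rate_per_hour, horizon_hours):
    """P(host stays alive through the horizon) under an exponential model."""
    return math.exp(-failure_rate_per_hour * horizon_hours)

def reliable_hosts(host_rates, horizon_hours, threshold=0.95):
    """Keep only hosts likely enough to survive the consolidation window."""
    return [h for h, rate in host_rates.items()
            if survival_probability(rate, horizon_hours) >= threshold]

rates = {"h1": 0.001, "h2": 0.02, "h3": 0.0005}  # assumed failures/hour
candidates = reliable_hosts(rates, horizon_hours=24)
# h2 is too failure-prone for a 24-hour window; h1 and h3 qualify
```

Under this view a host is not simply alive or dead: consolidation can prefer migration targets whose survival probability over the migration window is high.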


2.5 CLOUDSIM TOOLKIT

IT companies who are willing to offer some services in the Cloud

can use a simulation-based approach to perform some benchmarking

experiments with the services to be offered in dependable, scalable,

repeatable, and controllable environments before real deployment in the

Cloud. Therefore, they can test their services in a controlled environment free

of cost, and through a number of iterations, with less effort and time. Also, by

using simulation, they can carry out different experiments and scenarios to

identify the performance bottlenecks of resources and develop provisioning

techniques before real deployment in commercial Clouds. Therefore,

CloudSim has been developed to fulfil these requirements by enabling the simulation of extensible Clouds.

2.5.1 Architecture of CloudSim

CloudSim can be defined as “a new and extensible simulation

framework that allows seamless modeling, simulation, and experimentation

of emerging Cloud Computing infrastructure and application services”.

Initially, the framework of CloudSim consists of multiple layers starting

from the lowest layer of SimJava up to the top layer of User Code. At the

lowest layer, SimJava provides the base engine of the simulation that

supports the implementation of core functionalities essential for the higher-

level frameworks of the simulation, like queuing and processing of events;

formation of system components (services, hosts, brokers, VMs);

interaction between these components, and administration of the simulation

clock. On top of that layer is the GridSim layer which supports high-level

and fundamental Grid components, such as networks, resources, data sets,

and information services. Then, the CloudSim layer forms the next level of

the architecture that extends the core functionalities of the GridSim layer.


This layer supports Cloud-based data center environments, including VMs, memory, storage and bandwidth. Also, this layer can manage the instantiation and simultaneous execution of a large scale Cloud infrastructure composed of thousands of system components (VMs, hosts, data centers, and applications). Finally, User Code is the top-most layer of the simulation

and application). Finally, User Code is the top-most layer of the simulation

toolkit, which reveals the configuration of functionality for the system

components, such as the number and specification of hosts and the

scheduling policies of the broker. At this layer, a developer can model and

perform robust experiments and scenarios of Cloud environments based on

custom policies and configurations already supported by the CloudSim, in

order to evaluate and tackle some Cloud issues like the complexities of

Cloud infrastructure and application.

2.5.2 Usability

In order to use the CloudSim toolkit, users need to have a basic

background in Java programming language because it is written in Java. Also,

it requires users to write some code to use the components from its library in

order to simulate the desired scenarios. Therefore, it is not just about setting

the parameters, running the program, and collecting the results, but it also

requires a deep understanding of how the program works. In addition, a little

knowledge about Integrated Development Environments (IDEs), like

NetBeans or Eclipse, will be useful to ease installing the toolkit and the

development of scenarios. Furthermore, CloudSim provides a library that can

be used to build a ready-to-use solution, such as CloudAnalyst which is built

on top of CloudSim, to offer an easy to use graphical user interface.


2.5.3 Capabilities

CloudSim has some compelling features and capabilities that can be extended to model a custom Cloud Computing environment. According to Calheiros et al (2011), CloudSim offers flexibility and applicability with less time and effort to support initial performance testing. It can support simulation of small-scale up to large-scale cloud environments containing data centers, with little or almost no overhead in terms of memory consumption. Also, it has an engine that allows the creation of multiple services that can be independently managed on a single node of the data center. Moreover, it supports, among other features, energy-aware provisioning techniques at resource, VM, and application level, such as VM allocation and DVFS. For managing the energy-conscious techniques in a data center, the CloudSim architecture contains the key components CloudCoordinator, Sensor, and VMM. The Sensor component, which is attached to every host, is used by the CloudCoordinator to monitor particular performance parameters, like energy consumption and resource utilization. Thus, through the attached Sensors, the CloudCoordinator passes real-time information about the active VMs, like load conditions and processing share, to the VMM. Then, the VMM uses this information to perform the appropriate application of DVFS and resizing of VMs. Also, according to the VM allocation policy and the current state of resources, the CloudCoordinator constantly issues VM migration commands and changes the power state of nodes to adapt the allocation of VMs. Cloud computing is a term used to describe a style of computing for next generation service centers where massively scalable service-oriented IT-related capabilities are dynamically delivered to multiple external customers. A cloud may host a variety of services, including Web applications (i.e. Software as a Service (SaaS)), legacy client-server applications, platforms (i.e. Platform as a Service (PaaS)), infrastructure (i.e. Infrastructure as a Service (IaaS)), and information services.
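As a side note on DVFS, the energy saving it offers comes from the super-linear relationship between CPU frequency and dynamic power. The cubic rule of thumb below (valid when supply voltage scales with frequency) is a generic illustration with assumed constants, not CloudSim's internal model.

```python
# Rule-of-thumb DVFS model: dynamic power ~ V^2 * f, and V tracks f,
# so power scales roughly with the cube of frequency.

def dynamic_power(freq_ghz, base_power_w=100.0, base_freq_ghz=2.0):
    return base_power_w * (freq_ghz / base_freq_ghz) ** 3

full = dynamic_power(2.0)    # 100 W at the nominal 2.0 GHz
scaled = dynamic_power(1.6)  # ~51 W after a 20% frequency reduction
```

A 20% frequency cut thus roughly halves dynamic power, which is why frequency scaling on lightly loaded cores is such an effective lever.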


2.5.4 Limitations

CloudSim is a powerful tool for modeling and simulating Cloud

computing, but it has some limitations. Firstly, it is not a ready-to-use tool

that would just require setting parameters only. Actually, it does require

writing some Java code to use its library, as discussed earlier. Also, the

capabilities of CloudSim are sometimes limited and require some extensions.

For instance, CloudAnalyst has been developed as an extension of CloudSim

capabilities to offer a separation of the simulation experimentation exercise

from the technicalities of programming, using the library in order to ease

modeling by simply focusing on the complexity of the simulated scenario,

without spending much effort and time on the language in which the simulator

is interpreted. Cloud Computing has been a focus of the industry and of research organizations, and various simulation tools have been introduced in this paradigm. These simulation tools help in learning the Cloud Computing paradigm within the distributed computing technology.

Table 2.4 Comparison of CloudSim and other Open Source tools

Properties | CloudSim | Globus | Aneka | Alchemi
Architecture | Layered | Layered and Modular | Utility Model and Layered | Hierarchic and Layered
Platform | Unix, .Net, Windows | Unix | Unix, Windows, Mac, .Net | Unix, Mac, Windows, .Net
Language | C, C# | C, Java | C, C# | C#, .Net
Service and Simulation Modeling | Virtual Machine Modeling, Cloud Infrastructure | Low Level Services | Cloud Infrastructure and Services | Cloud Infrastructure and Services


A comparison of the available open source toolkits for Cloud Computing is shown in Table 2.4. It shows that CloudSim is the only toolkit that is completely layered and incorporates VM (virtual machine) modeling, which makes it our choice for simulation.

Table 2.5 CloudSim compared to the other academic simulators

Label | CloudSim | GreenCloud | MDCSim
Platform | SimJava | Ns2 | CSIM
Language/Script | Java | C++/OTcl | C++/Java
Availability | Open source | Open source | Commercial
Simulation time | Seconds | Tens of minutes | Seconds
Graphical support | Limited (CloudAnalyst) | Limited (Network animator) | None
Application models | Computation, Data transfer | Computation, Data transfer, and Execution deadline | Computation
Communication models | Limited | Full | Limited
Energy models | Available | Precise (servers + network) | Rough (servers only)
Power saving modes | DVFS, power models | DVFS, DNS, and both | None

Table 2.5 shows toolkits available for academic purposes, compared with CloudSim. Toolkits help in learning the real world scenario, since data centers are very expensive to provision and maintain. The research arena has to be on par with industry standards, and hence cloud toolkits help to a great extent in learning a paradigm in an economical manner.


2.6 SUMMARY

The literature helped us to find an unexplored area to work on. From the literature, our research found the need for a workflow model to define the flow and complexity of the jobs in a cloud data center. The cloud data center accounts for a major share of the energy consumption behind the ICT carbon footprint. Many papers concentrated on VM consolidation. Handling the over-loaded and under-loaded servers is an important challenge. The servers and the prediction of their status have to be concentrated on. The status of hosts is important, and the VM consolidation on these processors is of utmost importance. Most of the papers we found in the literature concentrated on energy aware VM consolidation, which mainly focuses on server resource management in a virtualized environment. But proper VM selection and host overloading prediction were not concentrated on. Our research has concentrated on VM consolidation and its effect on overloaded servers.

In the existing literature, real time traces were not used; instead, analytical models dominated most of the research power models. Host status prediction was not done, and VMs were consolidated away from the overloaded servers. In our simulations, the under-loaded servers were either used or emptied. The server VM consolidation helped in a better analysis of the machines available in a cloud data center. We have developed algorithms for better VM selection. These algorithms were compared with the proposed machine learning techniques for predicting whether a host was overloading or not. Many papers treated machines in a reliable environment where a machine is either alive or not alive, and only alive machines were considered in most of the research on VM consolidation and energy consumption; this can be termed a binary propertied environment. The main property of a data center, or any distributed system, is its inherent unreliability, which was not considered in most of the literature. In our work we have specifically introduced performance measures for reliability, and the proposed algorithms have been compared under the reliability constraints that exist in a cloud data center. A statistical model for reliability has been concentrated on, since a statistical property gives a wider perspective of the machines to be handled, based on probability and throughput, instead of a binary propertied analysis that misses machines in a pseudo-active state. Hosts can be in states where they are alive but about to shut down, or in some cases shut down but about to become active, based on the VMs. The VM consolidation helped to analyze the hosts in a data center. The research was done using real time workload traces and real world scenarios for the repeatability of the experiments in the simulations exhibited here.