Simulation of heterogeneous cloud infrastructures Konstantinos Giannoutakis Information Technologies Institute/ Centre for Research and Technology Hellas ITI/CERTH

Simulation of Heterogeneous Cloud Infrastructures


Page 1: Simulation of Heterogeneous Cloud Infrastructures

Simulation of heterogeneous cloud infrastructures

Konstantinos Giannoutakis

Information Technologies Institute / Centre for Research and Technology Hellas (ITI/CERTH)

Page 2: Simulation of Heterogeneous Cloud Infrastructures

Overview

• Introduction

• Simulation

• Towards a framework for simulating heterogeneous clouds

• Conclusions

Page 3: Simulation of Heterogeneous Cloud Infrastructures

Cloud Environments

• Cloud environments have become increasingly popular over the past decades.

• This is due to the flexibility of Cloud environments in resource allocation, as well as their resiliency in both software and the underlying hardware.

• According to Forbes, 92% of all workloads will be executed in Cloud environments by 2020.

• Moreover, the number of hyper-scale data-centers is projected to grow from 259 in 2015 (21% of all installed data-center servers) to 485 by 2020 (47% of all installed data-center servers). (Cisco Global Cloud Index)

• Three major Cloud providers (Microsoft, Amazon, Google) have almost 1.5 million data-centers.

Page 4: Simulation of Heterogeneous Cloud Infrastructures

Cloud Environments

• Traditional Cloud environments are formed using CPU-based data-centers, and their architecture is based on the Warehouse Scale Computer (WSC) (Barroso and Holzle, 2009).

• Recently, heterogeneous hardware such as:

- GPUs

- Intel MICs

- FPGAs

- High Performance Clusters

has started to be integrated into the Cloud in order to process more demanding and specialized workloads (e.g. HPC applications), while simultaneously decreasing energy consumption.

• However, the addition of such hardware substantially increases the complexity of monitoring it, provisioning it, and developing software that can take full advantage of it.

Page 5: Simulation of Heterogeneous Cloud Infrastructures

Cloud Environments

Page 6: Simulation of Heterogeneous Cloud Infrastructures

Cloud Environments

Despite the advantages of modern Cloud environments, there are also drawbacks, which include:

• Overprovisioning: More resources are installed than are actually required in order to match user requests.

• Underutilization: Utilization of modern Clouds is very low (20%-30%), resulting in increased power consumption.

• Management Issues: Due to the continuously growing scale and heterogeneity of modern Cloud environments, centralized management is not effective, since increasingly outdated data is used for resource provisioning.

• Organization Issues: Organizing resources to maximize utilization, and "sharpening" the choice of adequate hardware so that it accurately matches end-user needs, is based on global decisions.

Page 7: Simulation of Heterogeneous Cloud Infrastructures

Cloud Environments

• Most of these problems can be tackled effectively by local decisions based on a hierarchical self-organization and self-management system.

• The CloudLightning project aims to use self-organization and self-management strategies to effectively manage heterogeneous resources at hyper-scale.

• However, one question remains: How can we evaluate, study or improve hyper-scale Cloud environments, especially since most hyper-scale environments belong to private companies?

• The answer is: Simulation

Page 8: Simulation of Heterogeneous Cloud Infrastructures

Requirements for hyper-scale simulations

What are the key requirements for hyper-scale simulations that limit existing discrete-event simulation (DES) simulators?

• A very large number of computations.

• Accurate models for power consumption, based on adequate interpolating models.

• A natively parallel design, so that the simulator can execute in HPC environments.

• Support for tasks that can span multiple Virtual Machines (VMs).

• Support for accelerators (GPU, MIC, DFE).

• The simulator should be written in a language that is built for high performance computations (e.g. C or C++).

With the above in mind, a new simulator has been built (the CloudLightning simulator).

Page 9: Simulation of Heterogeneous Cloud Infrastructures

Architecture

• In order to design a simulator for large scale phenomena, we can borrow the design of large scale Engineering and Physics simulations.

• These simulations are based on a time-advancing loop with prescribed time granularity.

• The time advances from t0 = 0 to tend with a prescribed sampling step tstep (in seconds, milliseconds, etc.).

• This design enables the integration of dynamical components, since the state of these components can be updated with respect to tstep.

• This time-stepping approach allows for a dynamic resolution of the results: a large time-step reveals only a coarse picture of the system, while a small time-step reveals finer interactions.

Page 10: Simulation of Heterogeneous Cloud Infrastructures

Architecture

Abstract Cloud architecture with one Cell with one data-center

Page 11: Simulation of Heterogeneous Cloud Infrastructures

Architecture

Note: A Cell can contain multiple data-centers (WSCs) but only one broker (Cell Manager).

Algorithm 1 Driver for the hyper-scale Cloud Simulator

1: Initialize data-centers, network, storage

2: for t = t0 = 0 to tmax with step tstep do

3:     Create task queue in Gateway Service at time t

4:     Send tasks to Broker

5:     Receive tasks from Broker and find adequate resources

6:     Assign tasks to the resources

7:     Perform update on the affected components from the task-assignment

8: end for
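
The driver loop of Algorithm 1 can be illustrated with a minimal Python sketch. This is not the CloudLightning implementation (which is written in C++). The `Server` class, the first-fit search in `find_resource`, the `(vcpus, duration)` task format and the integer time values are all assumptions made for illustration.

```python
class Server:
    """Hypothetical resource: a server with a fixed number of vCPUs."""

    def __init__(self, total_vcpus):
        self.free_vcpus = total_vcpus
        self.running = []  # one [remaining_time, vcpus] entry per running task


def find_resource(servers, vcpus):
    """Broker step: first-fit search (an assumed policy) for a suitable server."""
    for server in servers:
        if server.free_vcpus >= vcpus:
            return server
    return None


def run(servers, arrivals, t_max, t_step):
    """Time-stepping driver; `arrivals` maps a time t to a list of (vcpus, duration)."""
    completed = 0
    rejected = 0
    t = 0
    while t <= t_max:
        # Steps 3-4: build the task queue for time t and hand it to the broker.
        queue = arrivals.get(t, [])
        # Steps 5-6: find adequate resources and assign the tasks.
        for vcpus, duration in queue:
            server = find_resource(servers, vcpus)
            if server is None:
                rejected += 1
                continue
            server.free_vcpus -= vcpus
            server.running.append([duration, vcpus])
        # Step 7: advance the state of every component by one time step.
        for server in servers:
            still_running = []
            for task in server.running:
                task[0] -= t_step
                if task[0] <= 0:
                    server.free_vcpus += task[1]  # task finished, release vCPUs
                    completed += 1
                else:
                    still_running.append(task)
            server.running = still_running
        t += t_step
    return completed, rejected
```

In this sketch a smaller `t_step` resolves task completions more finely, at the cost of more loop iterations, which mirrors the resolution trade-off of the time-stepping design.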

Page 12: Simulation of Heterogeneous Cloud Infrastructures

Parallelization

• The parallelization of the simulator is a two stage process (coarse-grain and fine-grain parallelization).

• The Gateway Service resides in the Head Node and is responsible for creating the task queue and sending tasks to the Cells.

• Each Cell resides in one multi-core compute node.

• Coarse-grain parallelization can be performed via the Message Passing Interface (MPI).

• The communications between the Gateway Service and the Cells are minimal, so even for a large number of incoming tasks the overall running time is not affected significantly.

• These communications are limited to sending the task resource requirements and parameters: (1) Number of VMs, (2) Number of vCPUs per VM, (3) Memory per VM, (4) Storage per VM and (5) Network Bandwidth.
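
To illustrate why this communication is cheap, the following Python sketch packs the five task parameters into a fixed-size binary message. The `TaskRequest` class, its field names, the units and the wire format are hypothetical; the slides do not specify how the simulator encodes its messages.

```python
import struct
from dataclasses import dataclass

# Assumed wire format: two unsigned 32-bit integers followed by three doubles,
# little-endian and without padding, for a total of 32 bytes per task.
TASK_FORMAT = "<2I3d"


@dataclass(frozen=True)
class TaskRequest:
    """Hypothetical message carrying the five parameters listed above."""
    num_vms: int
    vcpus_per_vm: int
    memory_per_vm_gb: float        # unit is an assumption
    storage_per_vm_gb: float       # unit is an assumption
    network_bandwidth_gbps: float  # unit is an assumption

    def pack(self):
        """Serialize the request into a fixed-size byte string."""
        return struct.pack(
            TASK_FORMAT,
            self.num_vms,
            self.vcpus_per_vm,
            self.memory_per_vm_gb,
            self.storage_per_vm_gb,
            self.network_bandwidth_gbps,
        )

    @classmethod
    def unpack(cls, payload):
        """Rebuild a request from bytes produced by pack()."""
        return cls(*struct.unpack(TASK_FORMAT, payload))
```

With this assumed layout every task costs the same small, constant number of bytes regardless of how large the simulated Cell is.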

Page 13: Simulation of Heterogeneous Cloud Infrastructures

Parallelization

• The most computationally intensive work is the search for adequate resources for a task and the update of the state of all the components inside a Cell.

• However, these actions can be performed in parallel using the multiple cores in each compute node. This fine-grain parallelization substantially accelerates the simulation locally.

• This inherently parallel design scales both in terms of the number of Cells (horizontally) and the number of resources in a Cell (vertically).

• Moreover, this design enables the use of dynamic components (dynamic Brokers), which change their logical architecture based on characteristics of the underlying resources.

Page 14: Simulation of Heterogeneous Cloud Infrastructures

Parallelization

Algorithm 2 Parallel driver for the hyper-scale Cloud Simulator

1: Initialize local data-centers, network, storage in each Cell

2: for t = t0 = 0 to tmax with step tstep do

3:     if Head Node then

4:         Create task queue in Gateway Service at time t

5:         Send tasks to Broker

6:     else

7:         Receive tasks from Broker and find adequate resources in parallel

8:         Assign tasks to the resources

9:         Perform update in parallel on the affected components

10:     end if

11:     Barrier synchronization of distributed threads

12: end for
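
The coarse-grain control flow of Algorithm 2 can be emulated in Python with threads standing in for MPI ranks. This is only a sketch of the head-node / Cell split and the per-step barrier. The round-robin distribution of tasks, the single `free` counter per Cell and the omission of the state update and of the fine-grain parallel search are simplifying assumptions, and Python threads do not provide the speed-up that MPI and OpenMP give the real simulator.

```python
import queue
import threading


def parallel_run(num_cells, capacity_per_cell, arrivals, t_max, t_step):
    """Emulate Algorithm 2 with one head-node thread and one thread per Cell."""
    inboxes = [queue.Queue() for _ in range(num_cells)]
    barrier = threading.Barrier(num_cells + 1)
    assigned = [0] * num_cells  # vCPUs assigned per Cell, one writer per slot

    def head_node():
        for t in range(0, t_max + 1, t_step):
            # Gateway Service: create the task queue for time t.
            batches = [[] for _ in range(num_cells)]
            for k, vcpus in enumerate(arrivals.get(t, [])):
                batches[k % num_cells].append(vcpus)  # round-robin (assumed)
            # Send one (possibly empty) batch to every Cell.
            for inbox, batch in zip(inboxes, batches):
                inbox.put(batch)
            barrier.wait()  # synchronize before the next time step

    def cell(rank):
        free = capacity_per_cell
        for _ in range(0, t_max + 1, t_step):
            # Blocking receive, then search for resources and assign tasks.
            for vcpus in inboxes[rank].get():
                if vcpus <= free:
                    free -= vcpus
                    assigned[rank] += vcpus
            # The state update of the Cell components is omitted here.
            barrier.wait()

    threads = [threading.Thread(target=head_node)]
    threads += [threading.Thread(target=cell, args=(r,)) for r in range(num_cells)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return assigned
```

Sending an empty batch when a Cell has no tasks keeps the blocking receive and the barrier in lockstep, so every thread executes the same number of time steps.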

Page 15: Simulation of Heterogeneous Cloud Infrastructures

Power consumption models

The power consumption models used for servers in Cloud simulators are of two kinds:

• Global models based on the minimum and maximum power consumption. For example:

$P(u) = P_{idle} + (P_{max} - P_{idle})\,u, \quad u \in [0, 1]$

where $u$ is the utilization of the server.

• Piecewise linear interpolation of data obtained from organizations such as spec.org. This data is the measured power consumption at certain utilization levels $u_i$. For example:

$P(u) = P(u_i) + \frac{P(u_{i+1}) - P(u_i)}{u_{i+1} - u_i}\,(u - u_i), \quad u_i \le u \le u_{i+1}$

These two examples are the usual practice for computing the power consumption in present simulators.

For piecewise interpolation models, "not-a-knot" piecewise cubic interpolation can be used instead of linear interpolation.
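
The two usual models can be sketched in a few lines of Python. The function names and the sample measurements used below are illustrative assumptions, not SPEC data, and the "not-a-knot" cubic variant is not shown.

```python
from bisect import bisect_right


def linear_power(u, p_idle, p_max):
    """Global model: P(u) = P_idle + (P_max - P_idle) * u, with u in [0, 1]."""
    return p_idle + (p_max - p_idle) * u


def piecewise_linear_power(u, levels, watts):
    """Piecewise linear interpolation of measured power.

    `levels` holds the measured utilization levels u_i in increasing order and
    `watts` the power measured at each level.
    """
    if not levels[0] <= u <= levels[-1]:
        raise ValueError("utilization outside the measured range")
    # Index i of the interval [u_i, u_{i+1}] that contains u.
    i = min(bisect_right(levels, u) - 1, len(levels) - 2)
    slope = (watts[i + 1] - watts[i]) / (levels[i + 1] - levels[i])
    return watts[i] + slope * (u - levels[i])
```

For made-up measurements of 100 W, 180 W and 200 W at utilizations 0, 0.5 and 1, the piecewise model follows the measured curve, while the global model with the same end points draws a single straight line between 100 W and 200 W and so underestimates the power at mid-range utilization.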

Page 16: Simulation of Heterogeneous Cloud Infrastructures

Support for accelerators

• Available simulators do not support accelerators such as GPUs, MICs and FPGAs.

• The execution model of these devices is similar to that of CPUs; however, accelerators cannot be shared by multiple users in the Cloud.

• Thus, if a user acquires an accelerator, its computational power is utilized entirely by that instance.

• The power consumption of these devices is either minimum (idle state) or maximum. Thus, the power consumption of a server with accelerators is:

$P(u) = P_{cpu}(u) + \sum_{i=1}^{n_{acc}} \rho_i P^{acc}_{max} + \sum_{i=1}^{n_{acc}} (1 - \rho_i) P^{acc}_{idle}$

where $\rho_i \in [0,1]$ is the average utilization of the i-th accelerator and $n_{acc}$ is the number of accelerators.
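
As a worked example of the expression above, the following Python sketch evaluates the server power for a list of accelerator utilizations. The function name, the parameter names and the wattages used in the example are assumptions chosen for illustration.

```python
def server_power(cpu_power, rho, p_max_acc, p_idle_acc):
    """Power of a server with accelerators.

    cpu_power  -- P_cpu(u), the CPU power at the current utilization
    rho        -- average utilization of each accelerator, each in [0, 1]
    p_max_acc  -- power of one accelerator when busy
    p_idle_acc -- power of one accelerator when idle
    """
    busy = sum(r * p_max_acc for r in rho)
    idle = sum((1.0 - r) * p_idle_acc for r in rho)
    return cpu_power + busy + idle
```

For instance, with an assumed CPU power of 150 W and two accelerators rated at 250 W busy and 25 W idle, one fully busy and one idle, the server draws 150 + 250 + 25 = 425 W.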

Page 17: Simulation of Heterogeneous Cloud Infrastructures

Execution models

There are three basic types of scheduling execution for the VMs residing on a server:

- Space shared

- Time shared

- Space-Time shared

• Modern execution models are primarily based on Space-Time sharing.

• Gang scheduling is applied by modern operating systems. In Gang scheduling, threads belonging to the same application are scheduled together.

• Bag of Gangs scheduling is used when multiple simultaneous applications, each with multiple threads, are scheduled together.

Page 18: Simulation of Heterogeneous Cloud Infrastructures

Design and Extensibility

• Based on the presented analysis, the simulator has been designed and implemented in C++.

• Both the Message Passing Interface (MPI), for distributed memory parallel systems, and Open Multi-Processing (OpenMP), for shared memory parallel systems, are supported in C++ through libraries and extensions.

• The C++ STL provides the containers needed to build the required lists, queues and maps.

• C++ is also a compiled language and offers fine-grain control over memory and threads.

Page 19: Simulation of Heterogeneous Cloud Infrastructures

Design and Extensibility

Page 20: Simulation of Heterogeneous Cloud Infrastructures

Design and Extensibility

• The selected decomposed approach makes the simulator easy to extend.

• Extending the simulator only requires inserting methods into the appropriate class. For example, a new power consumption model can be inserted into the Power Consumption component.

• Adding models can be performed with minimal changes to the existing source code.

• The addition of a new component, for example a second statistics engine, requires designing the new class, updating the Cell class to include it, and adding its update procedure to the Update and Statistics Engine.

• Finally, MPI is responsible for scaling across compute nodes, while OpenMP is responsible for scaling the update and search procedures across the available cores of a compute node.
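
The extension mechanism can be illustrated with a small Python sketch of a component that holds named power models. The `PowerConsumption` class, its `register` and `power` methods and the model names are hypothetical; the actual simulator is written in C++ and its class interfaces are not shown in the slides.

```python
class PowerConsumption:
    """Hypothetical Power Consumption component holding named models."""

    def __init__(self):
        self._models = {}

    def register(self, name, model):
        """Add a new model without touching the existing ones."""
        self._models[name] = model

    def power(self, name, u, **params):
        """Evaluate the model called `name` at utilization u."""
        return self._models[name](u, **params)


def linear_model(u, p_idle, p_max):
    """Existing model: linear between idle and maximum power."""
    return p_idle + (p_max - p_idle) * u


def on_off_model(u, p_idle, p_max):
    """New model added later: idle power at u == 0, maximum power otherwise."""
    return p_idle if u == 0 else p_max


component = PowerConsumption()
component.register("linear", linear_model)
component.register("on_off", on_off_model)  # the only change needed to extend
```

In this sketch, adding a model is a single `register` call, which is one way to keep the interaction with the rest of the code minimal.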

Page 21: Simulation of Heterogeneous Cloud Infrastructures

Conclusions

• A new hybrid parallel framework for hyper-scale Cloud simulations has been presented that takes advantage of HPC clusters.

• The new framework can be extended with new models and components through small software additions.

• Improved power consumption models have been considered that more closely reflect the actual power consumption of modern CPUs.

• Execution and power consumption models for accelerators have been given.

Page 22: Simulation of Heterogeneous Cloud Infrastructures

References

• L.A. Barroso and U. Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool Publishers, 2009.

• Cisco, Cisco Global Cloud Index: Forecast and Methodology, 2015-2020, 2016.

• CloudLightning, http://www.cloudlightning.eu, 2016.

Page 23: Simulation of Heterogeneous Cloud Infrastructures

Konstantinos Giannoutakis


THANK YOU