Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines (IEEE Cloud 2013)

Universität Stuttgart

Institute of Parallel and Distributed Systems (IPVS)

Universitaetsstr. 38

70569 Stuttgart

Germany

Improving the Efficiency of Cloud Infrastructures with Elastic Tandem Machines

Sixth IEEE International Conference on Cloud Computing

Santa Clara, CA, USA

June 29th, 2013

Frank Dürr

Research Group

“Distributed Systems”

Overview

• Motivation

• System Model

• Elastic Tandem Machines

• Evaluation

• Summary

Motivation

• Data centers contain up to tens of thousands of hosts

• Energy efficiency is one of the major challenges

• The ideal host is energy proportional [Barroso, Hölzle]

◦ Energy consumption should be proportional to utilization/load

[Figure: power consumption vs. utilization, from 0% (idle) to 100%, for an ideal system and a real system; the efficiency curve and the efficient area of operation are marked]

Building the ideal energy-proportional machine

• (Almost) no power consumption while being idle

• Elasticity: Scaling up to nominal (maximum) requested resources

[Figure: consumption vs. utilization from idle to 100%, annotated “Fill this area of inefficient operation!”]

Contribution: Elastic Tandem Machines

System on a Chip (SoC) Machine

• Low performance

• Low power consumption: ~ 2 Watt

• 700 MHz ARM, 512 MB RAM, 100 Mbps Ethernet, SD card, ~ $35

Classic high-power VM on commodity PC hardware

• High performance

• High power consumption

[source: www.dell.com]

Elastic Tandem Machine: Best of both worlds

• Low power consumption in idle/weak load

• Scale up to maximum nominal resources

• Transparency: Clients see only one ideal machine

Transparent integration of heterogeneous hardware

Contributions in Detail

Show that SoCs can serve low load in realistic settings

• Web server in 3-tier system architecture

Concept for implementing Elastic Tandem Machines

• Handover concept to switch between SoC and VM

◦ Adaptive: based on dynamic load

◦ Transparent, seamless, non-disruptive

▪ Client just sees one “ideal” machine

▪ Existing (TCP) connections don’t break during handover

◦ “In network” based on Software-defined Networking (SDN)

• Proof of concept implementation and evaluation

Overview

• Motivation

• System Model

• Elastic Tandem Machines

• Evaluation

• Summary

System Model (1)

Target environment: Data center of IaaS provider

• SoC machines (Low-power Micro Instances; LPMI)

• Classic VMs on PC hosts (High-power Instances; HPI)

• One LPMI + one HPI = one Elastic Tandem Machine (ETMI)

• Network:

◦ Core switches SDN-enabled

◦ Programmable forwarding tables

[Figure: data center topology with client, Internet, controller, core switches, top-of-rack switches, ETMI (HPI, LPMI), and DB]

System Model (2)

3-Tier web service

• ETMI runs web server (middle tier)

◦ One public IP address for ETMI (transparency)

◦ One web server instance on LPMI and HPI

• File/DB servers in backend

◦ Store all persistent data and state

◦ Not part of optimization!

[Figure: data center topology with client, Internet, controller, switches, top-of-rack switches, ETMI (HPI, LPMI) as public service, and DB]

Overview

• Motivation

• System Model

• Elastic Tandem Machines

◦ Overview

◦ System Components

◦ Seamless handover concept

• Evaluation

• Summary

Basic Concept: Overview

• HTTP requests either forwarded to …

◦ … LPMI during low load

◦ … HPI during high load

SDN-based programming of network (forwarding tables)

• LPMI always running

◦ Service always available

• HPI booted on demand on LPMI overload

• HPI shut down if current load would not overload LPMI

[Figure: controller and switches between the Internet and the instances, showing the low-load path to the LPMI]

System Components

Handover Controller:

• Switches between LPMI and HPI based on their load

• Programs core switches using OpenFlow

• Boots or shuts down HPI via Virtual Machine Manager

• Hysteresis and “ignore period” to prevent oscillation

MAC Address re-writing:

• If destination IP matches public IP, write MAC of LPMI (or HPI) in frame

IP aliasing:

• NICs configured with (same) public IP address of service

• Private IPs used for communication with controller

[Figure: handover controller connected via OpenFlow to the switches above the top-of-rack switches]
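
The MAC re-writing rule can be illustrated with a small model of what the core switch does to each frame. This is a minimal sketch in plain Python, not OpenFlow code: the `Frame` type, the `rewrite` function, and all addresses are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical addresses, for illustration only.
PUBLIC_IP = "203.0.113.10"      # public IP of the service (aliased on both NICs)
LPMI_MAC = "02:00:00:00:00:01"  # MAC of the low-power instance
HPI_MAC = "02:00:00:00:00:02"   # MAC of the high-power instance


@dataclass(frozen=True)
class Frame:
    dst_ip: str
    dst_mac: str


def rewrite(frame: Frame, active_mac: str) -> Frame:
    """If the destination IP is the public IP, write the active instance's MAC."""
    if frame.dst_ip == PUBLIC_IP:
        return Frame(frame.dst_ip, active_mac)
    return frame  # frames for other destinations pass through unchanged
```

Because both NICs carry the same public IP, switching `active_mac` from `LPMI_MAC` to `HPI_MAC` is enough to move new traffic without the client noticing.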

System Components

Load Monitors:

• Notify controller of LPMI overload and HPI under-load

• Load metric: Incoming data rate

• Threshold scheme (overload, under-load thresholds)

• Offline benchmarking to define LPMI overload threshold
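
A load monitor of this kind could be sketched as follows. The class name, the byte-counter interface, and the threshold values used in the usage note are assumptions for illustration; the slides only state that the metric is the incoming data rate and that thresholds are compared against it.

```python
from typing import Optional, Tuple


class LoadMonitor:
    """Derives the incoming data rate from a cumulative byte counter (hypothetical interface)."""

    def __init__(self, threshold: float, fire_when_above: bool):
        self.threshold = threshold              # bytes per second
        self.fire_when_above = fire_when_above  # True: overload monitor, False: under-load monitor
        self._last: Optional[Tuple[int, float]] = None  # (total_bytes, timestamp)

    def sample(self, total_bytes: int, now: float) -> Optional[float]:
        """Return the measured rate if the threshold is crossed, else None."""
        previous, self._last = self._last, (total_bytes, now)
        if previous is None or now <= previous[1]:
            return None  # first sample or no time elapsed: no rate available yet
        rate = (total_bytes - previous[0]) / (now - previous[1])
        if self.fire_when_above:
            crossed = rate > self.threshold
        else:
            crossed = rate < self.threshold
        return rate if crossed else None
```

An LPMI monitor would be created as `LoadMonitor(threshold, fire_when_above=True)` and an HPI monitor with `fire_when_above=False`; a non-`None` return value is the point at which the controller would be notified.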

[Figure: Internet, switches, top-of-rack switches, and two load monitors, one signalling “Overload!”]

Seamless Handover

• Problem: Simple switching breaks existing TCP connections

◦ HTTP 1.1: Multiple requests sent over same TCP connection!

• Solution: “Pinning” of existing connections to old instance

◦ Controller queries instances for accepted or established connections

▪ Connection monitor (ss or netstat)

◦ Inserts high priority entry into core switch forwarding table:

▪ (client IP, client port, public IP, public port) → MAC_rewriting(instance MAC)

[Figure: controller asks the connection monitors “Connections?” and pins the reported connections]
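
Connection pinning can be modelled as a priority lookup in a forwarding table. The sketch below is plain Python rather than switch configuration; `FlowEntry`, `lookup`, `pin_connections`, and the priority values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (client IP, client port, public IP, public port); None acts as a wildcard.
Match = Tuple[Optional[str], Optional[int], Optional[str], Optional[int]]
Packet = Tuple[str, int, str, int]


@dataclass(frozen=True)
class FlowEntry:
    priority: int
    match: Match
    dst_mac: str  # MAC written into matching frames


def lookup(table: List[FlowEntry], pkt: Packet) -> Optional[str]:
    """Return the MAC of the highest-priority matching entry, or None."""
    best = None
    for entry in table:
        if all(m is None or m == p for m, p in zip(entry.match, pkt)):
            if best is None or entry.priority > best.priority:
                best = entry
    return best.dst_mac if best else None


def pin_connections(table: List[FlowEntry], connections: List[Packet],
                    old_mac: str, priority: int = 200) -> List[FlowEntry]:
    """Add one high-priority entry per open connection, pointing at the old instance."""
    return table + [FlowEntry(priority, conn, old_mac) for conn in connections]
```

With a low-priority wildcard entry pointing at the new instance and pinned entries above it, existing connections keep reaching the old instance while new connections follow the default.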

Seamless Handover

• It’s not that simple!

◦ There’s a race condition: a connection accepted after the query (at t1) is not pinned

• Solution: Block connection requests before querying

◦ Controller programs firewalls on LPMI/HPI

◦ Unblock after flow re-direction

[Figure: sequence diagram between client, controller, connection monitor on the LPMI, and LPMI web server: query open connections, ACK / SYN, pin open connections]
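
The ordering that closes the race can be sketched as a small simulation. Everything here is a hypothetical model (the `Instance` class, the `state` dictionary, the `during` hook); it only illustrates the sequence block, query, pin, re-direct, unblock, under the assumption that a blocked connection request is retried by the client and then follows the new default path.

```python
from typing import Callable, Dict, Optional, Set


class Instance:
    """Toy model of an LPMI or HPI with a firewall 'lock'."""

    def __init__(self, name: str):
        self.name = name
        self.blocked = False
        self.connections: Set[str] = set()

    def try_accept(self, conn: str) -> bool:
        if self.blocked:
            return False  # firewall drops the connection request
        self.connections.add(conn)
        return True


def handover(old: Instance, new: Instance, state: Dict,
             during: Optional[Callable[[], None]] = None) -> None:
    old.blocked = True                                   # 1. block new connection requests
    snapshot = set(old.connections)                      # 2. query open connections
    if during:
        during()                                         # hook: simulate a request arriving now
    state["pinned"] = {c: old.name for c in snapshot}    # 3. pin open connections
    state["default"] = new.name                          # 4. re-direct new flows
    old.blocked = False                                  # 5. unblock


def route(state: Dict, conn: str) -> str:
    """Pinned connections go to the old instance, everything else to the default."""
    return state["pinned"].get(conn, state["default"])
```

Without step 1, a request accepted between steps 2 and 3 would exist on the old instance but be missing from the pinned set, which is the race described above.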

Overview

• Motivation

• System Model

• Elastic Tandem Machines

• Evaluation

• Summary

Evaluation Setup

• Elastic Tandem Machine with:

◦ Low-power Instance (SoC): Raspberry Pi

▪ 700 MHz ARM CPU, 512 MB RAM, 100 Mbps Ethernet NIC

◦ High-power Instance (PC):

▪ AMD Athlon 64 X2 Dual Core 4.2 GHz, 2 GB RAM, 1 Gbps Ethernet NIC

◦ Running Apache Web server, PHP, Tomcat servlet engine

• Backend: NFS file server, MySQL

• Core switch: PC with Open vSwitch and multiport NIC

◦ Line rate forwarding (no bottleneck)

• SDN handover controller based on Floodlight

[Figure: test bed with HTTP client, OpenFlow switch and handover controller, LPMI and HPI (each running Apache and Tomcat), and MySQL backend]

LPMI (SoC) Performance for Static Web Pages

Scenario:

• Real static web pages from: http://www.netsys2013.de/

• Increase avg. request rate by 1 request/s every 50 s (Poisson distr.)

Performance:

• Throughput:

◦ Max. 26 pages/s

• Response time

◦ Significant increase at 20 requests/s (> 150 ms)

◦ Performance limit

Low-power SoC can serve realistic low load (too slow for processing-intensive jobs; see paper)
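
A load profile like the one in this scenario could be generated as follows. This is a sketch under assumptions, not the generator used in the evaluation: the function name and parameters are hypothetical, and the rate is held constant within each step and evaluated at the previous arrival, which is a simple approximation of a stepwise-increasing Poisson process.

```python
import random
from typing import List


def arrival_times(start_rate: float, step: float, step_interval: float,
                  duration: float, seed: int = 0) -> List[float]:
    """Request arrival times with exponential inter-arrival gaps.

    The average rate starts at start_rate (requests/s) and grows by
    `step` requests/s every `step_interval` seconds.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        rate = start_rate + step * int(t // step_interval)
        t += rng.expovariate(rate)  # exponential gap with mean 1/rate
        if t >= duration:
            return times
        times.append(t)
```

For the profile on the slide, the call would be along the lines of `arrival_times(1.0, 1.0, 50.0, duration)`, with the starting rate being an assumption.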

ETMI Performance for Static Web Pages

Load profile:

• Increase request rate by 1 request/s every 50 s until 2500 s, then decrease rate by 1 request/s every 50 s

Configuration:

• Switch between LPMI and HPI at data rate T_overload = 80 KB/s, T_underload = 53 KB/s

Performance:

• Scales to maximum HPI performance

• Seamless handover

◦ No broken HTTP connections

ETMI scales up transparently

[Figure: performance over time with the handover LPMI → HPI and the handover HPI → LPMI marked]
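
The two thresholds form a hysteresis band: the switch to the HPI happens above 80 KB/s, the switch back only below 53 KB/s. A minimal sketch of that decision logic follows; the function name, the sampling in discrete steps, and the `ignore_period` value are assumptions, and booting or shutting down the HPI is not modelled.

```python
from typing import List

T_OVERLOAD = 80.0   # KB/s, from the configuration above
T_UNDERLOAD = 53.0  # KB/s, from the configuration above


def simulate(rates: List[float], ignore_period: int = 3) -> List[str]:
    """Return the active instance after each rate sample (one sample per step).

    After a handover, further threshold crossings are ignored for
    `ignore_period` steps (hypothetical value) to prevent oscillation.
    """
    active = "LPMI"
    last_switch = -ignore_period  # allow a switch at step 0
    trace = []
    for step, rate in enumerate(rates):
        if step - last_switch >= ignore_period:
            if active == "LPMI" and rate > T_OVERLOAD:
                active, last_switch = "HPI", step
            elif active == "HPI" and rate < T_UNDERLOAD:
                active, last_switch = "LPMI", step
        trace.append(active)
    return trace
```

A rate that hovers between 53 and 80 KB/s never triggers a handover in either direction, which is the purpose of keeping the two thresholds apart.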

Energy Efficiency (Static Web Page Scenario)

Idle mode power consumption:

• SoC: 1.85 W

• PC host: 141.22 W

[Figure: two panels, PC host and SoC, with the annotation “The SoC area (left figure)”]

Energy Efficiency – Comparison with Virtualization (Static Web Page Scenario)

Idle mode power consumption:

• SoC: 1.85 W

• PC host: 141.22 W

76 (idle) VMs per host for same energy efficiency

Fair comparison: PC host must serve same load as 76 SoCs

• At 76 x 4 requests/s = 304 requests/s:

◦ 76 SoCs: 76 x 1.89 W = 143.65 W

◦ PC host: 1 x 184.46 W

• At 76 x 8 requests/s = 608 requests/s:

◦ 76 SoCs: 76 x 1.92 W = 145.92 W

◦ PC hosts: 2 x 184.46 W = 368.92 W

Our PC host could only serve max. 300 requests/s!
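
The arithmetic behind this comparison can be reproduced in a few lines. The constants are the figures from the slide; the variable names are illustrative. The host counts (1 and 2) are taken from the slide as given, which is generous to the PC host since it served at most 300 requests/s. Note that 76 × 1.89 evaluates to 143.64, so the slide's 143.65 W presumably comes from an unrounded per-SoC value.

```python
import math

SOC_IDLE_W = 1.85      # idle power of one SoC
HOST_IDLE_W = 141.22   # idle power of the PC host
HOST_LOADED_W = 184.46 # PC host power under load

# Number of idle VMs a host must carry to match one idle SoC per instance.
break_even_vms = HOST_IDLE_W / SOC_IDLE_W  # about 76.3
n = math.floor(break_even_vms)             # 76

# Low-load case: 4 requests/s per instance, one PC host as on the slide.
soc_304 = n * 1.89
host_304 = 1 * HOST_LOADED_W

# Higher-load case: 8 requests/s per instance, two PC hosts as on the slide.
soc_608 = n * 1.92
host_608 = 2 * HOST_LOADED_W

savings_304 = host_304 - soc_304  # watts saved by the SoCs
savings_608 = host_608 - soc_608
```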

(Closest) Related Work (for more, see paper)

SoC & host integration

• B.-G. Chun, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini, ACM SIGOPS Operating Systems Review, 44(1), 2010

◦ Integration of discrete server systems as one design option

◦ Our handover mechanism is one (network centric) technical solution for a transparent integration

Load balancing mechanisms

• R. Wang, D. Butnariu, and J. Rexford, Hot-ICE 2011

◦ SDN-based approach for keeping TCP connections alive

▪ Approach 1: Re-directs packets to controller (possibly high load on controller)

▪ Approach 2: Timeout heuristic (problem of setting timeout)

◦ We utilize readily available end-system information about connections

◦ We handle dynamic state consistently through firewall “locks”

Summary and Future Work

Elastic Tandem Machine

• Concept for transparent integration of SoCs and classic VMs

• Low power consumption at weak load

• Elasticity: Scale up to nominal resources

• SDN-based seamless handover concept

Future work

• Integrating more than two machine types

◦ Micro instance, small instance, large instance, …

• Predictive load/performance models to plan handover in advance

Discussion

Full paper:

http://goo.gl/Vkdmfc

Contact:

Frank Dürr

email: frank.duerr@ipvs.uni-stuttgart.de

WWW: http://goo.gl/o6u2A
