VDN: Virtual Machine Image Distribution Network for Cloud Data Centers. IEEE INFOCOM 2012, Orlando, Florida, USA. Chunyi Peng (1), Minkyong Kim (2), Zhe Zhang (2), Hui Lei (2). (1) University of California, Los Angeles; (2) IBM T.J. Watson Research Center.

Page 1: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

IEEE INFOCOM 2012

Orlando, Florida USA

Chunyi Peng (1), Minkyong Kim (2), Zhe Zhang (2), Hui Lei (2)

(1) University of California, Los Angeles; (2) IBM T.J. Watson Research Center

Page 2: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

C Peng (UCLA) 2

Cloud Computing

Infocom 2012

The delivery of computing as a service

Page 3: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Service Access in Virtual Machine Instances

Picture source: http://www.wikimedia.org

[Figure: cloud service stack]

  • Cloud clients: web browser, mobile app, thin client, terminal emulator, …
  • Application, Software as a Service (SaaS): CRM, email, virtual desktop, communications, games, …
  • Platform, Platform as a Service (PaaS): execution runtime, database, web server, development tools
  • Infrastructure, Infrastructure as a Service (IaaS): virtual machines, server storage, load balancer, networks, …

Client service requests (e.g., HTTP) reach the VM instances.

Problem: on-demand VM provisioning

Page 4: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Time for VM Image Provisioning

[Figure: provisioning timeline with the stages user request, request processing, VM image transfer, and VM bootup]

Our focus: transfer time

In reality, the response takes from several minutes to tens of minutes!

Page 5: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Why Slow?

  • VM image files are large (several to tens of GB)
  • Centralized image storage becomes a bottleneck

[Figure: data center tree topology (core, aggregation, access, ToR switch) with a central image server; hosts labeled RH5.6]

Page 6: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Roadmap

  • Basic VDN idea: enable collaborative sharing
  • VDN solution for efficient sharing
    • Basic sharing units
    • Metadata management
  • Performance evaluation
  • Conclusion

Page 7: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VDN: Speed Up VM Image Distribution

  • Enable collaborative sharing
    • Utilize the “free” VM images
    • Exploit source diversity and make full use of network bandwidth

[Figure: data center tree topology (core, aggregation, access, ToR switch) with the image server; hosts labeled RH5.5, RH5.6, and RH6.0]

Page 8: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

How to Enable Collaborative Sharing?

  • What is the basic data unit for sharing?
    • File-based sharing: allow sharing only among identical files
    • Chunk-based sharing: allow sharing of common chunks from different files
  • How to manage content location information?
    • Centralized solution: directory service, etc.
    • Distributed solution: P2P overlay, etc.

Page 9: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

What is the Appropriate Sharing Unit?

  • Two factors
    • The number of identical, live VM image instances
    • The similarity of different VM images
  • Conduct real trace analysis and cross-image similarity measurement
    • VM traces from six operational data centers over 4 months
    • VM images including different Linux/Windows versions, IBM services (DB2, Rational, WebSphere), etc.

Page 10: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VM Instance Popularity

  • The distribution of image popularity is highly skewed
    • A few popular images account for a large portion of VM instances
    • Many unpopular images have a small number of VM instances (< 5)
  • Few peers can take part in file-based sharing

[Figure: VM instance popularity, with the unpopular VM images highlighted]

Page 11: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VM Instance Lifetime

  • The lifetime of VM instances varies
    • 40% of instances (the more popular VM instances) live < 13 minutes
    • The unpopular VM images have longer lifetimes
  • A VM image distribution network should cope with instances of varying lifetimes

[Figure: VM instance lifetime, with a marker at 13 min]

Page 12: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VM Image Structure

Tree-based VM image structure

[Figure: VM image tree with percentage labels]

  • Top level: Linux (60%), Windows (25%), Services (11%), Misc (4%)
  • Linux branches: Red Hat, SUSE, ……
  • Linux leaf images shown: Enterprise Linux v5.5 (32bit) (26.6%), Enterprise Linux v5.5 (64bit) (18.7%), …, Enterprise Linux v5.4 (32bit) (4%), …, Enterprise Linux v5.6 (32bit) (0.2%)
  • Services branches: Database, ……, IDE, ……, Web app. server
  • Services leaf images shown: V7.0 B (0.7%), V7.0.0.11 S P (0.7%), V7.0.0.11 R B (0.3%), V7.0.0.11 S B (0.3%), V7.0.0.11 S D (0.2%), V7.0 P (0.1%)
  • Intermediate branch labels: (53%), (7%)

Page 13: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VM Image Similarity

  • High similarity across VM images
    • Chunk schemes: fixed size and Rabin fingerprinting
    • Similarity: Sim(A,B) = |A’s chunks that appear in B| / |A|
  • Chunk-based sharing can exploit cross-image similarity
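The similarity measure on this slide can be illustrated with a small sketch. It uses the fixed-size chunk scheme only (not Rabin fingerprinting), and it makes assumptions the slides do not state: chunks are identified by SHA-1 digests, duplicates inside one image are counted once, and the names `chunk_hashes` and `similarity` are hypothetical.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 4096) -> set[str]:
    """Split data into fixed-size chunks and return the set of chunk digests."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def similarity(a: bytes, b: bytes, chunk_size: int = 4096) -> float:
    """Sim(A, B) = |A's chunks that appear in B| / |A|, over distinct chunks."""
    chunks_a = chunk_hashes(a, chunk_size)
    chunks_b = chunk_hashes(b, chunk_size)
    if not chunks_a:
        return 0.0  # assumption: an empty image shares nothing
    return len(chunks_a & chunks_b) / len(chunks_a)


# Toy "images" with a 4-byte chunk size so the chunks are easy to read.
base = b"AAAABBBBCCCCDDDD"     # chunks: AAAA, BBBB, CCCC, DDDD
variant = b"AAAABBBBXXXX"      # chunks: AAAA, BBBB, XXXX

sim_base_variant = similarity(base, variant, chunk_size=4)   # 2 of 4 chunks
sim_variant_base = similarity(variant, base, chunk_size=4)   # 2 of 3 chunks
```

Note that the measure is asymmetric: Sim(A,B) is normalized by A, so Sim(A,B) and Sim(B,A) generally differ.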

Page 14: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Enable Chunk-based Sharing

  • Decouple VM images into VM chunks
    • Exploit similarity across VM images
    • Provide higher source diversity and more sharing opportunities

[Figure: hosts labeled RH5.5, RH5.6, and RH6.0]

Questions:

  • How to maintain chunk location information (metadata)?
  • How to be scalable and also enable fast data transmission?

Page 15: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

How to Manage Location Information?

  • Solution I: centralized metadata server
    • Pros: simple
    • Cons: bottleneck at the metadata server
  • Solution II: P2P overlay network, e.g., DHT
    • Pros: distributed operations
    • Cons: unaware of the data center topology and may introduce high network overhead

[Figure: Internet and I-S]

Page 16: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Issues in Conventional P2P Practice

  • One logical operation (lookup/publish) takes multiple physical hops
  • Hop costs (e.g., time) can be high!

Solution:

  • Reduce the number of hops
  • Reduce the cost of physical hops
  • Keep it local or with close buddies

Page 17: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Topology-aware Metadata Management

  • Divide all the hosts into hierarchies of different levels and manage chunks within each hierarchy
    • Utilize the static/quasi-static (controlled) topology
    • Exploit high-bandwidth local links in the hierarchical structure

[Figure: hosts (H) under L1, L2, and L3 levels, with the I-S and the Internet]

Page 18: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VDN: Encourage Local Communication

  • Local chunk metadata storage
    • Index nodes maintain only the metadata within their own hierarchy
    • No need to maintain a global view at every index node
  • Local chunk metadata operation (e.g., lookup/publish)
    • Ask close index nodes first
    • Lower operation overhead
  • Local chunk data delivery
    • Enable high-bandwidth transmission between close hosts (e.g., within the rack)

Page 19: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

VDN Operation Flows

  • Recursive operation from lower hierarchy to higher hierarchy

[Figure: operation flow across the local cache, the L1, L2, and L3 levels, and the image server; steps numbered 1, 2, 3A, 3B, 3C, 4A, 4B, 5; legend: A. Metadata update, B. Metadata lookup, C. Data transmission]
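The recursive, lowest-level-first operation can be sketched as follows. This is only an illustration under assumptions, not the authors' implementation: the `IndexNode` class, its methods, and the policy of recording each published chunk at every ancestor level are all hypothetical choices made so that each index node holds metadata for its own subtree only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexNode:
    """Hypothetical index node for one level of the hierarchy (L1, L2, L3)."""
    name: str
    parent: Optional["IndexNode"] = None
    # chunk id -> hosts inside this node's subtree that hold the chunk
    locations: dict[str, set[str]] = field(default_factory=dict)

    def publish(self, chunk_id: str, host: str) -> None:
        """Record a chunk holder here and at every ancestor (assumed policy)."""
        node: Optional[IndexNode] = self
        while node is not None:
            node.locations.setdefault(chunk_id, set()).add(host)
            node = node.parent

    def lookup(self, chunk_id: str) -> tuple[str, set[str]]:
        """Ask the closest index node first, then climb one level at a time."""
        node: Optional[IndexNode] = self
        while node is not None:
            hosts = node.locations.get(chunk_id)
            if hosts:
                return node.name, set(hosts)
            node = node.parent
        # No peer holds the chunk: fall back to the central image server.
        return "image-server", set()


# Small hierarchy: one L3 node, two L2 nodes, three L1 nodes.
l3 = IndexNode("L3")
l2_a, l2_b = IndexNode("L2-a", l3), IndexNode("L2-b", l3)
l1_a, l1_b, l1_c = IndexNode("L1-a", l2_a), IndexNode("L1-b", l2_a), IndexNode("L1-c", l2_b)

l1_a.publish("chunk-1", "host-1")

same_l1 = l1_a.lookup("chunk-1")   # resolved at the local L1 node
same_l2 = l1_b.lookup("chunk-1")   # resolved one level up, at L2-a
across = l1_c.lookup("chunk-1")    # resolved only at the top level, L3
missing = l1_c.lookup("chunk-9")   # nobody has it: go to the image server
```

The level at which a lookup resolves also indicates how close the source is, which is what lets nearby hosts be preferred for data delivery.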

Page 20: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Performance Evaluation

  • Setting
    • One-month real-trace-driven simulation
    • VM image: 128 MB ~ 8 GB
    • Tree topology: 4 x 4 x 8 (128 nodes)
  • Network bandwidth
    • Static throughput for one physical link
    • Queue-based simulation for multiple transmissions on one link
  • Schemes
    • Baseline: centralized operation
    • Local: fetch VM chunks from the local host if possible
    • VDN: enable collaborative sharing

[Figure: simulated tree topology with the image server (I-S); labels: disk I/O: 1 Gbps, Net BW: 1 Gbps, 2 Gbps, 500 Mbps, 200 Mbps, (4-), (8-nodes)]
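To make the queue-based link model concrete, here is a rough sketch of several transfers sharing one link. The slides do not specify the queueing discipline, so first-come, first-served service is an assumption, as are the function name `fifo_completion_times`, the decimal definition of GB, and the example arrival times.

```python
def fifo_completion_times(arrivals_s, size_bytes, link_bps):
    """Completion time of each equal-sized transfer on one link.

    Assumes the link serves one transfer at a time, in arrival order,
    at a static throughput of link_bps bits per second.
    """
    transfer_s = size_bytes * 8 / link_bps
    link_free_at = 0.0
    done = []
    for t in sorted(arrivals_s):
        start = max(t, link_free_at)   # wait if the link is still busy
        link_free_at = start + transfer_s
        done.append(link_free_at)
    return done


GB = 10**9  # assumption: decimal gigabytes

# Three requests for a 4 GB image over a single 1 Gbps link.
times = fifo_completion_times([0, 0, 10], size_bytes=4 * GB, link_bps=10**9)
```

Under these assumptions a single 4 GB transfer takes 32 seconds, and each additional queued request on the same link waits another 32 seconds, which is the kind of bottleneck a single centralized source creates.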

Page 21: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Great Speedup on Image Distribution

[Figure: two panels, "S1 data center" and "S6 data center"; VM image size = 4 GB; callout text "at S6,"]

Page 22: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Scalable to Heavy Traffic Loads

  • Adjust the time of arrival using a factor of 1 to 60

[Figure: two panels, "S6, Median" and "S6, 90th"]

Page 23: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Low Metadata Management Overhead

  • Compare three metadata management schemes
    • Naïve: on-demand topology-aware broadcast
    • Flat: manage metadata in a ring (e.g., DHT, P2P)
    • Topo: topology-aware design (VDN)
  • Assume the communication cost is 1:4:10 (inversely proportional to bandwidth)

[Figure: (a) Number of messages; (b) Communication cost]
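The 1:4:10 cost ratio matches the inverse of the 2 Gbps, 500 Mbps, and 200 Mbps link bandwidths shown in the evaluation setting, which the short calculation below checks. The tier names, the `communication_cost` helper, and the example message counts are hypothetical; the slides do not say which bandwidth belongs to which level.

```python
from fractions import Fraction

# Link bandwidths from the evaluation setting, in Mbps (tier names are made up).
bandwidth_mbps = {"tier_1": 2000, "tier_2": 500, "tier_3": 200}

# Per-message cost inversely proportional to bandwidth, normalized to the fastest link.
fastest = max(bandwidth_mbps.values())
cost = {tier: Fraction(fastest, bw) for tier, bw in bandwidth_mbps.items()}


def communication_cost(messages_per_tier):
    """Total cost = sum over tiers of (message count x per-message cost)."""
    return sum(cost[tier] * n for tier, n in messages_per_tier.items())


# Hypothetical message counts, only to show the weighting.
total = communication_cost({"tier_1": 10, "tier_2": 3, "tier_3": 1})
```

With this weighting, one message on the slowest tier costs as much as ten on the fastest, so a scheme that sends more messages overall can still be cheaper if those messages stay on fast local links.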

Page 24: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Conclusion

  • VDN is a network-aware P2P paradigm for VM image distribution
    • Reduces image provisioning time
    • Achieves reasonable overhead
  • Chunk-based sharing exploits inherent cross-image similarity
  • Network-aware operations can optimize performance in the context of data centers

Page 25: VDN: Virtual Machine Image Distribution Network for Cloud Data Centers

Thanks