VDN: Virtual Machine Image Distribution Network for Cloud Data Centers
IEEE INFOCOM 2012
Orlando, Florida USA
Chunyi Peng¹, Minkyong Kim², Zhe Zhang², Hui Lei²
¹ University of California, Los Angeles   ² IBM T.J. Watson Research Center
Cloud Computing
- The delivery of Computing as a Service
Service Access in Virtual Machine Instances
Picture source: http://www.wikimedia.org
- Cloud clients: web browser, mobile app, thin client, terminal emulator, ...
- Software as a Service (SaaS): CRM, email, virtual desktop, communications, games, ... [Application layer]
- Platform as a Service (PaaS): execution runtime, database, web server, development tools, ... [Platform layer]
- Infrastructure as a Service (IaaS): virtual machines, server storage, load balancers, networks, ... [Infrastructure layer]
[Figure: client service requests (e.g., HTTP) served by VM instances in the cloud stack]
Problem: On-demand VM provisioning
Time for VM Image Provisioning
[Timeline: user request → request processing → VM image transfer → VM boot-up]
- Our focus: transfer time
- In reality, responses take several or tens of minutes!
Why Slow?
- VM image files are large (several or tens of GB)
- Centralized image storage becomes a bottleneck
[Figure: data center topology (core / aggregation / access via ToR switches); every host requesting RH5.6 fetches it from the central image server]
Roadmap
- Basic VDN idea: enable collaborative sharing
- VDN solution for efficient sharing
  - Basic sharing units
  - Metadata management
- Performance evaluation
- Conclusion
VDN: Speedup VM Image Distribution
- Enable collaborative sharing
- Utilize the "free" VM images already present on running hosts
- Exploit source diversity and make full use of network bandwidth
[Figure: same data center topology, but a host requesting RH5.6 now fetches from nearby peers holding RH5.5/RH5.6/RH6.0 images instead of only the central image server]
How to Enable Collaborative Sharing?
- What is the basic data unit for sharing?
  - File-based sharing: allows sharing only among identical files
  - Chunk-based sharing: allows sharing of common chunks across different files
- How to manage content location information?
  - Centralized solution: directory service, etc.
  - Distributed solution: P2P overlay, etc.
What is the Appropriate Sharing Unit?
Two factors:
- The number of identical, live VM image instances
- The similarity of different VM images
Conducted real-trace analysis and cross-image similarity measurement:
- VM traces from six operational data centers over 4 months
- VM images covering different Linux/Windows versions, IBM services (DB2, Rational, WebSphere), etc.
VM Instance Popularity
- The distribution of image popularity is highly skewed:
  - A few popular images account for a large portion of VM instances
  - Many unpopular images have few VM instances (< 5)
- Consequence: few peers can participate in file-based sharing
[Figure: VM instance count per image, with a long tail of unpopular VM images]
VM Instance Lifetime
- Instance lifetimes vary widely:
  - 40% of instances (mostly of the more popular VM images) live < 13 minutes
  - Unpopular VM images tend to have longer-lived instances
- A VM image distribution network must cope with instances of varying lifetimes
VM Image Structure
Tree-based VM image structure (instance share in parentheses):
- Linux (60%)
  - Red Hat (53%): Enterprise Linux v5.5 32-bit (26.6%), Enterprise Linux v5.5 64-bit (18.7%), ..., Enterprise Linux v5.4 32-bit (4%), ..., Enterprise Linux v5.6 32-bit (0.2%)
  - SUSE, ...
- Windows (25%)
- Services (11%)
  - Web app. server (7%): V7.0 B (0.7%), V7.0.0.11 S P (0.7%), V7.0.0.11 R B (0.3%), V7.0.0.11 S B (0.3%), V7.0.0.11 S D (0.2%), V7.0 P (0.1%)
  - Database, IDE, ...
- Misc (4%)
VM Image Similarity
- High similarity across VM images
- Chunking schemes: fixed-size and Rabin fingerprinting
- Similarity metric: Sim(A, B) = |A's chunks that appear in B| / |A's chunks|
- Chunk-based sharing can exploit this cross-image similarity
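The similarity metric above can be sketched in a few lines. Fixed-size chunking is shown here (Rabin fingerprinting would instead pick content-defined chunk boundaries); the 4 KB chunk size and SHA-1 chunk identifiers are illustrative assumptions, not taken from the paper.

```python
import hashlib

CHUNK = 4096  # illustrative chunk size; the paper's choice may differ

def chunk_ids(image: bytes, size: int = CHUNK) -> set[bytes]:
    """Fixed-size chunking: identify each chunk by its SHA-1 digest."""
    return {hashlib.sha1(image[i:i + size]).digest()
            for i in range(0, len(image), size)}

def sim(a: bytes, b: bytes) -> float:
    """Sim(A, B) = |A's chunks that appear in B| / |A's chunks|."""
    ca, cb = chunk_ids(a), chunk_ids(b)
    return len(ca & cb) / len(ca) if ca else 0.0
```

Note that Sim is asymmetric: Sim(A, B) measures what fraction of image A could be fetched from a host that holds image B.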
Enable Chunk-based Sharing
- Decouple VM images into chunks
- Exploit similarity across VM images
- Provide higher source diversity and more sharing opportunities
[Figure: hosts holding RH5.5/RH5.6/RH6.0 images all serve the chunks they share with the requested RH5.6 image]
Open questions:
- How to maintain chunk location information (metadata)?
- How to remain scalable while enabling fast data transmission?
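The chunk-location metadata these questions refer to can be pictured as a map from chunk ID to the hosts currently caching that chunk. The sketch below is a minimal illustration of the publish/lookup interface; the class and method names are assumptions, not VDN's actual API.

```python
from collections import defaultdict

class ChunkIndex:
    """Minimal chunk-location metadata: chunk ID -> hosts holding it."""

    def __init__(self):
        self.locations = defaultdict(set)

    def publish(self, chunk_id: str, host: str) -> None:
        """A host announces that it now caches a chunk."""
        self.locations[chunk_id].add(host)

    def unpublish(self, chunk_id: str, host: str) -> None:
        """A host evicts a chunk or its VM instance terminates."""
        self.locations[chunk_id].discard(host)

    def lookup(self, chunk_id: str) -> set[str]:
        """Candidate sources; empty set means fall back to the image server."""
        return set(self.locations.get(chunk_id, ()))

idx = ChunkIndex()
idx.publish("c1", "hostA")
idx.publish("c1", "hostB")
```

Held in one place this is the centralized metadata server; the scalability question is where these entries live and who answers the lookups.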
How to Manage Location Information?
- Solution I: centralized metadata server
  - Pros: simple
  - Cons: the metadata server becomes a bottleneck
- Solution II: P2P overlay network, e.g., DHT
  - Pros: distributed operations
  - Cons: unaware of data center topology; may introduce high network overhead
[Figure: data center hosts and the image server (I-S), reachable via the Internet]
Issues in Conventional P2P Practice
- One logical operation (lookup/publish) takes multiple physical hops
- Per-hop costs (e.g., time) can be high
Solution:
- Reduce the number of hops
- Reduce the cost of each physical hop
- Keep traffic local, or among close buddies
Topology-aware Metadata Management
- Divide all hosts into hierarchies at different levels and manage chunks within each hierarchy
- Utilize the static/quasi-static (controlled) data center topology
- Exploit high-bandwidth local links in the hierarchical structure
[Figure: hosts grouped under L1 index nodes (rack level), aggregated under L2 and L3 index nodes; the image server (I-S) sits behind the Internet]
VDN: Encourage Local Communication
- Local chunk metadata storage
  - Index nodes maintain only the metadata within their own hierarchy
  - No need to maintain a global view at every index node
- Local chunk metadata operations (e.g., lookup/publish)
  - Ask close index nodes first, lowering operation overhead
- Local chunk data delivery
  - High-bandwidth transmission between close hosts (e.g., within the same rack)
VDN Operation Flows
- Recursive operation from lower to higher hierarchy levels
[Figure: a requesting host first checks its local cache, then performs (A) metadata update and (B) metadata lookup at L1/L2/L3 index nodes in turn, and finally (C) data transmission from the closest holder, falling back to the image server]
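The recursive flow can be sketched as index nodes chained by level (L1 rack, L2 aggregation, L3 data center): publishes register a chunk up the chain, and a lookup asks the closest index first, escalating only on a miss. All names and the exact propagation policy here are illustrative assumptions.

```python
class IndexNode:
    """An index node at one hierarchy level; `parent` is the next level up."""

    def __init__(self, name: str, parent: "IndexNode | None" = None):
        self.name, self.parent = name, parent
        self.chunks: dict[str, set[str]] = {}  # chunk ID -> holders in this subtree

    def publish(self, chunk_id: str, host: str) -> None:
        """Register the chunk at this level and recursively up the hierarchy."""
        self.chunks.setdefault(chunk_id, set()).add(host)
        if self.parent:
            self.parent.publish(chunk_id, host)

    def lookup(self, chunk_id: str) -> set[str]:
        """Local-first: return the closest known holders, escalating on a miss."""
        if chunk_id in self.chunks:
            return self.chunks[chunk_id]
        return self.parent.lookup(chunk_id) if self.parent else set()

# A toy two-rack hierarchy under one aggregation and one data-center index.
l3 = IndexNode("L3")
l2 = IndexNode("L2-a", parent=l3)
rack1 = IndexNode("L1-rack1", parent=l2)
rack2 = IndexNode("L1-rack2", parent=l2)
rack2.publish("c1", "hostX")  # hostX in rack 2 caches chunk c1
```

A lookup for c1 from rack 2 resolves at its own L1 index; from rack 1 it misses locally and resolves at L2; a chunk unknown everywhere returns the empty set, meaning fetch from the image server.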
Performance Evaluation
Settings:
- One-month real-trace-driven simulation
- VM image sizes: 128 MB ~ 8 GB
- Tree topology: 4 x 4 x 8 (128 nodes)
- Network bandwidth: static throughput per physical link; queue-based simulation for multiple transmissions sharing one link
[Figure: simulated topology with 2 Gbps, 500 Mbps, and 200 Mbps links across hierarchy levels, 8 nodes per rack, and 1 Gbps disk I/O and network bandwidth at the image server]
Schemes:
- Baseline: centralized operation
- Local: fetch VM chunks from the local host when possible
- VDN: collaborative sharing
Great Speedup on Image Distribution
[Figure: image provisioning time in the S1 and S6 data centers; VM image size = 4 GB]
Scalable to Heavy Traffic Loads
- Scale request arrival times by a factor of 1-60 to emulate heavier loads
[Figure: median and 90th-percentile provisioning time at S6 as load increases]
Low Metadata Management Overhead
- Compared against three metadata management schemes:
  - Naive: on-demand topology-aware broadcast
  - Flat: metadata managed in a ring (e.g., DHT, P2P)
  - Topo: topology-aware design (VDN)
- Assumed per-hop communication costs of 1:4:10 across levels (inversely proportional to link bandwidth)
[Figure: (a) number of messages; (b) communication cost]
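The cost metric in (b) can be reproduced as a weighted sum of hops per level using the 1:4:10 ratio from the slide. The hop counts in the example are hypothetical, chosen only to contrast a rack-local lookup with one that crosses higher-level links.

```python
# Per-hop costs by hierarchy level, inversely proportional to link
# bandwidth, as assumed on the slide (1:4:10).
COST = {"L1": 1, "L2": 4, "L3": 10}

def comm_cost(hops: dict[str, int]) -> int:
    """Weighted communication cost of one logical operation."""
    return sum(COST[level] * n for level, n in hops.items())

# Hypothetical hop counts: a topology-aware lookup resolved within the
# rack versus a flat-overlay lookup that crosses higher-level links.
topo_local = comm_cost({"L1": 2})                    # -> 2
flat_dht = comm_cost({"L1": 1, "L2": 2, "L3": 1})    # -> 19
```

Even with equal message counts, traversing higher-level links dominates the weighted cost, which is why the topology-aware scheme wins in (b).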
Conclusion
- VDN is a network-aware P2P paradigm for VM image distribution
  - Reduces image provisioning time
  - Achieves reasonable overhead
- Chunk-based sharing exploits inherent cross-image similarity
- Network-aware operations can optimize performance in the data center context
Thanks!