
Page 1: Parallelizing  Live Migration of Virtual Machines

Parallelizing Live Migration of Virtual Machines

Xiang Song

Jicheng Shi, Ran Liu, Jian Yang, Haibo Chen
IPADS of Shanghai Jiao Tong University

Fudan University

Page 2: Parallelizing  Live Migration of Virtual Machines

Virtual Clouds

Page 3: Parallelizing  Live Migration of Virtual Machines

Live VM Migration

Page 4: Parallelizing  Live Migration of Virtual Machines

VM Migration is Time-consuming

Live VM migration becomes time-consuming
– Increasing resources of a VM
– Limited resources of migration tools

Migrating a memcached server VM on Xen: 4 GByte vs. 16 GByte

                     4 GByte    16 GByte
Migration Time       400s       1592s
Downtime             80s        257s

Page 5: Parallelizing  Live Migration of Virtual Machines

VM Migration Insight

Example: migrating a memcached VM with 16 GByte memory on Xen

Migration Time       1592s
  Data transfer        1200s
  Map guest memory     381s
  Others               the rest

VM Memory Size       16.0 GByte
Data Transfer        49.3 GByte (pre-copy) + 9.3 GByte (downtime)
Avg. CPU Usage       95.4%

Page 6: Parallelizing  Live Migration of Virtual Machines

VM Migration Insight

A lot of memory is dirtied during pre-copy
– Dirty rate vs. transfer rate

Improving the transfer rate
– CPU preparing rate
– Network bandwidth

Page 7: Parallelizing  Live Migration of Virtual Machines

Parallelizing Live VM Migration

With an increasing amount of resources, there are opportunities to leverage them to parallelize live VM migration

We design and implement PMigrate (Live Parallel Migration), which parallelizes the most basic primitives of migration
– Data parallelism
– Pipeline parallelism

Page 8: Parallelizing  Live Migration of Virtual Machines

Contributions

A case for parallelizing live VM migration

The range lock abstraction to scale address space mutation during migration

The design, implementation and evaluation of PMigrate on Xen and KVM

Page 9: Parallelizing  Live Migration of Virtual Machines

Outline

Design of PMigrate
Challenges for PMigrate
Implementation
Evaluation

Page 10: Parallelizing  Live Migration of Virtual Machines

Analysis of Live VM Migration: Source Node

[Diagram: in each pre-copy iteration on the source node, the migration process gets/checks the dirty bitmap, handles the data, and transfers it. Memory data is handled by mapping guest VM memory and handling zero/PT pages; disk data is handled by loading it from disk; CPU/device state is handled and transferred as well.]

Page 11: Parallelizing  Live Migration of Virtual Machines

Analysis of Parallelism

Data parallelism
– No dependency among different portions of data
– E.g., mapping guest VM memory

Pipeline parallelism
– Used when data parallelism is not appropriate
– E.g., checking the disk/memory dirty bitmap

Page 12: Parallelizing  Live Migration of Virtual Machines

Analysis of Parallelism

Cost of the basic migration primitives (each handled with data or pipeline parallelism):

Primitive                Cost
Check dirty bitmap       small
Map guest VM memory      heavy
Handle Unused/PT Page    modest
Transfer memory data     heavy
Restore memory data      heavy
Transfer disk data       heavy
Load/Save disk data      heavy

Page 13: Parallelizing  Live Migration of Virtual Machines

PMigration: Source Node

[Diagram: a memory-data producer and a disk-data producer put tasks into a shared task pool (pipeline parallelism); multiple send consumers drain the pool and transfer the data in parallel (data parallelism).]
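To make this structure concrete, here is a minimal user-space sketch of such a task pool in C with pthreads. It is an illustration, not the PMigrate source: the task layout, pool size, thread count, and the printf standing in for the actual send are all assumptions.

    /* Illustrative sketch of a migration task pool: producers enqueue
     * data tasks, several send consumers drain the pool in parallel. */
    #include <pthread.h>
    #include <stdio.h>

    #define POOL_CAP    64        /* bounded pool also caps in-flight memory */
    #define NUM_SENDERS 4

    struct task { unsigned long addr; unsigned long len; };

    static struct task pool[POOL_CAP];
    static int head, tail, count, done;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;

    static void put_task(struct task t)        /* called by producers */
    {
        pthread_mutex_lock(&lock);
        while (count == POOL_CAP)
            pthread_cond_wait(&not_full, &lock);
        pool[tail] = t;
        tail = (tail + 1) % POOL_CAP;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    static int get_task(struct task *t)        /* called by send consumers */
    {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && done) {              /* producers finished, pool drained */
            pthread_mutex_unlock(&lock);
            return 0;
        }
        *t = pool[head];
        head = (head + 1) % POOL_CAP;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        return 1;
    }

    static void *send_consumer(void *arg)
    {
        struct task t;
        while (get_task(&t))                   /* map, process and send the range */
            printf("sender %ld: %lu bytes at %#lx\n", (long)arg, t.len, t.addr);
        return NULL;
    }

    int main(void)
    {
        pthread_t senders[NUM_SENDERS];
        for (long i = 0; i < NUM_SENDERS; i++)
            pthread_create(&senders[i], NULL, send_consumer, (void *)i);

        /* memory-data producer: walk a (pretend) dirty bitmap, emit tasks */
        for (unsigned long pfn = 0; pfn < 256; pfn++)
            put_task((struct task){ .addr = pfn << 12, .len = 4096 });

        pthread_mutex_lock(&lock);
        done = 1;                              /* no more tasks */
        pthread_cond_broadcast(&not_empty);
        pthread_mutex_unlock(&lock);

        for (int i = 0; i < NUM_SENDERS; i++)
            pthread_join(senders[i], NULL);
        return 0;
    }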

Page 14: Parallelizing  Live Migration of Virtual Machines

PMigration: Destination Node

[Diagram: each send consumer on the source feeds a receive consumer on the destination (data parallelism); received disk data is passed on to a single disk writer (pipeline parallelism).]

Page 15: Parallelizing  Live Migration of Virtual Machines

Outline

Design of PMigrate
Challenges for PMigrate
Implementation
Evaluation

Page 16: Parallelizing  Live Migration of Virtual Machines

Challenge: Controlling Resource Usage

Parallel VM migration operations consume more CPU/network resources

Problem: lower the side effects

Solution: Resource usage control

Page 17: Parallelizing  Live Migration of Virtual Machines

Resource Usage Control: Network

Daemon thread
– Monitors the network usage of each NIC

Migration process
– Adjusts the network usage of each NIC
– Reserves some bandwidth for migration
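One way such rate control could look in user space is sketched below; it is an assumption for illustration, not the PMigrate implementation. A daemon thread refills a per-NIC byte budget every interval, and each send consumer charges its bytes against the budget of the NIC it uses, blocking once the reserved bandwidth for the interval is spent.

    /* Illustrative per-NIC rate control: a daemon refills byte budgets
     * periodically; send consumers charge bytes and block when the
     * reserved migration bandwidth for the interval is used up. */
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    #define NR_NICS     4
    #define INTERVAL_US 100000                 /* refill every 100 ms */

    struct nic_budget {
        pthread_mutex_t lock;
        pthread_cond_t  refilled;
        uint64_t bytes_left;                   /* budget left in this interval */
        uint64_t bytes_per_interval;           /* reserved for migration */
    };

    static struct nic_budget nics[NR_NICS];

    static void *rate_daemon(void *arg)
    {
        (void)arg;
        for (;;) {
            usleep(INTERVAL_US);
            for (int i = 0; i < NR_NICS; i++) {
                pthread_mutex_lock(&nics[i].lock);
                nics[i].bytes_left = nics[i].bytes_per_interval;
                pthread_cond_broadcast(&nics[i].refilled);
                pthread_mutex_unlock(&nics[i].lock);
            }
        }
        return NULL;
    }

    /* Called by a send consumer before writing len bytes to NIC nic. */
    static void charge_bandwidth(int nic, uint64_t len)
    {
        struct nic_budget *b = &nics[nic];
        pthread_mutex_lock(&b->lock);
        while (b->bytes_left < len)            /* wait for the next refill */
            pthread_cond_wait(&b->refilled, &b->lock);
        b->bytes_left -= len;
        pthread_mutex_unlock(&b->lock);
    }

    int main(void)
    {
        pthread_t d;
        for (int i = 0; i < NR_NICS; i++) {
            pthread_mutex_init(&nics[i].lock, NULL);
            pthread_cond_init(&nics[i].refilled, NULL);
            nics[i].bytes_per_interval = 10u << 20;   /* illustrative: ~100 MByte/s per NIC */
            nics[i].bytes_left = nics[i].bytes_per_interval;
        }
        pthread_create(&d, NULL, rate_daemon, NULL);
        charge_bandwidth(0, 1u << 20);         /* a consumer sends 1 MByte on NIC 0 */
        return 0;
    }

In the real system the daemon would read actual NIC usage and shrink the migration budget when co-located VMs need the bandwidth; the fixed cap here only shows the throttling mechanism.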

Page 18: Parallelizing  Live Migration of Virtual Machines

Resource Usage Control: CPU & Memory

CPU rate control
– Depends on VMM scheduling [L. Cherkasova et al., PER '07]
– Controls the priority of the migration process

Memory rate control
– Maintains a memory pool for the pipeline stages
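A small sketch of the memory-pool idea follows; the pool and buffer sizes are illustrative assumptions, not PMigrate's values. Pipeline stages borrow buffers from a fixed, pre-allocated set and return them when done, so the migration pipeline never holds more than POOL_BUFS * BUF_SIZE bytes of in-flight data.

    /* Illustrative bounded buffer pool for pipeline stages. */
    #include <pthread.h>
    #include <stdlib.h>

    #define POOL_BUFS 32
    #define BUF_SIZE  (4 << 20)                /* 4 MByte per buffer */

    static void *free_bufs[POOL_BUFS];
    static int nfree;
    static pthread_mutex_t pool_lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  pool_avail = PTHREAD_COND_INITIALIZER;

    static void mem_pool_init(void)
    {
        for (nfree = 0; nfree < POOL_BUFS; nfree++)
            free_bufs[nfree] = malloc(BUF_SIZE);
    }

    /* Producer stages block here when consumers fall behind, which caps
     * the memory footprint of the whole pipeline. */
    static void *get_buf(void)
    {
        pthread_mutex_lock(&pool_lock);
        while (nfree == 0)
            pthread_cond_wait(&pool_avail, &pool_lock);
        void *buf = free_bufs[--nfree];
        pthread_mutex_unlock(&pool_lock);
        return buf;
    }

    /* Consumer stages return the buffer once its data has been sent. */
    static void put_buf(void *buf)
    {
        pthread_mutex_lock(&pool_lock);
        free_bufs[nfree++] = buf;
        pthread_cond_signal(&pool_avail);
        pthread_mutex_unlock(&pool_lock);
    }

    int main(void)
    {
        mem_pool_init();
        void *b = get_buf();                   /* e.g., filled by a data producer */
        put_buf(b);                            /* e.g., released by a send consumer */
        return 0;
    }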

Page 19: Parallelizing  Live Migration of Virtual Machines

Challenge: Scaling Address Space Mutation

How is a memory task handled?
– Map a range of address space
– Map the target guest VM memory
– Process the memory
– Unmap the address space

sys_mmap(...)
    down_write(mmap_sem);
    map_address_space();
    up_write(mmap_sem);

privcmd_ioctl_mmap_batch(...)
    ...
    down_write(mmap_sem);
    vma = find_vma(mm, m.addr);
    ...
    ret = traverse_pages(...);
    up_write(mmap_sem);

sys_munmap(...)
    down_write(mmap_sem);
    unmap_address_space();
    up_write(mmap_sem);

This serialization accounts for 47.94% of the time when migrating a 16 GByte memory VM with 8 consumer threads

Page 20: Parallelizing  Live Migration of Virtual Machines

First Solution: Read-Protecting Guest VM Map

When mapping the target guest memory
– Holding mmap_sem in write mode is too costly
– And it is not necessary

mmap_sem can be held in read mode instead
– privcmd_ioctl_mmap_batch can then run in parallel
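A user-space analogy of this change, using a pthread rwlock where the kernel uses mmap_sem (this is not the actual kernel patch): the guest-map path takes the lock in read mode so that multiple consumer threads overlap, while address-space mutations still take it in write mode.

    /* Read mode for mapping, write mode for mutation: mappers run in parallel. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;

    static void *map_guest_range(void *arg)    /* like privcmd_ioctl_mmap_batch */
    {
        pthread_rwlock_rdlock(&mmap_sem);      /* read mode: mappers overlap */
        printf("consumer %ld mapping a guest range\n", (long)arg);
        pthread_rwlock_unlock(&mmap_sem);
        return NULL;
    }

    static void mutate_address_space(void)     /* like sys_mmap / sys_munmap */
    {
        pthread_rwlock_wrlock(&mmap_sem);      /* write mode: still exclusive */
        /* add or remove VMAs here */
        pthread_rwlock_unlock(&mmap_sem);
    }

    int main(void)
    {
        pthread_t t[8];
        for (long i = 0; i < 8; i++)
            pthread_create(&t[i], NULL, map_guest_range, (void *)i);
        mutate_address_space();
        for (int i = 0; i < 8; i++)
            pthread_join(t[i], NULL);
        return 0;
    }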

Page 21: Parallelizing  Live Migration of Virtual Machines

Range Lock

There is still serious contention
– Mutation of an address space is serialized
– Guest VM memory mapping contends with mutations

Range lock
– A dynamic lock service for the address space

Page 22: Parallelizing  Live Migration of Virtual Machines

Range Lock Mechanism

Skip-list-based lock service
– Locks an address range [start, start + length]

Accesses to different portions of the address space can be parallelized
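A minimal sketch of such a lock service is shown below, using a flat list of held ranges guarded by a mutex instead of the skip list the slides describe (the skip list only makes the overlap lookup faster); all names here are illustrative.

    /* Illustrative range lock: a thread locks [start, start + len) and
     * only waits for holders of overlapping ranges. */
    #include <pthread.h>
    #include <stdlib.h>

    struct locked_range {
        unsigned long start, end;
        struct locked_range *next;
    };

    static struct locked_range *held;          /* currently held ranges */
    static pthread_mutex_t meta = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t released = PTHREAD_COND_INITIALIZER;

    static int overlaps(unsigned long s, unsigned long e)
    {
        for (struct locked_range *r = held; r; r = r->next)
            if (s < r->end && r->start < e)
                return 1;
        return 0;
    }

    void lock_range(unsigned long start, unsigned long len)
    {
        unsigned long end = start + len;
        pthread_mutex_lock(&meta);
        while (overlaps(start, end))           /* block only on overlapping holders */
            pthread_cond_wait(&released, &meta);
        struct locked_range *r = malloc(sizeof(*r));
        r->start = start;
        r->end = end;
        r->next = held;
        held = r;
        pthread_mutex_unlock(&meta);
    }

    void unlock_range(unsigned long start, unsigned long len)
    {
        pthread_mutex_lock(&meta);
        for (struct locked_range **pp = &held; *pp; pp = &(*pp)->next) {
            if ((*pp)->start == start && (*pp)->end == start + len) {
                struct locked_range *victim = *pp;
                *pp = victim->next;
                free(victim);
                break;
            }
        }
        pthread_cond_broadcast(&released);
        pthread_mutex_unlock(&meta);
    }

    int main(void)
    {
        lock_range(0x10000, 0x1000);           /* e.g., a consumer maps this range */
        unlock_range(0x10000, 0x1000);
        return 0;
    }

Threads working on disjoint portions of the address space never wait for each other, which is how the guest-map consumers scale; only genuinely overlapping operations serialize.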

Page 23: Parallelizing  Live Migration of Virtual Machines

Range Lock

sys_mmap():
    down_write(mmap_sem);
    obtain the address to map
    lock_range(addr, len);
    update/add VMAs
    unlock_range(addr, len);
    up_write(mmap_sem);

sys_mremap():
    down_write(mmap_sem);
    lock_range(addr, len);
    do remap
    unlock_range(addr, len);
    up_write(mmap_sem);

sys_munmap():
    down_write(mmap_sem);
    adjust first and last VMAs
    lock_range(addr, len);
    detach VMAs
    up_write(mmap_sem);
    clean up page tables
    free pages
    unlock_range(addr, len);

guest_map:
    down_read(mmap_sem);
    find VMA
    lock_range(addr, len);
    up_read(mmap_sem);
    map guest pages through hypercalls
    unlock_range(addr, len);

Page 24: Parallelizing  Live Migration of Virtual Machines

Outline

Design of PMigrate
Challenges for PMigrate
Implementation
Evaluation

Page 25: Parallelizing  Live Migration of Virtual Machines

Implementing PMigrate

Implementation on Xen
– Based on the Xen tools of Xen 4.1.2 and Linux 3.2.6
– Range lock: 230 SLOCs
– PMigrate: 1860 SLOCs

Implementation on KVM
– Based on qemu-kvm 0.14.0
– KVM migration: 2270 SLOCs

Page 26: Parallelizing  Live Migration of Virtual Machines

Implementing PMigrate on KVM

Vanilla KVM uses iteration-oriented pre-copy
– Handles 2 MByte of data per iteration
– The qemu daemon is shared by the guest VM and the migration process

PMigrate-KVM uses image-oriented pre-copy
– Handles the whole memory/disk image per iteration
– Separates the migration process from the qemu daemon

Page 27: Parallelizing  Live Migration of Virtual Machines

Outline

Design of PMigrate
Challenges for PMigrate
Implementation
Evaluation

Page 28: Parallelizing  Live Migration of Virtual Machines

Evaluation Setup

Conducted on two Intel machines
– Two 1.87 GHz six-core Intel Xeon E7 chips
– 32 GByte memory
– One quad-port Intel 82576 Gigabit NIC
– One quad-port Broadcom Gigabit NIC

Page 29: Parallelizing  Live Migration of Virtual Machines

Workload

Idle VM

Memcached
– One Gigabit network connection
– Throughput: Xen 27.7 MByte/s, KVM 20.1 MByte/s

In the paper: PostgreSQL, Dbench

Page 30: Parallelizing  Live Migration of Virtual Machines

Idle VM Migration - Xen

                             Vanilla    PMigrate
Total Memory Send (GByte)    16.2       16.2
Migration Time (s)           422.8      112.4
Network Usage (MByte/s)      39.3       148.0

Page 31: Parallelizing  Live Migration of Virtual Machines

Idle VM Migration - KVM

                             Vanilla    PMigrate
Total Data Send (GByte)      16.4       16.4
Migration Time (s)           203.9      57.4
Network Usage (MByte/s)      84.2       294.7

Page 32: Parallelizing  Live Migration of Virtual Machines

Memcached VM Migration - Xen

                                    Vanilla    PMigrate
Total Memory Send (GByte)           58.6       22.7
Migration Time (s)                  1586.1     160.5
Network Usage (MByte/s)             38.0       145.0
Non-response Time (s)               251.9      < 1
Memory Send in Last Iter (GByte)    9.2        0.04
Server Throughput                   74.5%      65.4%

Page 33: Parallelizing  Live Migration of Virtual Machines

Memcached VM Migration - KVM

                             Vanilla    PMigrate
Total Data Send (GByte)      35.3       39.5
Migration Time (s)           348.7      140.2
Network Usage (MByte/s)      90.7       289.1
Non-response Time (s)        163        < 1
Server Throughput            13.2%      91.6%

Page 34: Parallelizing  Live Migration of Virtual Machines

Scalability of PMigrate-Xen

Migrating Idle VM

[Chart: migration time (secs) vs. number of consumer threads (1, 2, 4, 8) for three variants: w/o opt, read lock, and range lock; reported data points include 112.4s, 122.92s, 149.3s, and 197.4s.]

Page 35: Parallelizing  Live Migration of Virtual Machines

Conclusion

A general design of PMigrate leveraging data/pipeline parallelism
Range lock to scale address space mutation
Implementation for both Xen and KVM
Evaluation results
– Improve VM migration performance
– Reduce overall resource consumption in many cases

Page 36: Parallelizing  Live Migration of Virtual Machines


Institute of Parallel and Distributed Systems
http://ipads.se.sjtu.edu.cn/

Thanks

PMigrate: Parallel Live VM Migration

Questions?

http://ipads.se.sjtu.edu.cn/pmigrate

Page 37: Parallelizing  Live Migration of Virtual Machines

Backups

Page 38: Parallelizing  Live Migration of Virtual Machines

Load Balance – Network

• Experimental setup
– Co-locate an Apache VM (throughput 101.7 MByte/s)
– Migrate an idle VM with 4 GByte memory
– The migration process uses two NICs (sharing one NIC with the Apache VM)
• Result
– Throughput during migration: 91.1 MByte/s
– Migration speed: 17.6 MByte/s + 57.2 MByte/s

Page 39: Parallelizing  Live Migration of Virtual Machines

Load Balance – CPU

• Experimental setup (Xen)
– 1 memcached server and 1 idle server
  • 4 GByte memory
  • 4 VCPUs scheduled on 4 physical CPUs
– Migrating the idle server
  • PMigrate-Xen spawns 4 consumer threads
  • PMigrate-Xen only shares spare physical CPUs
  • Force PMigrate-Xen to share all physical CPUs

Page 40: Parallelizing  Live Migration of Virtual Machines

Load Balance – CPU

• Memcached server workload
– One Gigabit network connection
– Throughput: 48.4 MByte/s
– CPU consumption: about 100%

Page 41: Parallelizing  Live Migration of Virtual Machines

Load Balance – CPU

• Results
– PMigrate-Xen prefers spare CPUs

                             Vanilla             PMigrate-Xen
Share PCPU                   Work      Spare     Work      Spare
Total Time                   116s      131s      39s       41s
Avg. Memcached Throughput    2.9       23.3      6.2       16.9     (MByte/s)
Avg. Throughput Lost         45.5      25.1      42.2      31.5     (MByte/s)
Total Throughput Lost        5276.2    3291      1637      1293     (MByte)

Page 42: Parallelizing  Live Migration of Virtual Machines

Related work