Presented by: Michael Cherkassky, Supervised by: Prof. Danny Raz

*Real-Time Issues in Live Migration of Virtual Machines

*Back to Basics

*Virtual Machine

*Migration

*Live Migration

*Real-Time

“Real-Time Issues in Live Migration of Virtual Machines”

*Migration

PH - 1 PH - 2

*Migration Method-Naïve

PH - 1 PH - 2

What was transmitted?

*Migration Method-Wise

PH - 1 PH - 2

*Live Migration Model

*Goals:

*Identify requirements of live migration process

*Motivation for finding a method for ordering the page transfer

*Assumptions

*The VM's memory is a set of pages, where each element is a memory page of size P

*Available bandwidth for the transfer - b (bps)

*Time needed to transfer a single page - T (it includes an overhead H)

*Each page will be accessed for “write” with a page-specific probability during each time frame T
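The assumptions above can be turned into two small helper functions. This is only a sketch: the slide's own formula for T did not survive, so the code assumes that the overhead H is expressed in bits and simply added to the page size, and that writes in different time frames are independent. The function names are hypothetical.

```python
def page_transfer_time(page_size_bits, overhead_bits, bandwidth_bps):
    """Seconds needed to send one page.

    Assumption (not confirmed by the slides): the overhead H is additive
    and measured in bits, so T = (P + H) / b.
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return (page_size_bits + overhead_bits) / bandwidth_bps


def prob_written_within(p_write, frames):
    """Probability that a page is written at least once in `frames` frames.

    Assumption: writes in different time frames are independent, each
    happening with probability `p_write`.
    """
    return 1.0 - (1.0 - p_write) ** frames
```

For example, `prob_written_within(0.5, 2)` gives 0.75: the page stays clean only if it is missed in both frames.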

*Algorithm

1. At the start of migration, define the set of pages to be transmitted. It is initially the entire page set used by the VM.

2. For each live step k = 1, ..., K do: all the pages in the current set are transferred in the specified order. When the transfer ends, the pages found dirty again form the set for the next step.

3. Stop the VM and transfer the last pages, up to the migration finishing time, using the bandwidth (in bps) available during this final phase.
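The three steps can be sketched as a small simulation. It is an illustration under stated assumptions, not the authors' implementation: each page transfer takes one time frame, every page is written independently with its own probability in every frame (including the frame in which it is being sent), and the names `simulate_precopy` and `order_key` are hypothetical.

```python
import random


def simulate_precopy(write_probs, order_key, rounds, seed=0):
    """Simulate `rounds` live steps followed by the stop-and-copy step.

    write_probs: per-page write probability for one time frame.
    order_key:   function(page_index) -> sort key giving the transfer order.
    Returns (pages_sent_while_running, pages_sent_while_stopped).
    """
    rng = random.Random(seed)
    to_send = set(range(len(write_probs)))  # step 1: the entire page set
    sent_live = 0
    for _ in range(rounds):                 # step 2: live transfer rounds
        dirty = set()
        for page in sorted(to_send, key=order_key):
            sent_live += 1
            dirty.discard(page)             # the copy just sent is up to date
            # The VM keeps running during this frame and may write any page.
            for other, p in enumerate(write_probs):
                if rng.random() < p:
                    dirty.add(other)
        to_send = dirty                     # pages found dirty again
    # Step 3: the VM is stopped and the remaining pages are sent.
    return sent_live, len(to_send)
```

Under these assumptions the downtime is roughly the second returned value multiplied by the per-page transfer time at the final-phase bandwidth, and the overall migration time adds the time spent on the pages sent while running.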

*Crucial Values

*Downtime

*Overall migration time

*Propositions

*The probability that a page that is not dirty at the start of migration becomes dirty, and thus needs to be transmitted in the final migration round, is:

*The probability that a page that is dirty at the start of migration becomes dirty again, and thus needs to be transmitted in the final migration round, is:

(where the inverse is that of the page-ordering function)

*Overall Migration Time

*The order of transmission of the pages that minimizes the expected number of dirty pages found at the end of the live migration step must satisfy the following condition:

*Conclusion: If the probabilities are lower than a certain value, the optimum ordering is obtained for increasing values of the probabilities. On the other hand, if the probabilities are greater than that value, the optimum ordering is obtained for decreasing values of the probabilities.
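The conclusion can be checked on a toy example by exhaustive search. The sketch below assumes a simplified model that is not taken from the slides: one round, one time frame per page, independent writes, and a page counts as dirty again if it is written during its own transfer frame or any later frame of the round. The function names are hypothetical.

```python
from itertools import permutations


def expected_dirty(order, write_probs):
    """Expected number of pages dirty again after one round sent in `order`."""
    n = len(order)
    total = 0.0
    for position, page in enumerate(order):
        frames_left = n - position  # includes the page's own transfer frame
        total += 1.0 - (1.0 - write_probs[page]) ** frames_left
    return total


def best_order(write_probs):
    """Try every order; only feasible for a handful of pages."""
    pages = range(len(write_probs))
    return min(permutations(pages),
               key=lambda order: expected_dirty(order, write_probs))


low = [0.03, 0.01, 0.04, 0.02]     # rarely written pages
high = [0.95, 0.999, 0.9, 0.99]    # almost always written pages
print(best_order(low))    # indices sorted by increasing probability
print(best_order(high))   # indices sorted by decreasing probability
```

In this toy model the search agrees with the slide: rarely written pages are best sent in increasing order of probability, and very frequently written pages in decreasing order.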

*Order of Transmission

*Is it Right?

All pages are equal, but some are more equal

Problem: it is wasteful to transmit frequently modified pages at each step

Solution: Wait until the end, when VM is down

Algorithm: among the pages found dirty at the start of step k, delay the transmission of a selected set of pages until the VM is stopped. Which pages? Those whose write probability exceeds a threshold value.
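A minimal sketch of the delayed-transmission rule follows. The selection formula on the slide was lost, so the code assumes that a page is delayed when its write probability is strictly above the threshold. The function name and the data layout are hypothetical.

```python
def split_by_threshold(dirty_pages, write_probs, threshold):
    """Split the dirty pages of a step into (send_now, delayed).

    Assumption: a page is delayed until the VM is stopped when its
    write probability is strictly greater than `threshold`.
    """
    send_now = [p for p in dirty_pages if write_probs[p] <= threshold]
    delayed = [p for p in dirty_pages if write_probs[p] > threshold]
    return send_now, delayed


probs = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.7}
send_now, delayed = split_by_threshold([0, 1, 2, 3], probs, threshold=0.6)
print(send_now, delayed)
```

A lower threshold delays more pages, which shortens the live rounds but leaves more to copy while the VM is stopped.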

*Reducing migration time

Conclusion: It is possible to achieve a substantial decrease in overall migration time with only a negligible increase in downtime.

Problem: Need to know, precisely, the write probability of each page.

Solution: Gathering information during run-time.

Problem: Non-negligible overheads.

Solution: LRU

Frequency-based approach
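Both heuristics replace the unknown probabilities with something observable. The sketch below assumes a recorded trace of page writes and orders pages so that the least recently (or least frequently) written ones come first, as a stand-in for "low write probability first". The slides do not spell out the direction, and the function names are hypothetical.

```python
from collections import Counter


def lru_order(write_trace, all_pages):
    """Order pages from least to most recently written.

    write_trace: page ids in the order they were written (oldest first).
    Pages that were never written come first.
    """
    last_write = {page: -1 for page in all_pages}
    for tick, page in enumerate(write_trace):
        last_write[page] = tick
    return sorted(all_pages, key=lambda page: last_write[page])


def frequency_order(write_trace, all_pages):
    """Order pages from least to most frequently written."""
    counts = Counter(write_trace)
    return sorted(all_pages, key=lambda page: counts[page])


trace = [2, 2, 2, 0, 1]
print(lru_order(trace, [0, 1, 2, 3]))
print(frequency_order(trace, [0, 1, 2, 3]))
```

The two orders differ for this trace: page 2 was written most often but not most recently, so LRU sends it earlier than the frequency-based order does.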

*Down to Earth

*Computational Resources:

*Scheduling guarantee by the kernel: (Q, P)

*Network Resources:

*b needs to be constant

*Possible to reserve

*Resources

*The authors have modified the KVM hypervisor.

*Page Tracing mechanism

*Page accesses are traced within the hypervisor, using a bitmap

*The implementation exploits this information to modify the transfer order
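A bitmap keeps one bit per page, so tracing an access is a single bit operation. The class below is a plain illustration of that data structure, not the KVM code or its API. `DirtyBitmap` and its methods are hypothetical names.

```python
class DirtyBitmap:
    """One bit per page; a set bit means the page was accessed."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def mark(self, page):
        """Record an access to `page`."""
        self.bits[page // 8] |= 1 << (page % 8)

    def is_set(self, page):
        return bool(self.bits[page // 8] & (1 << (page % 8)))

    def drain(self):
        """Return the marked pages in index order and clear the bitmap."""
        pages = [p for p in range(self.num_pages) if self.is_set(p)]
        self.bits = bytearray(len(self.bits))
        return pages


bitmap = DirtyBitmap(20)
for page in (3, 17, 3):
    bitmap.mark(page)
first = bitmap.drain()
second = bitmap.drain()
```

Draining the bitmap at the start of each step yields the set of pages to consider for that step. An ordering policy can then sort that set before transfer.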

*Does it work?

*Simulations - Virtualized VideoLAN Client (VLC)

*6500 mapped pages (16KB/page)

*Transfer rate of 100 Mbit/s

*8 sec. (!)
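The 8-second figure is presumably the time needed to copy all mapped pages once at the stated rate. A quick check, assuming 1 KB = 1024 bytes and no protocol overhead:

```python
PAGES = 6500
PAGE_BYTES = 16 * 1024      # 16 KB per page, reading KB as 1024 bytes
BANDWIDTH_BPS = 100e6       # 100 Mbit/s

total_bits = PAGES * PAGE_BYTES * 8
transfer_seconds = total_bits / BANDWIDTH_BPS
print(f"{transfer_seconds:.2f} s")
```

The result is a little over 8.5 seconds, in line with the figure on the slide. Copying everything while the VM is stopped would therefore mean several seconds of downtime.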

*Does it work?

Guaranteed bandwidth of 50 Mbit/s

*Standard vs. LRU

*570 -> 300 (47%) (K=1)

*360 -> 290 (19.4%) (K=3)

*4800 -> 4500 (6.25%) (K=1)

*5500 -> 5000 (9.1%) (K=3)

LRU with delayed transmission

*LRU vs. Improved LRU

*300 -> 220 (27%) (K=3)

*5000 -> 4400 (12%) (K=3)
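The percentages quoted for both comparisons are relative reductions, (before - after) / before. The short check below recomputes them from the before/after values on the slides. The slides do not state the units of those values, and none are assumed here.

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


pairs = [
    (570, 300), (360, 290), (4800, 4500), (5500, 5000),  # Standard vs. LRU
    (300, 220), (5000, 4400),                            # LRU vs. Improved LRU
]
for before, after in pairs:
    print(f"{before} -> {after}: {reduction_percent(before, after):.2f}%")
```

All six recomputed values match the quoted percentages up to rounding.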

*Conclusions

*It is possible to reduce downtime and improve QoS with simple page-ordering algorithms

*With a guaranteed bandwidth, LRU has proved to be an effective aid for page ordering, achieving good results.

*Further work needs to be done…
