Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Virtualization
CSCE678
Announcement
• Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t watched)
• All optional readings are posted If you don’t like your material, let me know
2
VirtualizationBasics
3
Virtualization in Cloud
• Multi-tenancy:splitting hardware resources for multiple tenants
• Security Isolation:tenants are isolated from each other
• Legacy software:tenants run their OSes and applications without modification
• Flexibility of deployment:tenants can be easily deployed, migrated, or removed from a physical host
4
Physical Machine
Physical vs Virtual Machines
5
Virtual Machine
(Guest)OS
Application
Virtual Machine
(Guest)OS
Application
Virtual Machine MonitorPhysical Machine
OS
Application
Physical Machine
OS
Application
Without virtualization With virtualization
Definition of Virtualization
Popek & Goldberg (1974):
• Identical environment:Programs in VMs should have identical effects as running on real machines
• Efficiency: should not add too much overheads
• Resource control:Virtual machine monitor (VMM) has complete control of hardware resources
6
Identicality
7
Physical Machine
OS
Application
Physical Machine
Virtual Machine
(Guest)OS
Application
VMM
Startprogram
ProcessData
Outputresults
Startprogram
ProcessData
Outputresults
Should behave “exactly” the same!(except VM might be slower)
Challenges for Virtualization
• Virtualizing user programs is not hard• A time-sharing OS (i.e., UNIX) will work
• Virtualizing a whole OS is the main challenge• According to Popek & Goldberg (1974),
most architectures such as x86 are not “virtualizable”• VMWare workstation is designed to solve the x86
virtualization problem
8
OS Background
9
What Does An OS Consist Of?
• Where most applications run
• Many spaces in parallel
• Cannot access states of other applications unless allowed
10
User Space
Kernel• Control hardware and resources
• Sharing states (e.g., pipes)
Differentiating User Space & Kernel
• An architecture typically have two protection rings• Ring 0: kernel• Ring 3: user spaces
• Only ring 0 can run privileged instructions• Setting up page tables• Setting up interrupt handlers• Changing user contexts• Assigning segment registers (%cs, %ds, %fs, %gs, etc)• … etc
11
Ring 1 & 2 are mostly not usedor even not supported
Context Switches
12
User Space
(run in ring 3)
Kernel(run in ring 0)
• System calls(syscall instruction)
• Exceptions(e.g., page faults)
• Interrupts (e.g., timers)
iretinstruction
Virtual Memory Management
13
Kernel Application 1 Application 2
Physical Memory
Page Page Page Page Page
Page Page Page Page Page
Demand paging:Allocating virtual memoryas needed.
Virtual Address Physical Address
14
001000111 110000111 111001000 001110011 011011100000PageTable(cr3) …
…
……
……
……
Page
OffsetPhysical Address (PA)= Physical page address + page offset
Virtual address (48 bits in binary)
011011100000
001000111
110000111111001000
001110011
Page Fault Handling
15
001000111 110000111 111001000 001110011 011011100000
Kernel
Trap into kernel
NewPage
Alloc
Upd
ate
Virtual address (48 bits in binary)
PageTable(cr3) …
…
……
……
……
001000111
110000111111001000
001110011
User Space
Invalid
Virtualization:The VMWare Approach
16
Kernel vs VMM
• An OS relies on hardware protections to ensure all resources are controlled by kernel
• VMM has the exact same requirement
How to subvert the control of OS kernel whenVMM is in charge?
17
Trap-and-Emulate
• Any privileged instruction needs to be trapped into VMM before the OS kernel
18
User Space(ring 3)
Kernel(ring 1 or ring 3)
iretinstruction
System calls,Exceptions,or Interrupts
VMM(ring 0)
Trap
Emulateand return
Example: Timer Interrupts
19
GuestKernel
A
GuestKernel
B
Set alarm in 1 second
Set alarm in 3 second
VMM
Trap Trap
1. Set alarm in 1 second on core 1
2. Set alarm in 3 seconds on core 2
Core 1 Core 2
Example: Page Table
20
GuestKernel
Set page table (cr3)
VMM
Trap
Set page table (cr3)for the guest kernel
But what aboutpage table entries?
GuestPageTable …
…
……
……
…
001000111
110000111111001000
001110011 New Page
Shadow Page Table
21
GuestKernel
ShadowPageTable(cr3)
VMM Update
……
……
……
…
001000111
110000111111001000
001110011 New Page
Read-only
Pagefault
Read-only Read-only Read-only
Update
Memory Management
• Requires VMM to trace the changes in the guest page table (i.e., page tracing)
• Pages for memory-mapped IO Read or write to the pages trigger interaction
with I/O devices Also trap-and-emulated by VMM
22
World Switch
23
GuestUser
Space
GuestKernel
VMM
GuestUser
Space
GuestKernel
VMM
GuestUser
Space
GuestKernel
VMM
GuestUser Mode
GuestKernel Mode
VMMMode
Trap Trap
Switch
Switch
VMM injects memory into guest spacefor efficient trapping.
Protectedfrom guest
x86 Was Not Virtualizable
• Popek & Goldberg: all privileged instructions need to be trapped in non-kernel mode (ring > 0)
• Many x86 instructions are not trappable• Example 1: PUSH %cs pushes current protection level on
the stack, so guest kernel can see ring != 0• Example 2: POPF can enable/disable interrupt in ring 0,
but silently ignored in ring > 0• VMM never gets the chance to emulate!
24
Dynamic Binary Translation
25
GuestData
GuestCode
On-diskimage
GuestData
Native Executable
Code
In-memory
VMMmodifies thebinary code
Untrapped privileged instructionscan be either:
(1) Replaced with emulation codewhich originally runs in VMM
(2) Injected with other trappableinstructions (e.g., syscall)
Virtualizing I/O Devices (1/2)• Assigning physical devices to guest kernels pose engineering
and security challenges because guest drivers can’t access devices directly
26
GuestKernel
HDD ETH VGA HDD ETH VGA
VMM
Drivers
GuestKernelDrivers
Virtualizing I/O Devices (1/2)• VMM creates virtual devices to multiplex access to
I/O resources
27
vHDD vETH vVGA vHDD vETH vVGA
VMM
HDD ETH VGA
GuestKernelDrivers
GuestKernelDrivers
Use of A Host Operating System
28
GuestKernel
GuestKernel
Virtual HW Virtual HWHost
OSVMM
Standard emulated devices:SCSI/IDE HDDs, e1000 eth cards,COM1/2 serial ports, VGA cards,keyboards, mouses, etc
Drivers
HDD ETH VGA
A host OS makes emulationmuch easier by relying onexisting OS implementation.
Virtualization:The Hardware Approach
29
Modern Virtualization Solutions
• Virtualization nowadays are assisted by hardware• x86 is now virtualizable• Hardware-assisted paging replaced shadow paging• Virtualization-friendly I/O devices
30
Fixing x86 Virtualization
• Intel VT or AMD-V fulfilled the Popek & Goldberg requirements
31
GuestKernel
(ring 0 non-root)
Trap by privilegedinstructions
VMM(ring 0 root)
VM Exit
VM Enter
VMCSSpecify which instructionscan be trapped(including ones that are previouslyuntrappable, e.g., POPF)
(Virtual MachineControl Structure)
VM Exits
32
GuestKernel
(ring 0 non-root)VMM(ring 0 root) MOV RAX->CR3
VMCS
Switch to VMCS.host_cr3
Guest-specificVMCS
Find out the new CR3 from RAX
Assign to VMCS.guest_cr3
VM Exit
Switch to new CR3
VMRESUMEinstruction
Memory Virtualization (1/3)• Intel VT-x (extended page table) & AMD SVM (nested page table)
33
……
……
…
…
001000111 110000111111001000
001110011
GuestPage Table(CR3)
001000111 110000111 111001000 001110011 011011100000gVA:
011001101 111001110 101000110 000111001 011011100000gPA:
……
……
…
…
011001101 111001110101000110
000111001
ExtendedPage Table
(in VMCS)
hPA
Memory Virtualization (2/3)• Guest can update CR3 or page tables w/o trapping into VMM
34
……
……
…
…
001000111 110000111111001000
001110011
001000111 110000111 111001000 001110011 011011100000gVA:
011001101 111001110 101000110 000111010 011011100000gPA:
……
……
…
…
011001101 111001110101000110
000111010
New Page
GuestPage Table(CR3)
ExtendedPage Table
(in VMCS)
Memory Virtualization (3/3)• VMM only manages the guest physical pages
35
……
……
…
…
001000111 110000111111001000
001110011
001000111 110000111 111001000 001110011 011011100000gVA:
011001101 111001110 101000110 000111010 011011100000gPA:
……
……
…
…
011001101 111001110101000110
000111010
New Page
Invalid
Not your physical page!!!
GuestPage Table(CR3)
ExtendedPage Table
(in VMCS)
I/O Virtualization (1/2)• Intel VT-d and AMD-Vi allow direct I/O to assigned devices Virtual IOMMU (vIOMMU)
36
Guest Kernel
HDD ETH VGA HDD ETH VGA
VMM
Driver
DMA
Driver
DMA
Driver
DMA
Guest Kernel
Driver
DMA
Driver
DMA
Driver
DMA
Setup
Still can’t share devices!
I/O Virtualization (2/2)
• Single-Root I/O Virtualization (SR-IOV):A specification for sharing PCI-e devices with multiple guests
37
Guest Kernel
Driver
Guest Kernel
Driver
SR-IOV supportedethernet card
Presented as multiple devicescontrolled by PCI-evirtual functions (VFs)
VF VF