
Page 1: 12393401222009.04.02.ppt

Sogang University Distributed Computing & Communication Lab.

Stub Domain

Device Model Domain and PV-GRUB

Kwon-yong Lee

Distributed Computing & Communication Lab. (URL: http://dcclab.sogang.ac.kr)

Dept. of Computer Science, Sogang University

Seoul, Korea

Tel: +82-2-3273-8783, Email: [email protected]

Page 2: 12393401222009.04.02.ppt


Domain0 Disaggregation

Big Dom0 problems: Dom0 runs a lot of Xen components
• Physical device drivers
• Domain manager
• Domain builder
• ioemu device models
• PyGRUB

Security issues
• Most of these components run as root.

Scalability issues
• The hypervisor cannot itself schedule these components appropriately.

Goal: move the components into separate "helper" domains

• Driver domain, Builder domain, Device model domains, etc.

Page 3: 12393401222009.04.02.ppt

PyGRUB

Acts as a "PV bootloader"
Allows booting a kernel that resides within the DomU disk or partition image
Needs to run as root to access the guest disk → security issues

Cannot network boot
Re-implements GRUB


[Diagram: PyGRUB runs (under xend) in Dom0 Linux on the Xen hypervisor, reading menu.lst, vmlinuz, and initrd from the PV domain's disk image]
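For concreteness, a PV guest configured to boot via PyGRUB looks roughly like the sketch below (a minimal sketch; the paths and names are illustrative, not from the original slides):

  # sketch of a PV guest config booted via PyGRUB (paths illustrative)
  bootloader = '/usr/bin/pygrub'      # runs as root in Dom0, reads the guest's menu.lst
  memory     = 512
  name       = 'pv-guest'
  disk       = ['file:/var/xen/pv-guest.img,xvda,w']
  vif        = ['']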

Page 4: 12393401222009.04.02.ppt


Mini-OS

A sample PV guest OS for the Xen hypervisor
Very simple

Relies completely on the hypervisor to access the machine
• Uses the Xen network, block, and console frontend/backend mechanism

Supports only
• Non-preemptive threads
• One virtual memory address space (no user space)
• A single CPU (mono-VCPU)

Page 5: 12393401222009.04.02.ppt


Mini-OS

In Xen 3.3, Mini-OS has been extended to the point of running the newlib C library and the lwIP stack, thus providing a basic POSIX environment, including TCP/IP networking.

xen-3.3.1/extras/mini-os/

PS) being tested at Cisco for IOS

Page 6: 12393401222009.04.02.ppt

xen-3.3.1/extras/mini-os/README


Minimal OS
----------
This shows some of the stuff that any guest OS will have to set up.
This includes:
 * installing a virtual exception table
 * handling virtual exceptions
 * handling asynchronous events
 * enabling/disabling async events
 * parsing start_info struct at start-of-day
 * registering virtual interrupt handlers (for timer interrupts)
 * a simple page and memory allocator
 * minimal libc support
 * minimal Copy-on-Write support
 * network, block, framebuffer support
 * transparent access to FileSystem exports (see tools/fs-back)

- to build it just type make.
- to build it with TCP/IP support, download LWIP 1.3 source code and type
  make LWIPDIR=/path/to/lwip/source
- to build it with much better libc support, see the stubdom/ directory
- to start it do the following in domain0 (assuming xend is running)
  # xm create domain_config

This starts the kernel, prints out a bunch of stuff, and then prints the system time once every second.

If you have set up a disk in the config file (e.g. disk = [ 'file:/tmp/foo,hda,r' ] ), it will loop reading it. If that disk is writable (e.g. disk = [ 'file:/tmp/foo,hda,w' ] ), it will write data patterns and re-read them.

If you have set up a network in the config file (e.g. vif = [''] ), it will print incoming packets.

If you have set up a VFB in the config file (e.g. vfb = ['type=sdl'] ), it will show a mouse with which you can draw color squares.

If you have compiled it with TCP/IP support, it will run a daytime server on TCP port 13.
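Assembling the README's examples, a complete domain_config for Mini-OS might look like the following (a sketch; the kernel file name depends on your build output):

  # domain_config -- minimal sketch for booting Mini-OS
  # (the kernel path/name is illustrative)
  kernel = 'extras/mini-os/mini-os.gz'
  memory = 32
  name   = 'mini-os'
  disk   = [ 'file:/tmp/foo,hda,r' ]   # Mini-OS will loop reading this disk
  vif    = [ '' ]                      # incoming packets are printed to the console
  # start it with: xm create -c domain_config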

Page 7: 12393401222009.04.02.ppt


POSIX Environment on top of Mini-OS

[Diagram: an application sits on newlib, lwIP, and additional glue code (getpid, sig, mmap, …), all running on Mini-OS (scheduler, MM, and console / network / block / FS / FB frontends) on top of the Xen hypervisor]

Page 8: 12393401222009.04.02.ppt


POSIX Environment on top of Mini-OS

lwIP (lightweight IP)
Provides a lightweight TCP/IP stack
• Just connects to the network frontend of Mini-OS
Widely used open-source TCP/IP stack designed for embedded systems
Reduces resource usage while still providing full-scale TCP

PS) uIP: a TCP/IP stack for 8-bit microcontrollers

Page 9: 12393401222009.04.02.ppt


POSIX Environment on top of Mini-OS

newlib
Provides the standard C library functions (as an alternative to GNU libc)

Others
getpid and similar functions return e.g. 1.
• Mini-OS has no notion of a Unix process.
sig functions can be no-ops.
• Mini-OS has no signals either.
mmap is implemented for only one case.
• Anonymous memory

Page 10: 12393401222009.04.02.ppt


POSIX Environment on top of Mini-OS

Disk frontend

FrameBuffer frontend

FileSystem frontend (to access part of the Dom0 FS)
Through the FileSystem frontend/backend mechanism
• Imported from JavaGuest
– By using a very simple virtualized kernel, the JavaGuest project avoids all the complicated semantics of a full-featured kernel, and hence permits far easier certification of the semantics of the JVM.

More advanced MM
• Read-only memory
• CoW for zeroed pages

Page 11: 12393401222009.04.02.ppt

POSIX Environment on top of Mini-OS

Running a Mini-OS example
A timestamp is printed once per second
xm create -c domain_config
To disconnect from the domain's console, press 'Ctrl+]'

Cross-compilation environment
binutils, gcc, newlib, lwip
Ex) 'Hello World!'

• xen-3.3.1/stubdom/c/
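A plausible build-and-run sequence for the 'Hello World' example (a sketch only; the exact make targets and the example's config file name may differ in your tree):

  # from the xen-3.3.1 source tree -- exact targets may differ
  cd stubdom
  make                    # builds the cross toolchain (binutils, gcc, newlib, lwip)
                          # and the example stub domains, including c/
  xm create -c <config_for_c_example>   # hypothetical config name; attaches the console
  # press 'Ctrl+]' to detach from the console again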


Page 12: 12393401222009.04.02.ppt

Old HVM Device Model (< Xen 3.3)

ioemu: a modified version of qemu
To provide HVM domains with virtual hardware

Used to run in Dom0 as a root process, since it needs direct access to disks and the tap network

Problems
• Security
– The qemu code base was not particularly meant to be safe.
• Efficiency
– When an HVM guest performs an I/O operation, the hypervisor hands control to Dom0, which may not schedule the ioemu process immediately, leading to uneven performance.


Page 13: 12393401222009.04.02.ppt

Old HVM Device Model

The HVM guest has to wait for Dom0 Linux to schedule qemu

qemu consumes Dom0 CPU time


[Diagram: the HVM domain's IN/OUT port accesses trap into the Xen hypervisor and are forwarded to qemu, a process running on Linux in Dom0]

Page 14: 12393401222009.04.02.ppt

Xen 3.3.1 (compared to 3.2)

Power management (P & C states) in the hypervisor
HVM emulation domains (qemu-on-minios) for better scalability, performance and security
PV-GRUB: boot PV kernels using real GRUB inside the PV domain
Better PV performance: domain lock removed from pagetable-update paths
Shadow3: optimizations to make this the best shadow pagetable algorithm yet, making HVM performance better than ever
Hardware Assisted Paging enhancements: 2MB page support for better TLB locality
CPUID feature leveling: allows safe domain migration across systems with different CPU models
PVSCSI drivers for SCSI access direct into PV guests
HVM frame-buffer optimizations: scan for frame-buffer updates more efficiently
Device pass-through enhancements
Full x86 real-mode emulation for HVM guests on Intel VT: supports a much wider range of legacy guest OSes
New qemu merge with upstream development
Many other changes in both x86 and IA64 ports


Page 15: 12393401222009.04.02.ppt

HVM Device Model Domain (Xen 3.3 Feature)

In Xen 3.3, ioemu can be run in a Stub Domain.

Dedicated Device Model Domain for each HVM domain

Device Model Domain
• Processes the I/O requests of the HVM guest
• Uses the regular PV interface to actually perform disk and network I/O
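Configuration-wise, enabling the device model domain roughly amounts to the sketch below (paths follow the xen-3.3.1 stubdom README as I recall it; verify against your tree):

  # in the HVM guest's config (e.g. /etc/xen/hvmconfig) -- a sketch
  device_model = '/usr/lib/xen/bin/stubdom-dm'

  # companion config /etc/xen/stubdom-hvmconfig (named after the HVM guest)
  # boots the qemu-on-Mini-OS kernel:
  kernel = '/usr/lib/xen/boot/ioemu-stubdom.gz'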


Page 16: 12393401222009.04.02.ppt


Stub Domain

Helper domains for HVM guests
Because the emulated devices are processes in Dom0, their execution time is accounted to Dom0.
• An HVM guest performing a lot of I/O can cause Dom0 to use an inordinate amount of CPU time, preventing other guests from getting their fair share of the CPU.

Each HVM guest would have its own stub domain, responsible for its I/O.
• Small stub domains run nothing other than the device emulators.

Based on Mini-OS

xen-3.3.1/stubdom/

Page 17: 12393401222009.04.02.ppt


Stub Domain

Tricky scheduling
The current schedulers in Xen are based on the assumption that virtual machines are, for the most part, independent.
• If domain 2 is under-scheduled, this doesn't have a negative effect on domain 3.

With HVM and stub domain pairs,
• the HVM guest is likely to be performance-limited by the amount of time allocated to the stub domain;
• if the stub domain is under-scheduled, the HVM domain sits around waiting for I/O.

Potential solutions
• Doors
• Scheduler domains

Page 18: 12393401222009.04.02.ppt


Stub Domain

Doors
An IPC mechanism from the Spring operating system, and later Solaris
• Allows a process to delegate the rest of its scheduling quantum to another
• The stub domain would run whenever the pair needed to be scheduled.
• It would then perform pending I/O emulation and invoke a "delegate" scheduler operation (instead of "yield") on the HVM guest, which would then run for the remainder of the quantum.

Page 19: 12393401222009.04.02.ppt


Stub Domain

Scheduler domains
Proposed by IBM, based on work in the Nemesis exokernel
Conceptually similar to the N:M threading model
• The hypervisor's scheduler would schedule this domain, and it would be responsible for dividing time amongst the others in its group.
• In this way, the scheduler domain fulfills the same role as the user-space component of an N:M threading library.

Page 20: 12393401222009.04.02.ppt

HVM Device Model Domain

Almost unmodified qemu
Relieves Dom0
Provides better CPU-usage accounting
More efficient
• Lets the hypervisor schedule it directly
• More lightweight OS

A lot safer


[Diagram: the HVM domain's IN/OUT accesses now go through the Xen hypervisor to qemu running on Mini-OS in the stub domain, while Dom0 runs Linux and serves regular PV requests]

Page 21: 12393401222009.04.02.ppt

HVM Device Model Domain

Performance
lnb: latency of I/O port accesses

• The round trip time between the application in the HVM domain and the virtual device emulation part of qemu


Page 22: 12393401222009.04.02.ppt

HVM Device Model Domain

Disk performance

[Chart: disk performance results, including CPU % usage]

Page 23: 12393401222009.04.02.ppt

HVM Device Model Domain

Network performance
• e1000


Page 24: 12393401222009.04.02.ppt

HVM Device Model Domain

Network performance
• bicore


Page 25: 12393401222009.04.02.ppt

PV-GRUB

PyGRUB used to act as a "PV bootloader"

PV-GRUB
Real GRUB source code, recompiled against Mini-OS
Runs inside the PV domain that will host the PV guest

Boots inside the PV domain
Detects the PV disks and network interfaces of the domain
Uses them to access the PV guest's menu.lst
Uses the regular PV console to show the GRUB menu
Uses the PV interface to load the kernel image from the guest disk image

More secure than PyGRUB
Uses only the resources that the PV guest will use
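In a guest config, PV-GRUB replaces the bootloader line with a kernel line; a minimal sketch (the pv-grub image path and menu.lst location are illustrative):

  # PV guest config booting via PV-GRUB instead of PyGRUB -- a sketch
  kernel = '/usr/lib/xen/boot/pv-grub-x86_32.gz'   # illustrative path
  extra  = '(hd0,0)/boot/grub/menu.lst'            # where GRUB finds the guest's menu
  memory = 512
  disk   = ['file:/var/xen/pv-guest.img,xvda,w']
  vif    = ['']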


Page 26: 12393401222009.04.02.ppt

PV-GRUB

Start


Page 27: 12393401222009.04.02.ppt

PV-GRUB

Loading


Page 28: 12393401222009.04.02.ppt

PV-GRUB

Loaded


kexec (kernel execution)

Allows “live” booting of a new kernel over the currently running one

Page 29: 12393401222009.04.02.ppt

PV-GRUB


Page 30: 12393401222009.04.02.ppt

PV-GRUB

Executes upstream GRUB
• Replaces native drivers with Mini-OS drivers
• Adds a PV-kexec implementation

Uses only the target PV guest's resources

Improves security
Provides network boot


Page 31: 12393401222009.04.02.ppt

References

Samuel Thibault (Citrix/XenSource), "Stub Domains: A Step Towards Dom0 Disaggregation"

Samuel Thibault and Tim Deegan, "Improving Performance by Embedding HPC Applications in Lightweight Xen Domains", HPCVIRT'08, Oct. 2008

David Chisnall, "The Definitive Guide to the Xen Hypervisor"

http://blog.xen.org
• Xen 3.3 Features: Stub Domains
• Xen 3.3 Features: HVM Device Model Domain
• Xen 3.3 Features: PV-GRUB


Page 32: 12393401222009.04.02.ppt

HVM Configuration

Para-virtualization
• Hypercalls

HVM (hardware virtual machine)
Hardware support is needed to trap privileged instructions.
• Trap-and-emulate approach
Processor flags
• vmx: virtual machine extensions (Intel CPUs)
• svm: secure virtual machine (AMD CPUs)

In Intel's VT architecture
• Uses VMexit and VMentry operations → high cost
