35
@avsm openmirage.org Mirage: OCaml Appliances for Xen Clouds Anil Madhavapeddy, University of Cambridge with a merry crew Haris Rotsos, Balraj Singh, Steven Smith Jon Crowcroft, Steve Hand (University of Cambridge) Richard Mortier (University of Nottingham) Thomas Gazagnaire (OCamlPro) Dave Scott (Citrix Systems R&D) Monday, 27 August 12

Mirage: extreme specialisation of virtual appliances

Embed Size (px)

DESCRIPTION

Infrastructure-as-a-Service compute clouds provide a flexible hardware platform on which customers host applications as a set of appliances, e.g., web servers or databases. Each appliance is a VM image containing an OS kernel and userspace processes, within which applications access resources via traditional APIs such as POSIX. However, the flexibility provided by the hypervisor comes at a cost: the addition of another layer in the already complex software stack which impacts runtime performance, and increases the size of the trusted computing base. Given that modern software is generally written in high-level languages that abstract the underlying OS, we revisit how these appliances are constructed with our Mirage operating system. Mirage supports the progressive specialisation of source code, and gradually replaces traditional OS components with customisable libraries, ultimately resulting in "unikernel" VMs: sealed, fixed-purpose VMs that run directly on the hypervisor. Developers no longer need to become sysadmins, expert in the configuration of all manner of system components, to use cloud resources. At the same time, they can develop their code using their usual tools, only making the final push to the cloud once they are satisfied their code works. As they explicitly link in components that would normally be provided by the host OS, the resulting unikernels are also highly compact: facilities that are not used are simply not included in the resulting microkernel binary. This talk will describe the architecture of Mirage, and show a quick demonstration of how to build a web-server that runs as a unikernel on a standard Xen installation.

Citation preview

Page 1: Mirage: extreme specialisation of virtual appliances

@avsmopenmirage.org

Mirage: OCaml Appliances for Xen Clouds

Anil Madhavapeddy, University of Cambridgewith a merry crew

Haris Rotsos, Balraj Singh, Steven SmithJon Crowcroft, Steve Hand(University of Cambridge)

Richard Mortier (University of Nottingham)Thomas Gazagnaire (OCamlPro)Dave Scott (Citrix Systems R&D)

Monday, 27 August 12

Page 2: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Millions of lines of code & configuration• Why build for clouds as for desktops?• We can simplify!

Modern Stacks are Too Large

Hardware

Processes

OS Kernel

Threads

Application

Hypervisor

Language

Monday, 27 August 12

Page 3: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Millions of lines of code & configuration• Why build for clouds as for desktops?• We can simplify!

Modern Stacks are Too Large

Hardware

Processes

OS Kernel

Threads

Application

Hypervisor

Language

Hardware

Application

Xen

Language

Hardware

Application

POSIX

Language

Monday, 27 August 12

Page 4: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Millions of lines of code & configuration• Why build for clouds as for desktops?• We can simplify!

Modern Stacks are Too Large

Hardware

Processes

OS Kernel

Threads

Application

Hypervisor

Language

Hardware

Application

Xen

Language

Hardware

Application

POSIX

Language

Devel

opm

ent

Deplo

ymen

t

Monday, 27 August 12

Page 5: Mirage: extreme specialisation of virtual appliances

0

100

200

300

400

500

600

Linux 3.2.2

glibc 2.15

Bind 9.9.0

httpd 2.4.2

OpenSSH 6.0p1

Open vSwitch 1.4.0

NOX-zaku

Mirage

Line

s of

cod

e (x

103 )

OCamlC/C++ASM

openmirage.org

How Large is Large?

Monday, 27 August 12

Page 6: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Critical memory bugs still occur regularly in mature software

• CVE-2012-1182 – Samba– RPC code generator overflow– Variable containing buffer length checked independently

of variable used to allocate memory for buffer– Leads to root exploit

• CVE-2012-2110 – OpenSSL– Integer conversion bug mixed with realloc wrappers– Unsigned cast to signed, but realloc’d buffer not clamped– Leads to heap corruption

Why Do We Care?

Monday, 27 August 12

Page 7: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Attack from without, not within: trust hypervisor, but not I/O traffic or other VMs.

• VMs are no longer multi-user, but primarily single-purpose deployments

• …and they are in a multi-tenant datacenter

• …and they are always network connected

The Cloud Threat Model

Monday, 27 August 12

Page 8: Mirage: extreme specialisation of virtual appliances

Mirage Compiler

Hardware

Hypervisor

OS Kernel

User Processes

Language Runtime

Parallel Threads

Application Binary

Mirage Runtime

Hardware

Hypervisor

Application Code

Configuration Filesapplication source codeconfiguration fileshardware architecturewhole-system optimisation

specialisedunikernel}

openmirage.org

Sealed Appliances

Monday, 27 August 12

Page 9: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Static typing– Pragmatic functional language (OCaml)– Large set of libraries provided, some used in XCP already

Key Features of Mirage

Net.Manager.bind (fun mgr dev ! let src = ’any_addr, 53 in Dns.Server.listen dev src zones )

Monday, 27 August 12

Page 10: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Static typing– Pragmatic functional language (OCaml)– Large set of libraries provided, some used in XCP already

• Cooperative concurrency– Wrapped up in Lwt syntax extensions– Threads encapsulated and hidden within typed modules

Key Features of Mirage

let main () = lwt zones = read key "zones" "zone.db" in Net.Manager.bind (fun mgr dev ! let src = ’any_addr, 53 in Dns.Server.listen dev src zones)

Monday, 27 August 12

Page 11: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Static typing– Pragmatic functional language (OCaml)– Large set of libraries provided, some used in XCP already

• Cooperative concurrency– Wrapped up in Lwt syntax extensions– Threads encapsulated and hidden within typed modules

• Library-based operating system– Fully re-entrant functional libraries

• No dynamic loading– Configuration evaluated at compile-time, and sealed– Recompile and redeploy to reconfigure

Key Features of Mirage

Monday, 27 August 12

Page 12: Mirage: extreme specialisation of virtual appliances

Deploy

Xen

μ μ μ μ μ μ

safe device drivers

link time optimisation

ubuild xen-direct

Linux

ELF

FreeBSD

ELFELF

Test

ubuild posix-direct

tuntap+safe I/O stack

x86_64 native code

Develop

ubuild posix-socket

kernel sockets

bytecode VM

Linux

ELF REPL

openmirage.org

Progressive Specialization

Monday, 27 August 12

Page 13: Mirage: extreme specialisation of virtual appliances

Deploy

Xen

μ μ μ μ μ μ

safe device drivers

link time optimisation

ubuild xen-direct

Linux

ELF

FreeBSD

ELFELF

Test

ubuild posix-direct

tuntap+safe I/O stack

x86_64 native code

Develop

ubuild posix-socket

kernel sockets

bytecode VM

Linux

ELF REPL

openmirage.org

Progressive Specialization

• Development under UNIX:

– full debugging environment (bytecode, debugger, gdb)

– can use kernel sockets or tuntap networking and IO to isolate bugs

– interactive REPL for editor integration and other niceties.

Monday, 27 August 12

Page 14: Mirage: extreme specialisation of virtual appliances

Deploy

Xen

μ μ μ μ μ μ

safe device drivers

link time optimisation

ubuild xen-direct

Linux

ELF

FreeBSD

ELFELF

Test

ubuild posix-direct

tuntap+safe I/O stack

x86_64 native code

Develop

ubuild posix-socket

kernel sockets

bytecode VM

Linux

ELF REPL

openmirage.org

Progressive Specialization

• Testing:

– Simulation backend using NS3/MPI

– Emulate your code at scale before deploying it

Monday, 27 August 12

Page 15: Mirage: extreme specialisation of virtual appliances

Deploy

Xen

μ μ μ μ μ μ

safe device drivers

link time optimisation

ubuild xen-direct

Linux

ELF

FreeBSD

ELFELF

Test

ubuild posix-direct

tuntap+safe I/O stack

x86_64 native code

Develop

ubuild posix-socket

kernel sockets

bytecode VM

Linux

ELF REPL

openmirage.org

Progressive Specialization

• Xen output kernel is specialised:

– “operating system” is a set of libraries (e.g Netfront, Blkfront, TCP) linked with small C runtime.

– Configuration files are evaluated at this stage, and image must be recompiled to reconfigure.

– Data files can also be linked in directly, if relatively small.

– Dead-code elimination is across whole OS image.

Monday, 27 August 12

Page 16: Mirage: extreme specialisation of virtual appliances

text$and$data

foreigngrants

reservedby$Xen

OCamlminor$heap

OCamlmajor$heap

IP$headerTCP$headertx$data

4kB

120T

B128T

B64Dbit$virtual$add

ress$sp

ace$

IP$header

rx$dataTCP$header

4kB

4kB

8×512kBsectors

openmirage.org

Simplified Memory Management

Monday, 27 August 12

Page 17: Mirage: extreme specialisation of virtual appliances

openmirage.org

Optional VM Seal Hypercall

• Single address-space and no dynamic loading– W^X address space– Address offsets are randomized at compile-time

• Dropping page table privileges:– Added freeze hypercall called just before app starts– Subsequent page table updates are rejected by Xen.– Exception for I/O mappings if they are non-exec and do

not modify any existing mappings.

• Very easy in unikernels due to focus on compile-time specialisation instead of run-time complexity.

Monday, 27 August 12

Page 18: Mirage: extreme specialisation of virtual appliances

0

20

40

60

80

100

0 0.05 0.1 0.15 0.2Cum

ulat

ive

frequ

ency

(%)

Jitter (ms)

xen-directlinux-nativelinux-pv

openmirage.org

Event Driven co-threads

The microkernel is event-driven, with no pre-emption at all. Graph shows CDF of thread wakeup latency for a Mirage VM running directly on Xen, vs native or PV Linux userspace.

Monday, 27 August 12

Page 19: Mirage: extreme specialisation of virtual appliances

0

1

2

3

4

0 5 10 15 20

Exec

utio

n tim

e (s

)

Number of threads (millions)

linux-pvlinux-nativexen-direct (malloc)xen-direct (extent)

openmirage.org

Single-instance Thread Scaling

Threads are heap allocated values, so benefit from the faster garbage collection cycle in the Mirage Xen version, and the scheduler can be overridden by application-specific needs.

Monday, 27 August 12

Page 20: Mirage: extreme specialisation of virtual appliances

openmirage.org

Appliance Image Size

Appliance Standard Build Dead Code Elimination

DNS 0.449 MB 0.184 MB

Web Server 0.674 MB 0.172 MB

Openflow learning switch 0.393 MB 0.164 MB

Openflow controller 0.392 MB 0.168 MB

All configuration and data compiled into the image by the toolchain, so no separate VBD required beyond the PV kernel.

Live migration is easy and fun :-)

Monday, 27 August 12

Page 21: Mirage: extreme specialisation of virtual appliances

0

1

2

3

8 16 32 64 128256

5121024

20483072

Tim

e (s

)

Memory size (MiB)

linux-pv minimal-linux xen-direct

openmirage.org

Microbenchmarks: Boot Time

• Minimal Linux is a custom initrd which directly calls ioctls to send a UDP packet. The Linux-PV is more realistic.

• The Xen-Direct is a standard Mirage VM, and is limited by dom0 toolstack latency (XCP in this case).

Monday, 27 August 12

Page 22: Mirage: extreme specialisation of virtual appliances

IO Page Pool

get_page make_writebuf channel output_buf reset_buf

Grant Table

Ring

recycle

free

uninitialised

fixed data

mutable data

reference

request response

openmirage.org

Zero-Copy IO Buffer Management

Monday, 27 August 12

Page 23: Mirage: extreme specialisation of virtual appliances

0

200

400

600

800

1000

10 parallel

flows

1 singleflow

Thro

ughp

ut (r

eqs/

s x

103 )

linux-pv, tx & rxlinux-pv tx, xen-direct rxxen-direct tx, linux-pv rx

openmirage.org

Microbenchmarks: TCP

• Simple throughput test with all offload turned off to stress CPU

• Performance bug in TX path– …being fixed (ACKs being sent

out of order)

Monday, 27 August 12

Page 24: Mirage: extreme specialisation of virtual appliances

0 200 400 600 800

1000 1200 1400 1600

1 2 4 8 16 32 64 128 256

512 1024

2048 4096

Thro

ughp

ut (M

iB/s

)

Block size (KiB)

xen-directlinux-pv, direct I/Olinux-pv, buffered I/O

openmirage.org

Microbenchmarks: Block Storage

Fast PCI-E SSD

Monday, 27 August 12

Page 25: Mirage: extreme specialisation of virtual appliances

openmirage.org

Microbenchmarks: Block Storage

• Same Ring API as Net– Unlike POSIX (libaio and so on)– Caching is controlled via libraries, not API

• No reordering in the front-end– How do we know what the backend storage is?

0 10 20 30 40 50 60 70

1 2 4 8 16 32 64 128 256

512 1024

2048 4096

Thro

ughp

ut (M

iB/s

)

Block size (KiB)

Mirage

Linux domUdirect/buffered

Single spindle SATA

Monday, 27 August 12

Page 26: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Mirage is comparable to or exceeds Linux PV performance– I.e., the functional programming language

overheads are offset via single-address space specialization and careful buffer management.

• But what about in some more “realistic” scenarios?– DNS server appliance– HTTP server scaling across 6 vCPUs– OpenFlow Controller and Switch implementations

Microbenchmarks: Summary

Monday, 27 August 12

Page 27: Mirage: extreme specialisation of virtual appliances

0 10 20 30 40 50 60 70 80

100 1000 10000

Thro

ughp

ut (r

eqs/

s x

103 )

Zone size (entries)

Bind9, LinuxNSD, LinuxNSD, MiniOS -ONSD, MiniOS -O3Mirage (no memo)Mirage (memo)

openmirage.org

DNS Server Performance

Monday, 27 August 12

Page 28: Mirage: extreme specialisation of virtual appliances

openmirage.org

DNS Server Performance

let main () = lwt zones = read key "zones" "zone.db" in Net.Manager.bind (fun mgr dev → let src = ’any_addr, 53 in Dns.Server.listen dev src zones )

Monday, 27 August 12

Page 29: Mirage: extreme specialisation of virtual appliances

0 500

1000 1500 2000 2500

Thro

ughp

ut (c

onns

/s)

linux-pv (1 host, 6 vcpus)linux-pv (2 hosts, 3 vcpus)linux-pv (6 hosts, 1 vcpu)xen-direct (6 unikernels)

openmirage.org

Scaling via Multiple Instances

• Apache/Linux vs. Mirage appliance

• Serving single static page

Monday, 27 August 12

Page 30: Mirage: extreme specialisation of virtual appliances

0 20 40 60 80

100 120 140 160 180

Batch Single

Req

uest

s/s

(x 1

03 ) maestronox-fastxen-direct

openmirage.org

Openflow Controller performance

• Openflow controller is competitive with Nox (C++), but much more high-level. Applications can link directly against the switch to route their data.

Monday, 27 August 12

Page 31: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Service domains should be built this way– Debugging and moving from dom0/domU is a lot easier.– Dependencies are explicitly managed.– Resource control can be fine-grained.– Very useful for unit testing Xen features!

• Which language?– Several viable alternatives: HalVM (Haskell), GuestVM

(Java), ErlangOnXen. More code sharing worthwhile?– Protocol implementations take time to mature.– Working on Mirage/XCP integration via “project Windsor”

• Stub domains?– Less practical for stub domains, due to hardware devices.

Turn Linux into a library OS?

Design Discussion

Monday, 27 August 12

Page 32: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Developer Release for Q3 2012– Monolithic repository at http://github.com/avsm/mirage– Adding pkg mgmt: http://github.com/OCamlPro/opam

Mirage Roadmap

$ opam init git://github.com/mirage/opam-repository$ opam remote -add dev git://github.com/mirage/opam-repo-dev$ opam switch mirage-3.12.1-xen$ opam install mirage-www

Monday, 27 August 12

Page 33: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Developer Release for Q3 2012– Monolithic repository at http://github.com/avsm/mirage– Adding pkg mgmt: http://github.com/OCamlPro/opam

• New architectures ongoing:– FreeBSD kernel: http://github.com/pgj/mirage-kfreebsd– Javascript: http://ocsigen.org/js_of_ocaml– ARM/MIPs64/rPI: via FreeBSD kmod– Simulation: NS3/OpenMPI functional simulation

Mirage Roadmap

$ opam init git://github.com/mirage/opam-repository$ opam remote -add dev git://github.com/mirage/opam-repo-dev$ opam switch mirage-3.12.1-xen$ opam install mirage-www

Monday, 27 August 12

Page 34: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Developer Release for Q3 2012– First release will have pure-OCaml implementations of:

• Device drivers (netfront/blkfront/xenstore)• TCP/IPv4 and DHCPv4• HTTP• DNS(SEC)• SSH• OpenFlow (controller/switch)• vchan IPC• 9P :-)• NFS• FAT32• Distributed k/v store: arakoon.org

Mirage Roadmap

Monday, 27 August 12

Page 35: Mirage: extreme specialisation of virtual appliances

openmirage.org

• Online resources:

– http://www.openmirage.org– http://tutorial.openmirage.org

– https://lists.cam.ac.uk/mailman/listinfo/cl-mirage

– (draft in Q4 2012) http://realworldocaml.org

• Offline resources:

– Find me at the hotel pool

Mirage Roadmap

Monday, 27 August 12