Scale and Performance in the Denali Isolation Kernel

Andrew Whitaker, Marianne Shaw, and Steven D. Gribble

Presented by Steve Rizor

Abstract

The Denali isolation kernel is an operating system architecture designed to safely multiplex a large number of Internet services on shared hardware.

Allows new services to be “pushed” onto third-party infrastructures, relieving authors from the burden of maintaining physical infrastructure

Exposes a virtual machine abstraction but does not attempt to emulate the underlying hardware precisely

Modifies the virtual architecture to gain scale, performance, and simplicity of implementation

Introduction

With the proliferation of Internet services comes the need for more hardware, but dedicating one machine to each service is usually highly inefficient.

A large fraction of web services are infrequently accessed, while a small fraction is frequently accessed.

Why not virtualize all of the infrequently accessed services?

If one machine can handle 10,000 requests per hour for one service, why can’t one machine handle 1 request per hour for each of 10,000 services?

Making a Case for Isolation Kernels

Many services can already run on one machine, but they also need to be secured from one another.

Isolation not only enables many services to run, it ensures that they cannot affect one another.

This enables new or untrusted services to be pushed without the worry of harming other services.

It also provides an interesting experimentation infrastructure: wide-area testbeds for network research can be deployed with thousands of running subjects and without the corresponding physical machines.

Isolation Kernel Design Principles

An isolation kernel is a small-kernel operating system architecture targeted at hosting multiple untrusted applications that require little data sharing.

1. Expose low-level resources rather than high-level abstractions.

• High-level abstractions entail significant complexity and typically have a wide API, violating the security principle of economy of mechanism. They also invite “layer below” attacks, in which an attacker gains unauthorized access to a resource by requesting it below the layer of enforcement.

2. Prevent direct sharing by exposing only private, virtualized namespaces.

• Little direct sharing is needed across Internet services, and therefore an isolation kernel should prevent direct sharing by confining each application to a private namespace. Memory pages, disk blocks, and all other resources should be virtualized, eliminating the need for a complex access control policy: the only sharing allowed is through the virtual network.

3. Scalability.

• An isolation kernel designed for Internet services must scale to thousands of services on a single machine. To that end, the memory footprint of each service (including the kernel metadata) must be minimized. Since the set of all unpopular services won’t fit in memory, the kernel must treat memory as a cache of popular services, swapping inactive services to disk. Because this cache will have a poor hit rate, swapping must be rapid to reduce the cache miss penalty.

4. Modify the virtualized architecture for simplicity, scale, and performance.

• VMMs such as Disco adhere to the first two principles. They also strive to support legacy operating systems by precisely emulating the physical hardware. For an isolation kernel, however, deviating from the underlying physical hardware can enhance performance, simplicity, and scalability. The drawback is the loss of support for unmodified legacy operating systems.
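The "memory as a cache of popular services" idea in principle 3 can be sketched as follows. This is a minimal illustration, not Denali's implementation: it assumes an LRU replacement policy (the slides do not name one), and the class name `VMCache`, the `capacity` parameter, and the `swap_ins` counter are hypothetical.

```python
from collections import OrderedDict


class VMCache:
    """Toy model: physical memory as an LRU cache of resident VMs.

    Assumption: LRU eviction. The source only says inactive services are
    swapped to disk; it does not specify the replacement policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity          # max VMs resident in memory
        self.resident = OrderedDict()     # vm_id -> state, oldest first
        self.on_disk = {}                 # vm_id -> swapped-out state
        self.swap_ins = 0                 # misses that brought a VM into memory

    def touch(self, vm_id):
        """Bring a VM into memory because a request arrived for it."""
        if vm_id in self.resident:
            self.resident.move_to_end(vm_id)   # hit: mark most recently used
            return "hit"
        # Miss: evict the least recently used VM if memory is full.
        if len(self.resident) >= self.capacity:
            victim, state = self.resident.popitem(last=False)
            self.on_disk[victim] = state
        # Swap in the VM's state, or create fresh state on first use.
        self.resident[vm_id] = self.on_disk.pop(vm_id, {"id": vm_id})
        self.swap_ins += 1
        return "miss"
```

With many rarely used services and a small capacity, most calls to `touch` are misses, which is why the cost of each swap matters more than the hit rate.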

Denali Isolation Kernel

While the Denali isolation kernel looks like a standard VMM, its virtual machine interface is quite different from most others.

The Denali virtual instruction set is a subset of x86, so that most virtual instructions execute directly on the physical processor. x86 VMMs normally have to use binary rewriting and memory protection techniques to virtualize some of the instructions. Since Denali does not support legacy operating systems, those instructions are simply defined to have ambiguous semantics. A VM that executes one can, at worst, harm only itself. Such instructions are rarely used, and none are emitted by C compilers such as gcc.

The instruction set also adds an “idle-with-timeout” instruction that relinquishes control to another VM instead of using time in an idle loop, an instruction to terminate the VM, and several virtual registers revealing information about the system.

Denali’s virtual machine interface is also different in that the emulated hardware is not a representation of the physical system:

Keeping the emulated devices static removes the need to poll for hardware.

Keeping the devices simple reduces the number of programmed I/O instructions needed to transmit or receive a single packet.

Denali uses round-robin scheduling across all the active VMs (those with active threads) and a buffered interrupt scheme to prevent thrashing.

VMs that voluntarily give up time via the “idle-with-timeout” instruction are given priority once the timeout has expired.
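The scheduling policy described above can be sketched as a small simulation. This is an illustration under stated assumptions rather than Denali's scheduler: time is modeled as integer ticks with one tick per time slice, a VM that idles leaves the rotation until its timeout expires, and the names `Scheduler`, `idle_with_timeout`, and `next_vm` are hypothetical.

```python
from collections import deque
import heapq


class Scheduler:
    """Round-robin over runnable VMs, with priority for expired timeouts."""

    def __init__(self, vm_ids):
        self.run_queue = deque(vm_ids)   # round-robin order of runnable VMs
        self.sleeping = []               # min-heap of (wake_time, vm_id)
        self.now = 0                     # virtual clock, in ticks (assumption)

    def idle_with_timeout(self, vm_id, timeout):
        """VM yields the CPU until `timeout` ticks have passed."""
        self.run_queue.remove(vm_id)
        heapq.heappush(self.sleeping, (self.now + timeout, vm_id))

    def next_vm(self):
        """Advance one tick and pick the VM to run next."""
        self.now += 1
        # A VM whose timeout has expired runs ahead of the round-robin queue.
        if self.sleeping and self.sleeping[0][0] <= self.now:
            _, vm_id = heapq.heappop(self.sleeping)
            self.run_queue.append(vm_id)   # rejoin the rotation afterwards
            return vm_id
        if not self.run_queue:
            return None                    # every VM is idle
        vm_id = self.run_queue.popleft()
        self.run_queue.append(vm_id)
        return vm_id
```

A VM that idles costs nothing while it sleeps, and it is not penalized for yielding: it runs as soon as its timeout expires instead of waiting for its turn in the rotation.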

Each Denali VM is given its own (virtualized) physical 32-bit address space. A VM may only access a subset of this 32-bit address space, the size and range of which are chosen by the isolation kernel when the VM is instantiated. The kernel itself is mapped into a portion of the address space that the VM cannot access; because of this, Denali can avoid physical TLB flushes on VM/VMM crossings.
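The access restriction can be pictured as a range check fixed at instantiation time. The sketch below is an assumption-laden illustration: it models the accessible subset as a single contiguous range, and the names `VMAddressSpace` and `is_accessible` are hypothetical. On real hardware the check is enforced by memory protection, not by code like this.

```python
ADDRESS_SPACE_SIZE = 1 << 32   # 32-bit (virtualized) physical address space


class VMAddressSpace:
    """Accessible region of a VM, chosen once when the VM is created."""

    def __init__(self, base, size):
        # Assumption: one contiguous region that fits in the 32-bit space.
        if base < 0 or size <= 0 or base + size > ADDRESS_SPACE_SIZE:
            raise ValueError("region must lie within the 32-bit address space")
        self.base = base
        self.size = size

    def is_accessible(self, addr):
        """True if the VM may touch this (virtual) physical address."""
        return self.base <= addr < self.base + self.size
```

Any address outside the range, including the portion where the kernel is mapped, is rejected for the VM even though it exists in the same address space.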

Virtual registers are stored in a page at the beginning of a VM's (virtual) physical address space. This page is shared between the VM and the isolation kernel, avoiding the overhead of kernel traps for register modifications. In other respects, the virtual registers behave like normal memory (for example, they can be paged out to disk).
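A rough model of the shared register page is shown below. The register names, their offsets, and the 32-bit little-endian encoding are hypothetical choices for illustration; the slides do not describe the actual layout. The point is only that a register update is an ordinary memory store into a page both sides can see.

```python
import struct

PAGE_SIZE = 4096

# Hypothetical register layout: name -> byte offset within the shared page.
REGISTER_OFFSETS = {"timer": 0, "interrupt_enable": 4}


class SharedRegisterPage:
    """First page of a VM's address space, visible to VM and kernel alike."""

    def __init__(self):
        self.page = bytearray(PAGE_SIZE)

    def write(self, name, value):
        # Plain memory store; no trap into the kernel is modeled.
        struct.pack_into("<I", self.page, REGISTER_OFFSETS[name], value)

    def read(self, name):
        # Either side reads the same bytes.
        return struct.unpack_from("<I", self.page, REGISTER_OFFSETS[name])[0]
```

Because the page is just memory, nothing special is needed to page it out along with the rest of the VM.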

Benchmarks

Because a standard operating system must be modified to run on the Denali isolation kernel, a small guest OS named Ilwaco was developed against the virtual machine interface for testing.

Because of the simplification of the virtual network device, fewer programmed I/O instructions are needed per packet. However, Denali still requires a user/kernel switch where BSD does not. Adding a system call per packet to BSD (forcing the same user/kernel switch) brings BSD performance more into line with Denali.

The performance gains for buffering interrupt requests are quite obvious. Note the performance hit around 800 VMs due to memory demands and excessive paging.
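What "buffering interrupt requests" means can be sketched as follows. This is an interpretation, not a description of Denali's code: it assumes that interrupts for a VM that is not running are queued and delivered together when that VM is next scheduled, and the names `InterruptBuffer`, `raise_interrupt`, and `run_vm` are hypothetical.

```python
from collections import defaultdict


class InterruptBuffer:
    """Queue virtual interrupts per VM and deliver them in batches."""

    def __init__(self):
        self.pending = defaultdict(list)   # vm_id -> queued virtual interrupts
        self.context_switches = 0          # how many times a VM was switched in

    def raise_interrupt(self, vm_id, irq):
        """Queue an interrupt instead of switching to the target VM now."""
        self.pending[vm_id].append(irq)

    def run_vm(self, vm_id):
        """Called when the VM is next scheduled: deliver the whole batch."""
        self.context_switches += 1
        return self.pending.pop(vm_id, [])
```

In this model the number of context switches depends on how often VMs are scheduled, not on how many interrupts arrive, which is the property that avoids thrashing when many VMs receive traffic at once.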

Using the idle-with-timeout instruction yields a large performance gain over conventional OS idle loops.

Even with 800 virtual machines running, throughput remains high.

The effects of paging are quite obvious: with a larger amount of memory, the cliff can be pushed further out.

With the Quake II Linux server running on Denali, it is apparent that even with 30 servers (4 clients each), there is no change in latency or reliability. The scheduling algorithm, combined with the idle-with-timeout instruction and the buffered interrupts, keeps the servers running without issues.

References

Andrew Whitaker, Marianne Shaw, and Steven D. Gribble, “Scale and Performance in the Denali Isolation Kernel”, OSDI’02.
