Upload
gordon-cummings
View
214
Download
1
Embed Size (px)
Citation preview
OSE 2011– OS Singularity1
Operating Systems Engineering
Language/OS Co-Design:
“Singularity”
By Dan Tsafrir, 4/5/2011
OSE 2011– OS Singularity2
Singularity
An experimental research OS by Microsoft Since 2003 Lots of high profile publications (last one in 2009) Open source since 2008, can be used by academia Lots of people working on it
Motivation: Official:
Rethink SW stack, as many decisions are from late 1960s & early 1970s (in all OSes they’ve stayed ~ the same)
Speculated: Windows isn’t the best OS you can imagine…
(UNIX-like isn’t that different…)
OSE 2011– OS Singularity3
Goals
Answer the question “What would a SW platform look like if it was designed from
scratch with the primary goal of improved dependability and trustworthiness”?
Attempt to eliminate the following obvious shortcomings of existing systems & SW stacks Widespread security vulnerabilities Unexpected interactions among applications Failure caused by errant
Extensions Plug-ins, add-ons Drivers
Perceived lack of robustness
OSE 2011– OS Singularity4
Strategies to achieve goals
1. Pervasive employing safe programming language Kernel (where possible) & user space Eliminates such defects as buffer overflow Allows reasoning about the code
2. Using sound program verification tools Further guaranties entire classes of programming errors are
eliminated early in the development cycle Impose constraints to make verification feasible
3. Improved system architecture & design Stops propagation of runtime errors at well-defined boundaries => easier to achieve robust & correct behavior
OSE 2011– OS Singularity5
Kernel
Mostly written in type-safe, garbage collected language 90% of the kernel written in Sing# (extension of C#) 6% C++; rest: C, & little assembly snippets
Still, significant unsafe parts Garbage collector (in unsafe Sing#; accounts for 1/2 of
unsafe code); memory management; I/O access subsystems Microkernel structure
Factored services into user space: NIC, TCP/IP (Israeli connection…), file system, disk driver
Still, 192 sys call! (“my appear shockingly large”) But actually, much simpler than number suggests
No sys calls with complex semantics (such as ioctl) No UNIX compatibility => allows to avoid pitfalls (e.g. tocttou)
OSE 2011– OS Singularity6
Kernel – cont.
Provides 3 base abstractions (the architectural foundation) Software-isolated processes (SIPs) Manifest-based programs (MBPs) Contract-based channels
OSE 2011– OS Singularity7
Software-isolated processes (SIPs)
Like ordinary processes Holder of processing resources Provides context for program execution Threads container
One fundamental, somewhat surprising difference The most radical part of the design… They all run in one address space!
(paging turned off, no use of segments) Both kernel and all processes! User code runs with full hardware privileges (CPL=0)!
OSE 2011– OS Singularity8
Why might that be useful?
Performance Fast process switching
No page table to switch No need to invalidate TLBs In fact, can run in real mode (if memory <= 4GB) In which case no need for TLB at all
Fast system calls CALL rather than INT
Fast IPC No need to copy, it’s already in your address space
Direct user-program access to HW E.g., device drivers of the microkernel
OSE 2011– OS Singularity9
Some performance numbers
Cost (in CPU cycles)
APIcall
Thread yield
message ping/pong
Process creation
Singularity 80 365 1,040 388,00FreeBSD 878 911 13,300 1,030,000Linux 437 906 5,800 719,000Windows 627 753 6,340 5,380,00
IPCswitch
OSE 2011– OS Singularity10
But we’re interested in robustness!
Recall that performance is not the main goal Rather, robustness & security Is not using page-table protection consistent with our goal?
For starters, note that Our OSes do use page tables and they’re not very robust… Unreliability very often comes from extensions
Browser plug-ins & add-ons, loadable kernel modules,… HW VM protection is in any case irrelevant in these cases
Can we just do without HW protection? If we can solve the problem of extensions, then yes! It’d be the same solution for processes…
OSE 2011– OS Singularity11
The idea
Extensions (device drivers, new network protocol , plugin) Would employ a different processes Communicate via IPC
The challenge? Prevent evil/buggy processes from writing
To each other To kernel
Be able to cleanly kill/exit Avoid entangling memory spaces Would allow GC to work
The solution: SIPs Software-isolated processes
OSE 2011– OS Singularity12
SIP philosophy: “sealed”
No modifications from outside No shared memory No signals Only IPC
No modification from within No JIT No class loader No dynamically loaded libraries
OSE 2011– OS Singularity13
SIP rules
Only pointer to your own data No pointers into other SIPs No pointers into the Kernel => No sharing despite shared address space!
SIP can allocate pages of memory from kernel Different allocations are not contiguous
OSE 2011– OS Singularity14
Why un-modifiable?
Why is it crucial that SIPs can’t be modified? Can’t even modify themselves…
What are the benefits? No code insertion attacks
E.g., <script>..</script> injected from the outside into a browser SIP will not work
Easier to reason about (prove things, typically statically) Easier to optimize
E.g., inline (“whole program optimization” can nearly eliminate overheads of virtualization in OOP)
E.g., delete unused functions
OSE 2011– OS Singularity15
Why un-modifiable? – cont.
Singularity vs. JVM (isn’t it “safe” too?…) JVM can load classes in runtime
=> Precludes previous slide JVM allows to share memory
=> Entanglement for when killing(With singularity cleanup is much simpler)
JVM allows other ways to sync (other than explicit messages) Typically impossible to reason about locks without knowing
the program’s semantics JVM = one runtime + one GC
But it has been shown that different programs require different GCs
Singularity supplies a few GCs (trusted code)
OSE 2011– OS Singularity16
Why un-modifiable? – cont.
Bottom line Improved performance makes the pervasive use isolation
primitives feasible
OSE 2011– OS Singularity17
Enforcing memory isolation
How to keep SIPs from reading/writing other SIPs? We must make sure that SIP exclusively reads/writes
from/to memory the kernel has given to it
Shall we have compiler generate code to check every access? (“Does this point to memory the kernel gave us?“) Would slow code down significantly
(especially since memory isn't contagious) Actually, we don't even trust compiler!
OSE 2011– OS Singularity18
Enforcing memory isolation – cont.
Overall structure: during installation Compile to bytecode Verify bytecode statically (confirms to SIP constraints + does
not conflict with any other already existing SIP) Compile bytecode to machine code Later, run the verified machine code w/ trusted runtime
Verifier verifies SIP only uses reachable pointers SIP cannot create new pointers
Only trusted runtime can create pointers Thus, if kernel/runtime never supply out-of-SIP pointer
=> Verified SIP can use its memory only
OSE 2011– OS Singularity19
Enforcing memory isolation – cont.
Specifically, the verifier verifies that SIP doesn’t cook up pointers
Only uses pointers given to it (by the runtime) SIP doesn’t change its mind about type
E.g., don’t interpret int as pointer SIP doesn’t use pointer after it was freed
Enforced with (trusted) GC No other tricks exist
…
OSE 2011– OS Singularity20
Enforcing memory isolation – cont.
Bytecode verification seems to do more than Singularity needs E.g. cooking up pointers might be OK, as long as within
SIP's memory Thus, verifier might forbid some programs that might have
been OK on Singularity, no? No.
Benefits of full verification Fast execution, often don't need runtime checks at all
Though some still needed: array bounds, OO casts Prove IPC contracts (a few slides down the road) …
OSE 2011– OS Singularity21
Trusted vs. Untrusted
Trusted SW If it has bugs => can crash Singularity or wreck other SIPs
Untrusted SW If it has bugs => can only wreck itself
Let’s consider some examples Compiler? Compiler output? Verifier? Verifier output? GC?
OSE 2011– OS Singularity22
Price of SW & HW isolation
9
resources that implement processes, such as the MMU or interrupt controllers. These mechanisms have non -trivial performance costs that are largely hidden because there has been no widely used alternative approach for comparison .
To explore the design trade -offs of hardware protection versus software isolation, recent work in Singularity augments SIPs, which provide isolation through language safety and static verification, with protection domains [1] , which can provide an additional level of hardware -based protection around SIPs . The lower run -time cost of SIPs makes their use feasible at a finer granularity than conventional proces ses, but hardware isolation remains valuable as a defense -in -depth against potential failures in software isolation mechanisms or to make available needed address space on 32 -bit machines .
Protection domains are hardware -enforced protection boundaries , wh ich can host one or more SIPs. Each protection domain consists of a distinct virtual address space. The processor’s MMU enforces memory isolation in a conventional manner. Each domain has its own exchange heap, which is used for communications between SIPs within the domain. A protection domain that does not isolate its SIPs from the kernel is called a kernel domain . All SIPs in a kernel domain run at the processor’s supervisor privilege level (ring 0 on the x86 architecture), and share the kernel’s exchang e heap, thereby simplifying transitions and communication betwee n the processes and the kernel . Non -kernel domains run at user privilege level (ring 3 on the x86).
Communication within a protection domain continues to use Singularity’s efficient reference -passing scheme. However, because each protection domain resides in a separate address space, communication across domains requires data copying or copy -on-write page mapping . The message -passing semantics of Singularity channels makes the implementations indistinguishable to application code (except for performance).
A protection domain could, in principle, host a single process containing unverifiable code written in an unsafe language such as C++. Although very useful for running legacy code, w e have not yet explored this possibility. Currently , all code within a protection domain is also contained within a SIP, which continues to provide an isolation and failure containment boundary.
Because multiple SIPs can be hosted within a protection domain, domains can be employed selectively to provide hardware isolation between specific processes, or between the kernel and processes. The mapping of SIPs to protection domains is a run -time decision . A Singularity system with a distinct protection domain for each SI P is analogous to a fully hardware -isolated
microkernel system , such as MINIX 3 [12] (see Figure 4a). A Singularity system with a kernel domain hosting the kernel, device drivers, and services is analogous to a conventional, monolithic operating syste m, but with more resilience to driver or service failures (see Figure 4b). Singularity also supports unique hybrid combinations of hardware and softwa re isolation, such as selection of kernel domains based on signed code (see F igure 4c).
4.2.1 Quantifying the Un safe Code Tax Singularity offers a unique opportunity to quantify the costs of hardware and software isolation in an apples -to-apples comparison . Once the costs are understood, individual systems can choose to use hardware isolation when its benefits outwe igh the costs.
Hardware protection does not come for free, though its costs are dif fuse and difficult to quantify . Costs of hardware protection include maintenance of page tables, soft TLB misses, cross -processor TLB maintenance, hard paging exceptions, a nd the additional cache pressure caused by OS code and data supporting hardware protection . In addition, TLB access is on the critical path of many processor designs [2, 15] and so might affect both processor clock speed and pipeline depth . Hardware protection increases the cost of calls into the kernel and process context switches [3] . O n processors with an untagged TLB, such as most current implementations of the x86 architecture, a process context switch requires flushing the TLB, whi ch incurs refill costs.
Figure 5 graphs the normalized execution time for the WebFiles benchmark in six different configurations of hardware and software isolation . The WebFiles benchmark is an I/O intensive benchmarks based on SPECweb99 . It consists of three SIPs: a
Figure 4a. Micro -kernel configuration (like MINIX 3). Dotted lines mark protection domains; dark domains are user-level, light are kernel -level.
Figure 4b. Monolithic kernel and monolithic applica tion configuration .
Figure 4c. Configuration with distinct policies for signed and unsigned code.
Kernel
WebApp
HTTPServer
WebBrowser
WebPlug -In
FileSystem
TCP/IPStack
DiskDriver
NICDriver
UnsignedDriver
AppSignedPlug -in
SignedDriver Kernel
UnsignedPlug -in
S ignedApp
UnsignedApp
UnsignedDriver
SignedApp
SignedPlug -in
SignedDriver Kernel
UnsignedPlug -in
UnsignedApp
Figure 5. Normalized WebFiles execution time .
-4.7 %+6.3%
+18.9%+33.0% +37.7%
0.00
0.20
0.40
0.60
0.80
1.00
1.20
1.40
No runtimechecks
PhysicalMemory
Add4KB
Pages
AddSeparateDomain
AddRing 3
FullMicrokernel
Safe Code Tax
Unsafe Code Tax
WebFiles benchmark normalized exe time; 3 SIPs: (1) client issues random reads acorsfiles, (2) FS, (3) disk driver; no checks => runtime checks => TLB turned on (same address space) => client separate address space (ring 0) => client ring 3 => all SIPshave separate address space
OSE 2011– OS Singularity23
IPC: contract-based channels
Contract-based Contract defines a state machine
Verifier statically proves it statically at install time Example: Listing 1 of
G. Hunt & J Larus, “Singularity: rethinking the SW stack”OSR 2007 (a survey on singularity)
Homework: read & understand it Very Fast
Zero copy (see slide 9) SIP can only have one pointer to an object (verified)
Though the “exchange stack” Channels are capabilities
E.g., an open file is a channel received from the file server
OSE 2011– OS Singularity24
Manifest-based programs (MBPs)
No code is allowed to run without a “manifest” By default, a SIP can do nothing Manifest describes SIP’s
Capabilities, required resources, dependencies upon other programs
When program/MBP installed, verifying that It meets all required safety properties All its dependencies are met Doesn’t interfere with already-installed MBPs
Example Manifest of driver provides enough information to prove it
will not access HW of previously installed drivers