Upload
others
View
16
Download
0
Embed Size (px)
Citation preview
Multiverse
Multiverse: Automatic Hybridization of Runtime Systems Kyle C. Hale, Conor Hetland, and Peter Dinda | {kh, ch}@u.northwestern.edu, [email protected]
A Hybrid Runtime (HRT) is a transformation of a traditional parallel runtime into a specialized operating system kernel. HRTs enjoy unfettered access to the hardware and determine their own abstractions to that hardware. The Hybrid Virtual Machine (HVM) makes it possible to create VMs that are internally partitioned between a “regular OS” (ROS) and an HRT. They allow the HRT to leverage legacy functionality inside the ROS, and they allow a user to easily create and launch HRTs from the ROS.
Hybrid Runtimes
• HRTs can be very fast, but they require a manual port to kernel mode. This requires domain knowledge at the level of a runtime developer and at the level of a kernel developer
• Even for an experienced kernel developer, porting a complex parallel runtime to kernel-mode is an error-prone process. Porting can be difficult and laborious!
• Much of this functionality is not on critical path!
Why Automatic Hybridization?
[1] K. Hale, C. Hetland, and P. Dinda. Automatic Hybridization of Runtime Systems. In Proceedings of the 25th
International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC ‘16). [2] K. Hale and P. Dinda. Enabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support. In
Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE ‘16).
[3] K. Hale and P. Dinda. A Case for Transforming Parallel Runtimes into Operating System Kernels. In Proceedings
of the 24th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC ‘15).
[4] J. Lange, P. Dinda, K. Hale, L. Xia. An Introduction to the Palacios Virtual Machine Monitor Release 1.3. Tech.
Rep. NWU-EECS-11-10, Dept. of EECS, Northwestern Univ. (2011).
References
Acknowledgements This project is made possible by support from the United States National Science Foundation through grant CCF-1533560 and from Sandia National Laboratories through the Hobbes Project, which is funded by the 2013 Exascale Operating and Runtime Systems Program under the Office of Advanced Scientific Computing Research in the United States Department Of Energy’s Office of Science.
• Racket is the most widely used dialect of Scheme • Includes challenging features typical of a dynamic, high-level language. Many make heavy use of
Linux ABI: system calls, memory mapping, processes, threads, signals, etc. • We automatically hybridize Racket with Multiverse. The user can interact with the Racket REPL in
the standard fashion
Small Performance Overhead
• With Multiverse, when a new HRT context is created, the Aerokernel is booted transparently on a remote set of cores
• Right: The Aerokernel binary is included in the runtime’s executable when compiled with our toolchain
• The boot initialization is requested by the Multiverse runtime layer on the ROS side
Reducing Forwarded Events
• We introduced Multiverse, a system that automatically hybridizes existing runtime systems
• Runtime developers rebuild their system with our toolchain. It can then operate in a state of split execution, where most of the execution occurs in an accelerated, HRT environment
• Multiverse adds little to no overhead, allowing the developer to start with a working system in kernel mode. The developer can then incrementally port legacy functionality to the HRT, reducing the number of events forwarded to the ROS
Summary
split-execution in Multiverse
merged address space
• Merged address space allows HRT to leverage code/data mapped into the ROS virtual address space
• We can, for example, use shared user-space libraries in the HRT that are mapped into the ROS process without implementing dynamic linking functionality in the Aerokernel
• The HRT can operate on data structures that have been constructed in the ROS
• Higher-half addresses (where the kernel code/data is mapped) are distinct for ROS and HRT
Left: Performance of hybridized Racket (with Multiverse) for a set of benchmarks from the Language Benchmark Game compared to Racket in a VM and Racket running on native Linux. Overheads are very small. In all but two cases, the light-weight environment provided by the HRT actually increases performance over Virtual.
Right: The primary source of overhead in Multiverse comes from forwarded events. The two benchmarks above that are made slower from overhead incur most of it from page faults. The figure on the right shows that the overheads for typical forwarded events are roughly 1500 cycles for each event.
• These bars show the benchmarks that perform worse with Multiverse initially
• We introduce a change to the Racket runtime that eagerly faults in pages when mapping large chunks of memory
• This reduces the occurrence of page faults, which in turn reduces the number of events forwarded from HRT to ROS
• Performance of the hybridized version of Racket is now better than virtual
• The point of this exercise is to show that the overheads of Multiverse can be eliminated by reducing the number of forwarded events
General Purpose OS(Linux)
HVM Library
Parallel Runtime
Parallel App
AeroKernel (Nautilus)
HVM Library
Parallel Runtime
Parallel App
merged address space
VMM
accelerated HRT
(1)
(2)
(3)
(4)
ROS HRT
Main thread
Partner thread
HRT thread
Nested HRT
thread
(1)
(2) (3)
(4)(5)
libs
AeroKernel binary
stack
Virtual address spacefor control process (ROS core)
libs.text.data
heap AeroKernel-managedmemory
HRT core physical memory
Loaded AeroKernel
Multibootand
VMM info structures
ROS Kernel
(Linux)
Application + Runtime Code and
Data
0xffff800000000000
0xffffffffffffffff
0x00007fffffffffff
0x0000000000000000
Canonical “lower half”
Canonical “higher half”
Application + Runtime Code and
Data
ROS Virtual Address Space
HRT Virtual Address Space
Physical Address Space
HRT Private
ROS + HRT Shared
ROS + HRT Shared
HRT Private
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
mmap
munmap
Late
ncy
(cyc
les)
VirtualMultiverse
0
10
20
30
40
50
fannk
uch-r
edux
binary
-tree-2 fas
tafas
ta-3
nbod
y
spec
tral-n
orm
mande
lbrot-
2
Run
time
(s)
NativeVirtual
Multiverse
0
10
20
30
40
50
60
nbod
y
spec
tral-n
orm
Run
time
(s)
Native-prefaultVirtual-prefault
Multiverse-prefault
• In Multiverse, the runtime begins execution in the ROS. The runtime creates an HRT context through either explicit or implicit invocations
• Once an HRT context is created, the system is in a state of split execution
• During split execution, exceptional events on the HRT side (page faults, system calls, and some others) are forwarded to the ROS
• Each HRT execution context is paired with a partner thread on the ROS, which handles events forwarded over event channels. HRT contexts with their partner threads comprise execution groups
• Nested threads share event channels with their parent
• HRT contexts are (by default) created whenever a new pthread is created in the runtime
Parallel&App&
Parallel&Run,me&
General&Kernel&
Node&HW&
User%Mode%
Kernel%Mode%
Parallel&App&
Hybrid&Run,me&(HRT)&
Node&HW&
User%Mode%
Kernel%Mode%
Parallel&Run,me&
General&Kernel&
Node&HW&
User%Mode%
Kernel%Mode%
Parallel&App&
Hybrid&Run,me&(HRT)&
User%Mode%
Kernel%Mode%
Hybrid&Virtual&Machine&(HVM)&
Specialized&Virtualiza,on&Model&
Performan
ce*Path*
Parallel&App&
Legacy*Path*
(a) Current Model (b) Hybrid Runtime Model
(c) Hybrid Runtime Model Within a Hybrid Virtual Machine
Performan
ce*Path*
General&Virtualiza,on&Model&
• We showed in previous work that by porting a legacy parallel runtime to an HRT environment, we can increase the performance of a real parallel runtime system by as much as 40% [2, 3]
• The HRT is composed of the runtime and a thin kernel framework layer called an Aerokernel
• Aerokernels are designed to be simple, light-weight, and very fast. We designed and implemented the Nautilus Aerokernel, which is used in conjunction with Multiverse
overhead added for forwarded system calls
~1500 cycles
interactions within an execution group
Aerokernel boot process
Building an Aerokernel to support a parallel runtime system (manual port to HRT)
addfunc(on rebuild boot works?
no
yesdone
http://nautilus.halek.co
fast path on HW
fast path undervirtualization
Needaneasierwaytogofromlegacyrun4mesystemtoHRT+HVM-capablerun4me