Improving IPC by Kernel Design
Jochen Liedtke
Proceedings of the 14th ACM Symposium on Operating Systems Principles
Asheville, North Carolina, 1993
The Performance of µ-Kernel-Based Systems
H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter
Proceedings of the 16th Symposium on Operating Systems Principles
October 1997, pp. 66-77
Jochen Liedtke (1953 – 2001)
• 1977 – Diploma in Mathematics from the University of Bielefeld.
• 1984 – Moved to GMD (German National Research Center). Built L3. Known for overcoming IPC performance hurdles.
• 1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.
The IPC Dilemma
• IPC is a core paradigm of µ-kernel architectures
• Most IPC implementations perform poorly
• Really fast message-passing systems are needed to run device drivers and other performance-critical components at user level
• Result: programmers circumvent IPC, co-locating device drivers in the kernel and defeating the main purpose of the microkernel architecture
What to Do?
• Optimize IPC performance above all else!
• Results: L3 and L4, second-generation microkernel-based operating systems
• Many clever optimizations, but no single “silver bullet”
Summary of Techniques
Seventeen Total
Standard System Calls (Send/Recv)
Client (Sender) and Server (Receiver):
• Client: send( ) – system call: enter kernel, exit kernel (the client is not blocked)
• Server: receive( ) – system call: enter kernel, exit kernel
• Server: send( ) the reply – system call: enter kernel, exit kernel
• Client: receive( ) – system call: enter kernel, exit kernel
• Kernel entered/exited four times per call!
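The four crossings can be counted with a toy model. This is a minimal Python sketch, not L3 code: `CountingKernel`, `send`, `receive`, and `rpc_with_send_receive` are hypothetical names, the client and server steps are serialized in one function instead of running as two threads, and the "kernel" merely queues messages in per-thread mailboxes.

```python
class CountingKernel:
    """Toy kernel (hypothetical) that only counts user/kernel crossings."""

    def __init__(self):
        self.entries = 0      # traps into the kernel
        self.exits = 0        # returns to user mode
        self.mailboxes = {}   # destination name -> list of pending messages

    def _syscall(self, op):
        self.entries += 1     # enter kernel
        result = op()
        self.exits += 1       # exit kernel
        return result

    def send(self, dest, msg):
        # Asynchronous send: queue the message and return; the caller is not blocked.
        self._syscall(lambda: self.mailboxes.setdefault(dest, []).append(msg))

    def receive(self, me):
        # Toy model: assumes a message is already waiting in the mailbox.
        return self._syscall(lambda: self.mailboxes[me].pop(0))


def rpc_with_send_receive(kernel, request):
    """One client/server round trip built from plain send/receive calls."""
    kernel.send("server", request)          # client: send request
    req = kernel.receive("server")          # server: receive request
    kernel.send("client", req.upper())      # server: send reply (toy "work")
    return kernel.receive("client")         # client: receive reply
```

Calling `rpc_with_send_receive(CountingKernel(), "ping")` returns `"PING"` and leaves `entries == exits == 4`: one enter/exit pair per system call.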
New Call/Response-based System Calls
Client (Sender) and Server (Receiver):
• Client: call( ) – system call: enter kernel, allocate CPU to server, suspend
• Server: resume from being suspended, exit kernel, handle message
• Server: reply_and_recv_next( ) – enter kernel, send reply, wait for next message
• Kernel: re-allocate CPU to client; client exits kernel
• Special system calls for RPC-style interaction
• Kernel entered and exited only twice per call!
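The combined calls can be sketched the same way. In this hypothetical Python model, a generator stands in for the server thread: each `yield` is the point where the server sits inside the kernel waiting, so `send()` on the generator plays the role of the kernel handing it the next message. `RpcKernel`, `call`, and `echo_server` are illustrative names, not the L3 interface, and the server's one-time initial wait is not counted per call.

```python
def echo_server():
    """Hypothetical server thread, modelled as a generator."""
    msg = yield None               # initial wait for the first message
    while True:
        reply = msg.upper()        # handle message
        # reply_and_recv_next: send the reply and wait for the next request
        msg = yield reply


class RpcKernel:
    """Toy kernel (hypothetical) with combined call / reply_and_recv_next."""

    def __init__(self, server):
        self.entries = 0
        self.exits = 0
        self.server = server
        next(self.server)          # server parks in the kernel, waiting (one-time setup)

    def call(self, msg):
        self.entries += 1          # client enters the kernel with call()
        # Kernel allocates the CPU to the waiting server and suspends the client;
        # the server resumes and exits the kernel.
        self.exits += 1
        # The server handles msg, then re-enters the kernel via reply_and_recv_next.
        reply = self.server.send(msg)
        self.entries += 1
        # Kernel delivers the reply and re-allocates the CPU to the client,
        # which exits the kernel.
        self.exits += 1
        return reply
```

Each `call()` costs two entries and two exits, one pair per side, instead of two pairs per side.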
Complex Message Structure
Batching IPC
• Combine a sequence of send operations into a single operation by supporting complex messages
• Benefit: reduces the number of sends.
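A minimal sketch of the batching idea, assuming a hypothetical `Channel` whose `send` counts kernel operations; a plain list of parts stands in for the complex message structure, and the real L3 message layout is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexMessage:
    """Hypothetical multi-part message delivered by a single IPC."""
    parts: list = field(default_factory=list)


class Channel:
    """Toy IPC channel (hypothetical) that counts send operations."""

    def __init__(self):
        self.sends = 0
        self.delivered = []

    def send(self, msg):
        self.sends += 1                        # one kernel IPC per call
        if isinstance(msg, ComplexMessage):
            self.delivered.extend(msg.parts)   # every part handled in one operation
        else:
            self.delivered.append(msg)
```

Sending three parts one at a time costs three sends; wrapping them in one `ComplexMessage` delivers the same data with a single send.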
Direct Transfer by Temporary Mapping
• Naïve message transfer: copy from sender to kernel then from kernel to receiver
• Optimizing transfer by sharing memory between sender and receiver is not secure
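• L3 supports single-copy transfers by temporarily mapping the receiver's target region into a communication window in the sender's address space; the kernel then copies the message directly into the receiver.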
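The copy count can be illustrated in Python, with the caveat that this only mimics the effect: a `memoryview` slice aliasing the receiver's buffer stands in for the temporarily mapped window, whereas the real kernel manipulates page tables. `naive_transfer` and `mapped_transfer` are hypothetical names; each returns the number of bytes copied.

```python
def naive_transfer(src: bytes, dst: bytearray) -> int:
    """Two copies: sender -> kernel buffer -> receiver. Returns bytes copied."""
    if len(dst) < len(src):
        raise ValueError("receiver buffer too small")
    kernel_buf = bytearray(src)        # copy 1: sender space into the kernel
    dst[:len(src)] = kernel_buf        # copy 2: kernel into receiver space
    return 2 * len(src)


def mapped_transfer(src: bytes, dst: bytearray) -> int:
    """One copy through a window that aliases the receiver's buffer."""
    if len(dst) < len(src):
        raise ValueError("receiver buffer too small")
    window = memoryview(dst)[:len(src)]   # stands in for the temporary mapping
    window[:] = src                       # single copy, straight into the receiver
    window.release()                      # "unmap" the window after the transfer
    return len(src)
```

Both functions leave the same bytes in the receiver; the mapped version touches each byte once instead of twice.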
Scheduling
• Conventionally, the IPC operations call and reply & receive require scheduling actions:
– Delete the sending thread from the ready queue.
– Insert the sending thread into the waiting queue.
– Delete the receiving thread from the waiting queue.
– Insert the receiving thread into the ready queue.
• These operations, together with 4 expected TLB misses, take at least 1.2 µs (23% of T, the total IPC time).
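The four queue operations can be sketched as follows. This is a hypothetical Python model: `EagerScheduler` and `ipc_switch` are illustrative names, `deque` operations stand in for the kernel's linked-list updates, and TLB misses are not modelled.

```python
from collections import deque


class EagerScheduler:
    """Conventional scheduler (sketch): each IPC moves threads between queues at once."""

    def __init__(self, ready, waiting):
        self.ready = deque(ready)
        self.waiting = deque(waiting)
        self.queue_ops = 0

    def ipc_switch(self, sender, receiver):
        self.ready.remove(sender)       # delete sending thread from the ready queue
        self.waiting.append(sender)     # insert sending thread into the waiting queue
        self.waiting.remove(receiver)   # delete receiving thread from the waiting queue
        self.ready.append(receiver)     # insert receiving thread into the ready queue
        self.queue_ops += 4
```

Every switch pays all four updates, so a call followed by a reply costs eight queue operations even though both threads end up where they started.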
Solution: Lazy Scheduling
• Don’t bother updating the scheduler queues!
• Instead, delay the movement of threads among queues until the queues are queried.
• Why?
– A sending thread that blocks will soon unblock again, and maybe nobody will ever notice that it blocked
• Lazy scheduling is achieved by setting state flags (ready / waiting) in the Thread Control Blocks
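A minimal sketch of lazy scheduling, under assumptions of this model only: the ready queue may hold stale (blocked) entries and may lack the currently running thread, and both are repaired only when `ready_threads()` queries the queue. `TCB`, `LazyScheduler`, `ipc_switch`, and `ready_threads` are hypothetical names, not L3 internals.

```python
from collections import deque

READY, WAITING = "ready", "waiting"


class TCB:
    """Thread control block carrying the lazily maintained state flag."""

    def __init__(self, name, state=READY):
        self.name = name
        self.state = state
        self.queued = False           # is this TCB physically in the ready queue?


class LazyScheduler:
    def __init__(self, current, threads):
        self.current = current
        self.ready = deque()
        self.queue_ops = 0
        for t in threads:
            if t.state == READY:
                self.ready.append(t)
                t.queued = True

    def ipc_switch(self, receiver):
        """IPC path: flip the flags and switch threads; no queue is touched."""
        self.current.state = WAITING
        receiver.state = READY
        self.current = receiver

    def ready_threads(self):
        """Query path: bring the ready queue up to date only now."""
        cur = self.current
        if cur.state == READY and not cur.queued:
            self.ready.append(cur)    # running thread was never queued
            cur.queued = True
            self.queue_ops += 1
        for t in list(self.ready):
            if t.state != READY:      # blocked thread left behind: drop it now
                self.ready.remove(t)
                t.queued = False
                self.queue_ops += 1
        return [t.name for t in self.ready]
```

A hundred call/reply round trips between a client and a server cost no queue operations at all, because the client is ready again before anyone looks at the queue; only a query made while the client is actually blocked pays for fixing the queue.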
Pass Short Messages in Registers
• Most messages are very short: 8 bytes (plus 8 bytes of sender ID)
– E.g., ack/error replies from device drivers or hardware-initiated interrupt messages.
• Transfer short messages via CPU registers.
• Performance gain of 2.4 µs, or 48% of T.
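A sketch of the register layout, assuming 32-bit registers (so the 8-byte sender ID plus the 8-byte message fill four registers); a Python tuple of integers stands in for the registers, and `to_registers` / `from_registers` are hypothetical helpers, not the L3 calling convention.

```python
import struct

SHORT_MSG_BYTES = 8   # message payload size from the slide
SENDER_ID_BYTES = 8   # sender id size from the slide


def to_registers(sender_id: bytes, msg: bytes) -> tuple:
    """Sender side: pack sender id + message into four 32-bit 'register' values."""
    if len(sender_id) != SENDER_ID_BYTES or len(msg) != SHORT_MSG_BYTES:
        raise ValueError("short-message path expects an 8-byte id and an 8-byte message")
    # "<4I" = four little-endian unsigned 32-bit integers (16 bytes in total)
    return struct.unpack("<4I", sender_id + msg)


def from_registers(regs: tuple) -> tuple:
    """Receiver side: rebuild (sender_id, msg) from the register values."""
    raw = struct.pack("<4I", *regs)
    return raw[:SENDER_ID_BYTES], raw[SENDER_ID_BYTES:]
```

Because nothing is written to a memory buffer on this path, the kernel has no message copy to perform at all.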
Impact on IPC Performance
• For an 8-byte message, IPC time for L3 is 5.2 µs compared with 115 µs for Mach, a 22-fold improvement.
• For large messages (4 KB), a 3-fold improvement is seen.
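The 22-fold figure follows directly from the two short-message times:

```latex
\frac{T_{\text{Mach}}}{T_{\text{L3}}} = \frac{115\ \mu\text{s}}{5.2\ \mu\text{s}} \approx 22.1
```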
Relative Importance of Techniques
• Quantifiable impact of techniques
– 49% means that removing that item would increase IPC time by 49%.
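The metric can be written out as a small helper. This is a hypothetical sketch: `relative_importance` and `time_without` are illustrative names, and the 5.2 µs used in the check is borrowed from the 8-byte L3 figure purely as an example input.

```python
def relative_importance(t_all: float, t_without: float) -> float:
    """Percent increase in IPC time when one technique is removed."""
    return 100.0 * (t_without - t_all) / t_all


def time_without(t_all: float, importance_pct: float) -> float:
    """Inverse: IPC time implied by removing a technique of the given importance."""
    return t_all * (1.0 + importance_pct / 100.0)
```

With a 5.2 µs baseline, a 49% item corresponds to about 7.75 µs once removed. Because each percentage is measured by removing one item while the others stay in place, the percentages of different techniques need not add up.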
OS and Application-Level Performance
OS-Level Performance
Application-Level Performance
Conclusion
• Use a synergistic approach to improve IPC performance
– A thorough understanding of hardware/software interaction is required
– No “silver bullet”
• IPC performance can be improved by a factor of 10
• … but even so, a microkernel-based OS will not be as fast as an equivalent monolithic OS
– L4-based Linux outperforms Mach-based Linux, but not monolithic Linux