
Improving IPC by Kernel Design

Jochen Liedtke

Proceedings of the 14th ACM Symposium on Operating Systems Principles

Asheville, North Carolina, 1993

The Performance of µ-Kernel-Based Systems

H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter

Proceedings of the 16th Symposium on Operating Systems Principles

October 1997, pp. 66-77

Jochen Liedtke (1953 – 2001)

• 1977 – Diploma in Mathematics from the University of Bielefeld.

• 1984 – Moved to GMD (German National Research Center). Built L3. Known for overcoming IPC performance hurdles.

• 1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.

The IPC Dilemma

• IPC is a core paradigm of µ-kernel architectures

• Most IPC implementations perform poorly

• Really fast message passing is needed to run device drivers and other performance-critical components at user level.

• Result: programmers circumvent IPC, co-locating device drivers in the kernel and defeating the main purpose of the microkernel architecture

What to Do?

• Optimize IPC performance above all else!

• Results: L3 and L4, second-generation microkernel-based operating systems

• Many clever optimizations, but no single “silver bullet”

Summary of Techniques

Seventeen Total

Standard System Calls (Send/Recv)

• Client (sender): send(), then receive(); each is a system call that enters and exits the kernel.

• Server (receiver): receive(), then send(); each is a system call that enters and exits the kernel.

• The client is not blocked between its send() and receive().

Kernel entered/exited four times per call!

New Call/Response-based System Calls

• Client (sender): call(); a single system call enters the kernel, which allocates the CPU to the server and suspends the client. When the reply arrives, the kernel reallocates the CPU to the client, which exits the kernel.

• Server (receiver): resumes from being suspended, exits the kernel, handles the message, then invokes reply_and_recv_next(); entering the kernel, sending the reply, and waiting for the next message.

Special system calls for RPC-style interaction

Kernel entered and exited only twice per call!

Complex Message Structure

Batching IPC

Combine a sequence of send operations into a single operation by supporting complex messages

• Benefit: reduces number of sends.

Direct Transfer by Temporary Mapping

• Naïve message transfer: copy from sender to kernel then from kernel to receiver

• Optimizing transfer by sharing memory between sender and receiver is not secure

• L3 supports single-copy transfers by temporarily mapping a communication window into the sender.

Scheduling

• Conventionally, the IPC operations call and reply-and-receive require scheduling actions:

– Delete the sending thread from the ready queue.

– Insert the sending thread into the waiting queue.

– Delete the receiving thread from the waiting queue.

– Insert the receiving thread into the ready queue.

• These operations, together with 4 expected TLB misses, take at least 1.2 µs (23%T).

Solution: Lazy Scheduling

• Don’t bother updating the scheduler queues!

• Instead, delay the movement of threads among queues until the queues are queried.

• Why?

– A sending thread that blocks will soon unblock again, and maybe nobody will ever notice that it blocked.

• Lazy scheduling is achieved by setting state flags (ready/waiting) in the thread control blocks.

Pass Short Messages in Registers

• Most messages are very short: 8 bytes (plus 8 bytes of sender id)

– E.g. ack/error replies from device drivers or hardware-initiated interrupt messages.

• Transfer short messages via CPU registers.

• Performance gain of 2.4 µs, or 48%T.

Impact on IPC Performance

• For an eight-byte message, IPC time for L3 is 5.2 µs, compared to 115 µs for Mach: a 22-fold improvement.

• For large messages (4 KB), a 3-fold improvement is seen.

Relative Importance of Techniques

• Quantifiable impact of techniques:

– 49% means that removing that item would increase IPC time by 49%.

OS and Application-Level Performance

OS-Level Performance

Application-Level Performance

Conclusion

• Use a synergistic approach to improve IPC performance:

– A thorough understanding of hardware/software interaction is required; there is no “silver bullet”.

• IPC performance can be improved by a factor of 10.

• … but even so, a micro-kernel-based OS will not be as fast as an equivalent monolithic OS:

– L4-based Linux outperforms Mach-based Linux, but not monolithic Linux.
