
Page 1: User-Level Interprocess Communication for Shared Memory Multiprocessors

User-Level Interprocess Communication for Shared Memory Multiprocessors

Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy

Presented by Arthur Strutzenberg

Page 2: User-Level Interprocess Communication for Shared Memory Multiprocessors

Interprocess Communication

• The LRPC paper/presentation discussed the need for:
  – Failure isolation
  – Extensibility
  – Modularity

• There is usually a balance to strike between these three needs and performance

• That balance is a central theme for this paper as well

Page 3: User-Level Interprocess Communication for Shared Memory Multiprocessors

Interprocess Communication

• Traditionally this is the responsibility of the kernel
  – This suffers from two problems:
    • Architectural performance
    • Interaction between kernel-based communication and user-level threads
  – Generally designers use a pessimistic (non-cooperative) approach

• This begs the following question: "How can you have your cake and eat it too?"

Page 4: User-Level Interprocess Communication for Shared Memory Multiprocessors

Interprocess Communication

• What if the communication layer is extracted out of the kernel and made part of the user level?

• This can increase performance by allowing:
  – Messages to be sent between address spaces directly
  – Elimination of unnecessary processor reallocation
  – Amortization (processor reallocation, when needed, is spread over several independent calls)
  – Parallelism in message passing to be exploited

[Figure: two address spaces, each layered as Application / Stub / URPC / FastThreads, exchanging messages over a shared message channel, with the kernel below both.]

Page 5: User-Level Interprocess Communication for Shared Memory Multiprocessors

User-Level Remote Procedure Call (URPC)

• Allows communication between address spaces without kernel mediation

• Isolates:
  – Processor Reallocation
  – Thread Management
  – Data Transfer

• Kernel is ONLY responsible for allocating processors to the address space

Page 6: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC & Communication

• Application-to-OS communication is typically:
  – A narrow channel (ports)
  – A limited number of operations:
    • Create
    • Send
    • Receive
    • Destroy

• Most modern OSes have support for RPC (an illustrative sketch of such a narrow interface follows below)
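As a hedged illustration of that narrow interface, here is one way the four operations could be declared in C; the names and signatures are assumptions made for this summary, not the actual Taos or URPC API.

```c
/* Illustrative port-style channel interface.
 * Names and signatures are assumptions, not the actual Taos/URPC API. */
#include <stddef.h>

typedef struct channel channel_t;   /* opaque handle to a narrow channel */

channel_t *channel_create(void);                                    /* Create  */
int        channel_send(channel_t *c, const void *msg, size_t len); /* Send    */
int        channel_receive(channel_t *c, void *buf, size_t cap);    /* Receive */
void       channel_destroy(channel_t *c);                           /* Destroy */
```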

Page 7: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC & Communication

• What does this buy URPC?
  – The definition of RPC generally says little about how the channels of communication operate
  – The definition also generally does not specify how processor scheduling (reallocation) will interact with data transfer

Page 8: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC & Communication

• URPC exploits this freedom in two ways:
  – Messages passed through logical channels are kept in memory that is shared between client and server (a rough data-layout sketch follows below)
    • This memory, once allocated, is kept intact
  – Thread management is done at the user level (lightweight instead of "kernel weight")

• (Haven't we read this in another paper?)
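As a rough sketch of what such a shared channel could look like in memory, assuming a fixed-size, slot-based layout; the field names, sizes, and queue structure are illustrative assumptions, not taken from the paper.

```c
/* Sketch of a bidirectional shared-memory channel between a client and a
 * server address space. The paper only says the memory is shared, allocated
 * pairwise, and kept mapped for the lifetime of the binding; everything
 * below that is an assumption for illustration. */
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS     64
#define MSG_BYTES 256

struct msg_slot {
    uint32_t procedure;          /* which remote procedure to invoke        */
    uint32_t length;             /* bytes of marshalled arguments/results   */
    uint8_t  data[MSG_BYTES];    /* stub-marshalled payload                 */
};

struct one_way_queue {
    atomic_flag lock;            /* test-and-set lock, never spun on        */
    uint32_t    head, tail;      /* producer / consumer positions           */
    struct msg_slot slots[SLOTS];
};

struct urpc_channel {            /* mapped into BOTH address spaces         */
    struct one_way_queue client_to_server;   /* outgoing calls              */
    struct one_way_queue server_to_client;   /* replies                     */
};
```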

Page 9: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC & Thread Management

• There is less overhead involved in switching a processor to another thread in the same address space (context switching) than in reallocating it to a thread in a different address space (processor reallocation)
  – URPC uses this, along with the user-level scheduler, to always give preference to threads within the same address space (a sketch of this preference follows below)
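A minimal sketch of that scheduling preference, using hypothetical helper names rather than the real FastThreads scheduler:

```c
/* Sketch only: prefer a runnable thread in the same address space (cheap
 * context switch) over reallocating the processor to another address space
 * (expensive, kernel-mediated). Helper names are hypothetical. */
#include <stddef.h>

typedef struct thread thread_t;
typedef struct address_space address_space_t;

extern thread_t *local_ready_dequeue(void);            /* same address space   */
extern void      donate_processor(address_space_t *);  /* kernel-mediated      */
extern thread_t *idle_loop(void);

thread_t *pick_next(address_space_t *underpowered_peer) {
    thread_t *t = local_ready_dequeue();    /* same-space switch: ~15 us        */
    if (t != NULL)
        return t;
    if (underpowered_peer != NULL)
        donate_processor(underpowered_peer);/* cross-space reallocation: ~55 us */
    return idle_loop();
}
```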

Page 10: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC & Thread Management

• Some numbers for comparison:

– A context switch within the same address space: 15 microseconds

– A processor reallocation: 55 microseconds

Page 11: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC & Processor Allocation

• What happens when a client invokes a procedure on a server process and the server has no processors allocated to it?
  – URPC calls such a server "underpowered"
  – The paper identifies this as a load-balancing problem
  – The solution is reallocation from client to server

• A client with an idle processor can elect to reallocate that idle processor to the server

• This is not without cost: it is expensive and requires a call to the kernel

Page 12: User-Level Interprocess Communication for Shared Memory Multiprocessors

Rationale for URPC

• The design of the URPC package presented in this paper has three main components:
  – Thread Management
  – Data Transfer
  – Processor Reallocation

[Figure: the same two-address-space diagram as before (Application / Stub / URPC / FastThreads on each side, connected by a shared message channel, with the kernel below).]

Page 13: User-Level Interprocess Communication for Shared Memory Multiprocessors

Let's kill two birds with one stone

• URPC uses an "optimistic reallocation policy," which makes the following assumptions:
  – The client will always have other work to do
  – The server will (soon) have a processor available to service messages

• This leads to the "amortization of cost"
  – The cost of a processor reallocation is spread over several calls (a sketch of the client-side call path under this policy follows below)
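A hedged sketch of the client-side call path under this optimistic policy; the function names are hypothetical and the real URPC/FastThreads code is more involved.

```c
/* Sketch of the optimistic client-side call path. Names are hypothetical.
 * The client enqueues the request, then keeps its processor busy with other
 * ready threads instead of reallocating it; it only donates the processor
 * (a kernel call) when the reply has not appeared and nothing else is
 * runnable locally. */
typedef struct channel channel_t;
typedef struct request request_t;

extern int  channel_try_enqueue(channel_t *ch, const request_t *req);
extern int  channel_try_dequeue_reply(channel_t *ch, request_t *req);
extern int  run_another_ready_thread(void);           /* 1 if it ran something */
extern void processor_donate_to_server(channel_t *ch);/* kernel-mediated       */

void urpc_call(channel_t *ch, request_t *req) {
    while (!channel_try_enqueue(ch, req))
        run_another_ready_thread();            /* channel busy: do other work   */

    for (;;) {
        if (channel_try_dequeue_reply(ch, req))
            return;                            /* server answered: done         */
        if (!run_another_ready_thread())       /* nothing else runnable locally */
            processor_donate_to_server(ch);    /* optimism failed: reallocate   */
    }
}
```

Note that the kernel is entered only on the last line, and only when the optimism fails; in the common case the call completes entirely at user level.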

Page 14: User-Level Interprocess Communication for Shared Memory Multiprocessors
Page 15: User-Level Interprocess Communication for Shared Memory Multiprocessors

Why the optimistic approach doesn’t always hold

• This approach does not work as well when the application:
  – Runs as a single thread
  – Is real-time
  – Has high-latency I/O
  – Makes priority invocations

• URPC handles this by allowing the client's address space to force a processor reallocation to the server's address space even though there might still be local work to do

Page 16: User-Level Interprocess Communication for Shared Memory Multiprocessors

The Kernel handles Processor Reallocation

• URPC handles this through a kernel call named "Processor.Donate"

• This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space (an assumed signature is sketched below)
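The slides do not give the call's signature, so the declaration below is only an assumption about its shape: the caller hands its processor to the kernel, naming the receiving address space and the entry point at which execution should resume there.

```c
/* Assumed shape of the Processor.Donate kernel call. The slide only says it
 * passes control of an idle processor down to the kernel and back up to a
 * specified address in the receiving address space. */
typedef struct address_space address_space_t;

void Processor_Donate(address_space_t *recipient,  /* address space gaining the CPU */
                      void (*entry)(void *arg),    /* where execution resumes there */
                      void *arg);                  /* e.g. identifies the channel   */
```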

Page 17: User-Level Interprocess Communication for Shared Memory Multiprocessors

Voluntary Return of Processors

• URPC's policy for processors donated to a server (a sketch of a server loop following this policy appears below) is:

"…Upon receipt of a processor from a client address, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become 'underpowered'…"
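A sketch of a server dispatch loop that would follow the quoted policy; the helper functions are hypothetical.

```c
/* Sketch of the server side of the voluntary-return policy. Helper names are
 * hypothetical. The donated processor handles the client's outstanding calls
 * and is returned when none remain, or earlier if the client itself looks
 * underpowered. */
#include <stddef.h>

typedef struct channel channel_t;
typedef struct request request_t;

extern request_t *channel_dequeue_call(channel_t *ch);   /* NULL if none waiting */
extern void       serve_and_reply(channel_t *ch, request_t *req);
extern int        client_is_underpowered(channel_t *ch);
extern void       processor_return_to_client(channel_t *ch);  /* kernel-mediated */

void serve_donated_processor(channel_t *ch) {
    request_t *req;
    while ((req = channel_dequeue_call(ch)) != NULL) {  /* outstanding messages  */
        serve_and_reply(ch, req);
        if (client_is_underpowered(ch))
            break;                          /* client needs the processor more    */
    }
    processor_return_to_client(ch);         /* voluntary return                   */
}
```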

Page 18: User-Level Interprocess Communication for Shared Memory Multiprocessors

Parallels to the User Threads Paper

• Even though URPC implements this policy/protocol, there is absolutely no way to enforce it. This has the potential to lead to some interesting side effects.

• This is extremely similar to some of the problems discussed in the User Threads paper
  – For example, a server thread could conceivably continue to hold a donated processor and handle requests from other clients

Page 19: User-Level Interprocess Communication for Shared Memory Multiprocessors

What this leads to…

• One word: STARVATION
  – URPC handles this by only directly reallocating processors to load balance

• In other words, the system also needs the notion of preemptive reallocation
  – Preemptive reallocation must also adhere to two invariants:
    • No higher-priority thread waits while a lower-priority thread runs
    • No processor idles when there is work for it to do (even if the work is in another address space)

Page 20: User-Level Interprocess Communication for Shared Memory Multiprocessors

Controlling Channel Access

• Data flows in URPC between different address spaces use a bidirectional shared-memory queue. The queues have a test-and-set lock on either end, which the paper specifically states must be NON-SPINNING
  – The protocol: if the lock is free, acquire it; otherwise go on and do something else (a sketch follows below)
  – Remember, this protocol operates under the assumption that there is always work to do!
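A minimal sketch of that acquire-or-move-on discipline using a C11 test-and-set flag; the queue-processing and thread-switching helpers are hypothetical.

```c
/* Non-spinning test-and-set acquire, as the slide describes: if the channel
 * lock is free, take it and process the queue; otherwise do NOT spin, go do
 * other useful work instead. Helper names are hypothetical. */
#include <stdatomic.h>

extern void process_queue(void);              /* drain / append channel messages */
extern void run_another_ready_thread(void);   /* there is (assumed) other work   */

static atomic_flag channel_lock = ATOMIC_FLAG_INIT;

void poll_channel(void) {
    if (!atomic_flag_test_and_set_explicit(&channel_lock, memory_order_acquire)) {
        process_queue();                                         /* lock was free */
        atomic_flag_clear_explicit(&channel_lock, memory_order_release);
    } else {
        run_another_ready_thread();          /* lock busy: don't spin, move on   */
    }
}
```

The key design point from the slide is the else branch: a busy lock never causes spinning, because the optimistic policy assumes there is always another runnable thread.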

Page 21: User-Level Interprocess Communication for Shared Memory Multiprocessors

Data Transfer Using Shared Memory

• There is still the risk of what the paper refers to as the "abusability factor" of RPC, where clients and servers can:
  – Overload each other
  – Deny service
  – Provide bogus results
  – Violate communication protocols

• URPC passes the responsibility for handling this off to the stubs

Page 22: User-Level Interprocess Communication for Shared Memory Multiprocessors

Cross-Address Space Procedure Call and Thread Management

• This section of the paper identifies that there is a correspondence between

    Send / Receive (messaging)
    and
    Start / Stop (threads)

• Does this not remind everybody of a classic paper that we had to read?

Page 23: User-Level Interprocess Communication for Shared Memory Multiprocessors

Another link to the User Threads Paper

• Additionally, the paper makes three arguments about the thread/message relationship:
  – High-performance thread management facilities are needed for fine-grained parallel programs
  – High performance can only be provided at the user level
  – The close interaction between communication and thread management can be exploited

Page 24: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC Performance

• Some comparisons (values are in microseconds):

  Test               URPC FastThreads   Taos Threads   Ratio of Taos Cost to URPC Cost
  Procedure Call             7                 7                     1.0
  Fork                      43              1192                    27.7
  Fork;Join                102              1574                    15.4
  Yield                     37                57                     1.5
  Acquire, Release          27                27                     1.0
  PingPong                  53               271                     5.1

Page 25: User-Level Interprocess Communication for Shared Memory Multiprocessors

URPC Performance

• A URPC call can be broken down into 4 components: Send, Poll, Receive, and Dispatch

• Per-component cost (values are in microseconds):

  Component   Client   Server
  Poll          18       13
  Send           6        6
  Receive       10        9
  Dispatch      20       25
  Total         54       53

Page 26: User-Level Interprocess Communication for Shared Memory Multiprocessors

Call Latency and Throughput

• Call latency is the time from when a thread calls into the stub until control returns from the stub

• Latency and throughput are load dependent, and depend on:
  – Number of client processors (C)
  – Number of server processors (S)
  – Number of runnable threads in the client's address space (T)

• The graphs measure how long it takes to make 100,000 "Null" procedure calls into the server in a "tight loop" (a sketch of such a measurement loop follows below)
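A sketch of that kind of measurement loop; urpc_null_call below is a hypothetical stand-in for the stub-generated Null procedure, which in the real benchmark performs an actual cross-address-space call.

```c
/* Sketch of the latency/throughput measurement described on the slide:
 * time 100,000 "Null" calls issued in a tight loop and report per-call
 * latency and calls per second. Uses the POSIX monotonic clock. */
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the stub-generated Null procedure; the real
 * benchmark would call into the URPC stub here. */
static void urpc_null_call(void) { /* no arguments, no results */ }

int main(void) {
    const int N = 100000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        urpc_null_call();                  /* tight loop of Null calls */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("latency: %.2f us/call, throughput: %.0f calls/s\n",
           us / N, N / (us / 1e6));
    return 0;
}
```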

Page 27: User-Level Interprocess Communication for Shared Memory Multiprocessors

Call Latency and Throughput

Page 28: User-Level Interprocess Communication for Shared Memory Multiprocessors

Conclusions

• In certain circumstances, it makes sense to move the Communication layer from the kernel to user space.

• Most OSes are designed for a uniprocessor system and then ported over to an SMMP system.
  – URPC is one example of a system that is designed for an SMMP directly, and takes direct advantage of the characteristics of the system

Page 29: User-Level Interprocess Communication for Shared Memory Multiprocessors

Conclusions

• As a lead-in to Professor Walpole's discussion and Q&A, let's conclude by trying to fill out the following table:

  RPC Type      Similarities   Differences
  Generic RPC
  LRPC
  URPC