CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture XVIII: Concluding Remarks



Outline

• Discuss A Note on Distributed Computing by Jim Waldo et al.

• Jim Waldo:
  – Distinguished Engineer at Sun Microsystems
  – Chief architect of Jini
  – Adjunct professor at Harvard


A Note on Distributed Computing

• Distributed computing is fundamentally different from local computing

• The two paradigms are so different that it would be very inefficient to try to make them look the same
  – You’d end up with distributed applications that aren’t robust to failures
  – Or with local applications that are more complex than they need to be

• Most programming environments for DS attempt to mask the difference between local and remote invocation
  – But this is not what’s hard about distributed computing…


Key Argument

• Achieving interface transparency in distributed systems is unreasonable
  – Distributed systems have different failure modes than local systems
  – Handling those failures properly requires a certain interface
  – Therefore, distributed systems must be accessed via different interfaces
  – Those interfaces would be overkill for local systems


Differences Between Local and Distributed Applications

• Latency
• Memory access
• Partial failure and concurrency


Latency

• A remote method call takes longer to execute than a local method call

• If you build your application without taking this into account, you are doomed to have performance problems

• Suppose you disregard local/remote differences:
  – You build/test your application using local objects
  – You decide later which objects are local and which are remote
  – You find out that if frequently accessed objects are remote, your performance sucks
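A minimal sketch of this effect, assuming a made-up network delay. The class names and the 1 ms round trip are illustrative assumptions, not measurements: the same loop, written and tested against a local object, is rerun against an object that pays a pretend round trip on every call.

```python
import time


class Counter:
    """A plain local object: each call is an ordinary method call."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


class SimulatedRemoteCounter(Counter):
    """Same interface, but every call pays a pretend network round trip."""

    ROUND_TRIP_SECONDS = 0.001  # assumed delay, chosen only for illustration

    def increment(self):
        time.sleep(self.ROUND_TRIP_SECONDS)  # stand-in for the network
        return super().increment()


def hot_loop(counter, calls):
    """Code written without regard to where `counter` lives."""
    start = time.perf_counter()
    for _ in range(calls):
        counter.increment()
    return time.perf_counter() - start


local_time = hot_loop(Counter(), 200)
remote_time = hot_loop(SimulatedRemoteCounter(), 200)
print(f"local: {local_time:.6f}s  simulated remote: {remote_time:.6f}s")
```

The loop itself never changes. Only the location of the frequently accessed object does, and the cost of that decision cannot be seen from the interface.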


Latency (cont.)

• One way to overcome the latency problem:
  – Make tools available that allow the developer to debug performance
  – Understand what components are slowing down the system
  – Make recommendations about the components that should be local

• But can we be sure that such tools would be available? (Do you know of a good one?) This is an active research area, which means that this is hard!


Memory Access

• A local pointer does not make sense in a remote address space

• What are the solutions?
  – Create a language where all memory access is managed by a runtime system (e.g., Java), so that everything is a reference
    • But not everyone uses Java
  – Force the programmer to access memory in a way that does not use pointers (in C++ you can do both)
    • But not all programmers are well behaved
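A rough illustration of why a local reference stops meaning anything once data leaves the address space. Python has no raw pointers, so this sketch uses a JSON round trip as a stand-in for sending an object over the wire. The `account` record is a made-up example.

```python
import json

account = {"owner": "alice", "balance": 100}

alias = account            # local reference: both names denote the same object
alias["balance"] -= 30     # visible through `account` as well

# Stand-in for crossing an address space: only bytes travel, so the
# receiver rebuilds its own copy of the data.
wire_bytes = json.dumps(account).encode("utf-8")
received = json.loads(wire_bytes.decode("utf-8"))
received["balance"] -= 50  # changes the receiver's copy only

print(account["balance"])   # 70
print(received["balance"])  # 20
```

Locally, a write through `alias` is a write to `account`. After the round trip there are two independent objects, which is the behaviour a managed runtime has to hide if remote access is to look like local access.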


Memory Access and Latency: The Verdict

• Conceptually, it is possible to mask the difference between local and distributed computing w.r.t. memory access and latency

• Latency:
  – Develop your application without consideration for object locations
  – Decide on object locations later
  – Rely on good debugging tools to determine the right location

• Memory access:
  – Enforce memory access through the underlying management system

• But masking this difference is difficult, and so it’s not clear whether we can realistically expect it to be masked


Partial Failure

• One component has failed while others keep operating

• You don’t know how much of the computation has actually completed; this is unique to distributed systems
  – Has the server failed or is it just slow?
  – Did it update my bank account before it failed?

• With local computing, a function can also fail, or a system may block or deadlock, but
  – You can always find out what’s happening by asking the operating system or the application
  – In distributed computing, you cannot always find out what happened, because you may be unable to communicate with the entity in question
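The bank account question can be made concrete with a small simulation. This is only a sketch: `FakeServer` and its `crash_point` parameter are hypothetical, and the "network" is an ordinary in-process call that raises `TimeoutError` to model a missing reply.

```python
class FakeServer:
    """Hypothetical in-process stand-in for a remote bank server."""

    def __init__(self, crash_point):
        self.balance = 100
        self.crash_point = crash_point  # "before_update" or "after_update"

    def deposit(self, amount):
        if self.crash_point == "before_update":
            raise TimeoutError("no reply")  # failed before doing any work
        self.balance += amount
        if self.crash_point == "after_update":
            raise TimeoutError("no reply")  # update applied, reply lost
        return self.balance


def client_deposit(server, amount):
    """All the client can observe is a reply or the lack of one."""
    try:
        return ("ok", server.deposit(amount))
    except TimeoutError:
        return ("unknown", None)


a = FakeServer("before_update")
b = FakeServer("after_update")
result_a = client_deposit(a, 50)
result_b = client_deposit(b, 50)

print(result_a == result_b)   # True: the client sees the same thing
print(a.balance, b.balance)   # 100 150: the servers ended up different
```

The client gets an identical result in both runs even though one server applied the deposit and the other did not. Blindly retrying would be safe against `a` and would deposit twice against `b`.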


Concurrency

• Aren’t local multithreaded applications subject to the same issues as distributed applications?

• Not quite:
  – In local programming, a programmer can always force a certain order of operations
  – In distributed computing this cannot be done
  – In local programming, the underlying system provides synchronization primitives and mechanisms
  – In distributed systems, this is not easily available, and the system providing the synchronization infrastructure may fail


So What Do We Do?

• Design the right interfaces

• Interfaces must allow the programmer to handle errors that are unique to distributed systems

• For example, a read() system call:
  – Local interface: int read(int fd, char *buf, int size)
  – Remote interface: int read(int fd, char *buf, int size, long timeout)
    Error codes are expanded to indicate timeout or network failure
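The same contrast, sketched in Python rather than C. The names `ReadStatus`, `local_read` and `remote_read` are made up for this illustration. The sketch uses the standard `os.read`, `socket.settimeout` and `socket.recv` calls, with a socket pair standing in for a real network connection.

```python
import enum
import os
import socket


class ReadStatus(enum.Enum):
    """Expanded result codes for the remote case (hypothetical names)."""
    OK = 0
    TIMEOUT = 1
    NETWORK_FAILURE = 2


def local_read(fd, size):
    """Local interface: no timeout, no network-specific errors."""
    return os.read(fd, size)


def remote_read(sock, size, timeout):
    """Remote interface: the caller must supply a timeout and inspect the status."""
    sock.settimeout(timeout)
    try:
        data = sock.recv(size)
    except socket.timeout:       # must come first: it is a subclass of OSError
        return ReadStatus.TIMEOUT, b""
    except OSError:
        return ReadStatus.NETWORK_FAILURE, b""
    return ReadStatus.OK, data


# Local read through a pipe.
r, w = os.pipe()
os.write(w, b"hi")
local_data = local_read(r, 2)
os.close(r)
os.close(w)

# "Remote" read through a connected socket pair.
a, b = socket.socketpair()
status_before, _ = remote_read(b, 16, timeout=0.05)   # nothing sent yet
a.sendall(b"hello")
status_after, data = remote_read(b, 16, timeout=0.05)
a.close()
b.close()

print(local_data, status_before, status_after, data)
```

The extra parameter and the extra status values are the part of the interface that exists only because the other end may be slow, unreachable, or gone.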


But Wait… Can’t You Unify Interfaces?

• Can’t you use the beefed-up remote interface even when programming local applications?

• Then you don’t need to have different sets of interfaces

• You could, but
  – Local programming would become a nightmare
  – This defeats the purpose of unifying local and distributed paradigms: instead of making distributed programming simpler you’d be making local programming more complex


So What Does Jim Suggest?

• Design objects with local interfaces

• Add an extension to the interface if the object is to be distributed

• The programmer will be aware of the object’s location

• How is this actually done? Recall RMI:
  – A remote object must implement the Remote interface
  – Code that invokes a method on a remote object must catch RemoteException
  – But the same object can be used locally, without specifying that it implements Remote
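The shape of this design can be sketched outside Java as well. The following is a Python analogy, not RMI itself: `RemoteError`, `RemoteAccount` and the `transport` callable are all hypothetical names invented for the example. The local class knows nothing about distribution, and the remote extension adds exactly one thing, a failure the caller is forced to think about.

```python
class RemoteError(Exception):
    """Hypothetical analogue of a remote-invocation failure."""


class Account:
    """Local interface: no failure modes beyond ordinary exceptions."""

    def __init__(self, balance=0):
        self._balance = balance

    def deposit(self, amount):
        self._balance += amount
        return self._balance


class RemoteAccount:
    """Extension used only when the object is distributed.

    `transport` is a hypothetical callable that carries the request to the
    real Account and may raise ConnectionError or TimeoutError.
    """

    def __init__(self, transport):
        self._transport = transport

    def deposit(self, amount):
        try:
            return self._transport("deposit", amount)
        except (ConnectionError, TimeoutError) as exc:
            raise RemoteError("deposit may or may not have been applied") from exc


target = Account(balance=10)


def loopback(method, *args):
    """In-process transport that always succeeds (for the demo only)."""
    return getattr(target, method)(*args)


def broken(method, *args):
    """Transport that simulates a dropped connection."""
    raise ConnectionError("link down")


new_balance = RemoteAccount(loopback).deposit(5)   # 15

try:
    RemoteAccount(broken).deposit(5)
    outcome = "no error"
except RemoteError:
    outcome = "remote failure surfaced to the caller"
```

Code that holds an `Account` directly never sees `RemoteError`. Code that holds a `RemoteAccount` has to handle it, so the object's location stays visible in the program text.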


Summary

• Distributed computing is fundamentally different from local computing because of different failure modes

• By making distributed interfaces look like local interfaces, we are diminishing our ability to properly handle those failures – this results in brittle applications

• To handle those failures properly, interfaces must be designed in a certain way

• Therefore, remote interfaces must be different from local interfaces (unless you want to make local interfaces unnecessarily complicated)
