June 24, 2009 CS-524(NED) Lec 01 2
CS-524 Distributed Computer Systems
M. Engg. (Computer Systems) Fall Semester – 2009
Instructor: Shahab Tahzeeb (Assistant Professor)
Department of Computer & Information Systems Engineering
NED University of Engineering & Technology, Karachi
Today’s Agenda
• Getting to know each other
• Describing our roles to make this course a real success
• Overview of the Course
My Role
Continuously strive to expose you to the subject knowledge in a manner that helps save your time in getting hold of details
Your Role
• Continuously strive to be regular in every aspect
– Schedule some time for review of lectures before coming to the class
– Take sessional work seriously
– Ask questions. There are NO stupid questions
– Learning-centered approach
  • You learn as well as earn a good grade
– Grading-centered approach
  • You may get a good grade but you never learn
Academic Calendar
• 9 weeks Teaching
– 22nd June, 2009 to 22nd August, 2009
• 5 weeks (Ramazan/Eid Break)
– 24th August, 2009 to 26th September, 2009
• 7 weeks Teaching
– 26th September, 2009 to 14th November, 2009
• Final Examinations
– 1st December, 2009 to 15th December, 2009
• Results’ Declaration
– Last week of December, 2009
Books
• Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall
• George Coulouris, Jean Dollimore and Tim Kindberg, Distributed Systems: Concepts and Design, Pearson Education
Topics
• Introduction
• Communication
• Processes
• Naming
• Synchronization
• Consistency and Replication
• Fault Tolerance
• Security
* We shall add topics to this list if time permits
Course Objectives
• Describe fundamental concepts of and techniques in distributed systems
• Analyze distributed systems according to desired qualities (such as performance, reliability, or availability)
• Apply distributed systems techniques (such as Remote Procedure Call, event-based communication, or transactions) to implement distributed system designs
• Compare and contrast concepts of and techniques in distributed systems with respect to their ability to fulfill desired qualities
• Design distributed systems according to desired qualities by choosing among introduced concepts and techniques
Grading
• Quizzes 05%
– 3 announced quizzes
  • weeks 3, 6 and 12
– 2 surprise quizzes
  • 2 announced and 1 surprise quiz will be graded
• Homework 05%
• Class Participation 05%
• Term Paper 05%
• Mid-Term (09th Week) 10%
• Final 70%
• No early or makeup exams please!
Web Group for Course Management
http://groups.yahoo.com/group/cs524-09B
Definition # 1
• A collection of independent computers that act as an integrated system and hence appear to the end user as a single computer (i.e. a virtual uniprocessor)
• Two aspects
– Hardware: autonomous machines
– Software: users think they’re dealing with a single system
Definition # 1
• User’s view of a Distributed System:
– Multiple computers that work together in a more or less seamless fashion (single system image)
• To support heterogeneous computers and networks and still present a single-system image, systems may rely on middleware:
– a software layer that provides a consistent interface to the user, regardless of the underlying platform.
Definition # 1
A distributed system organized as middleware. The middleware layer runs on all machines and offers each application the same interface. It provides a programming abstraction and masks the heterogeneity of the underlying networks, hardware, operating systems and programming languages.
CORBA: A Middleware Example
• CORBA is the OMG's open, vendor-independent architecture and infrastructure that computer applications use to work together over networks.
• Using the standard protocol IIOP, a CORBA-based program from any vendor, on almost any computer, operating system, programming language, and network, can interoperate with a CORBA-based program from the same or another vendor, on almost any other computer, operating system, programming language, and network.
Other Middleware Examples
• DCOM
– Distributed Component Object Model
• RPC
– Remote Procedure Call
• RMI
– Remote Method Invocation
ONC RPC
• Open Network Computing (ONC) Remote Procedure Call is a widely deployed remote procedure call system.
• ONC was originally developed by Sun Microsystems as part of their Network File System project, and is sometimes referred to as Sun ONC or Sun RPC.
Definition # 2
• Enslow:
– A distributed system is one in which hardware, control and data achieve some degree of decentralization, and the distribution of resources is transparent to the user
Definition # 2
• H1. A single CPU with one control unit.
• H2. A single CPU with multiple ALUs. There is only one control unit.
• H3. Separate specialized functional units, such as one CPU with one floating-point coprocessor.
• H4. Multiprocessor with single I/O system and a global memory.
• H5. Multicomputer with multiple I/O systems and local memories.
• C1. Single fixed control point. Note that physically the system may or may not have multiple CPUs.
• C2. Single dynamic control point. In multiple CPU cases the controller changes from time to time among CPUs.
• C3. A fixed master/slave structure. For example, in a system with one CPU and one coprocessor, the CPU is a fixed master and the coprocessor is a fixed slave.
• C4. A dynamic master/slave structure. The role of master/slave is modifiable by software.
• C5. Multiple homogeneous control points where copies of the same controller are used.
• C6. Multiple heterogeneous control points where different controllers are used.
• D1. Centralized databases with a single copy of both files and directory.
• D2. Distributed files with a single centralized directory and no local directory.
• D3. Replicated database with a copy of files and a directory at each site.
• D4. Partitioned database with a master that keeps a complete duplicate copy of all files.
• D5. Partitioned database with a master that keeps only a complete directory.
• D6. Partitioned database with no master file or directory.
Extension to Enslow’s Definition
Definition # 3
• An Intimidating Definition
– A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable (Leslie Lamport)
Examples of Distributed Systems (1)
• Internet
• Mobile and Ubiquitous Computing
• P2P Systems
• Sensor Networks
• Distributed Mobile Robots
• Air Traffic Control (ATC) System
• Banking, Stock Markets, Stock Brokerages
• Health Care, Hospital Automation
• Control of Power Plants, Electric Grid
• Telecommunications Infrastructure
Examples of Distributed Systems (2)
• Electronic Commerce and Electronic Cash on the Web (very important emerging area)
• Corporate “Information” Base: a company’s memory of decisions, technologies and strategies
• Military Command, Control, and Intelligence Systems
• Embedded Systems: automotive control systems
– Mercedes S-Klasse automobiles these days are equipped with 50+ autonomous embedded processors
– Connected through proprietary bus-like LANs
Distributed System vs. Network
• There is little or no coordination among networked machines
• Users are aware of separate machines in a network, while a distributed system operates in a seamless fashion.
Motivation (1)
• Inherently Distributed Applications
– Distributed systems have come into existence in some very natural ways, e.g., in our society people are distributed and information should also be distributed.
– Applications which require sharing or dissemination of information among distant entities are “natural” distributed systems
– In a distributed database system, information is generated at different branch offices (sub-databases), so that local access can be done quickly.
– The system also provides a global view to support various global operations.
– E.g. ATM, airline reservation systems, remote monitoring, etc.
Motivation (2)
• Improved Performance/Cost Ratio (PCR)
– The parallelism of distributed systems reduces processing bottlenecks and provides improved all-around performance, at much lower cost.
• Resource Sharing
– Distributed systems can efficiently support information and resource (hardware and software) sharing for users at different locations.
Motivation (3)
• Fault Tolerance
– With the multiplicity of storage units and processing elements, distributed systems have the potential ability to continue operation in the presence of failures in the system.
• Scalability
– Distributed systems are capable of incremental growth and have the added advantage of facilitating modification or extension of a system to adapt to a changing environment without disrupting its operations.
  • Think of upgrading a mainframe or super computer!
Motivation (4)
• Distribution as an Artifact
– Distribution may be an artifact of an engineering solution to satisfy some specific requirements such as
  • Fault-tolerance
  • Load-balancing
  • Minimum level of Quality of Service (QoS)
– E.g. Replicated servers
• Functional Distribution
– Computers have different functional capabilities
  • Client / server
  • Host / terminal
  • Data gathering / data processing
Driving Forces
• There are two main stimuli for the current interest in distributed systems:
– Technological Enhancement
  • microelectronics: fast and inexpensive processors
  • communication: highly efficient computer networks
– User Needs
  • many enterprises are cooperative in nature
Classes of Distributed Systems
• Distributed Computing Systems
• Distributed Information Systems
• Distributed Pervasive Systems
Distributed Computing Systems
• High-Performance Computing Systems
– Cluster computing
– Grid computing
Cluster Computing
• A collection of similar processors (PCs, workstations) running the same (commodity) operating system, connected by a high-speed network.
• Runs parallel programs
• Clusters are popular because they offer parallel computing capabilities using inexpensive PC hardware; an organization may be able to capitalize on machines it already has.
• Microsoft, Sun, and others sell clustering software and you can also buy turnkey systems
Clusters – Beowulf Model
• Linux-based
• Structured according to master-slave paradigm
– One processor is the master; allocates tasks to other processors, maintains batch queue of submitted jobs, handles interface to users
– Libraries to handle message-based communication or other features
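The master's role (queue the submitted jobs, hand tasks to the other processors, collect the results) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: threads in one process stand in for cluster nodes, a shared in-memory queue stands in for message-based communication, and `run_master` is a hypothetical name, not part of any Beowulf software.

```python
import queue
import threading

def run_master(jobs, num_workers=3):
    """Master: queue the submitted jobs, start workers, collect results."""
    tasks = queue.Queue()
    for job in jobs:
        tasks.put(job)

    results = {}
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps asking the master's queue for the next task.
        while True:
            try:
                n = tasks.get_nowait()
            except queue.Empty:
                return  # batch queue drained, worker stops
            value = n * n  # stand-in for a real compute job
            with results_lock:
                results[n] = value

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

squares = run_master(range(1, 6))
print(sorted(squares.items()))
```

On a real cluster the workers would be separate machines and the queue would be replaced by a message-passing library.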
Clusters – MOSIX Model
• Provides a symmetric, rather than hierarchical paradigm
– High degree of distribution transparency
– Processes can migrate between nodes dynamically and preemptively
Grid Computing Systems
• Modeled loosely on the electrical grid.
• Unlike clusters, computers in grids are highly heterogeneous in their hardware, software, networks, security policies, etc.
• Grids support virtual organizations: a collaboration of users who pool resources (servers, storage, databases) and share them
• Grid software is concerned with managing sharing across administrative domains
– each part potentially under a different administrative domain, hardware/software/network
• Key issue: sharing resources across organizations
– much pain goes into standards and interfaces
Grid Computing Systems
Figure: A layered architecture for grid computing systems (grid middleware)
A Proposed Software Architecture
• Fabric Layer
– interfaces to local resources
• Connectivity Layer
– protocols to support usage of multiple resources for a single application; e.g., access a remote resource or transfer data between sites
• Resource Layer
– manages a single resource
A Proposed Software Architecture
• Collective Layer
– services for resource discovery, resource allocation, resource scheduling, etc.
– Interacts with the connectivity and resource layers
• Application layer
– applications within a virtual organization (V.O.) which share the grid computing resources.
OGSA – A Grid Architecture
• Open Grid Services Architecture
– a service-oriented architecture
– sites that offer resources to share do so by offering specific Web services.
• The architecture of the OGSA model is more complex than the previous layered model.
Other Grid Resources
• The Globus Alliance
– a community of organizations and individuals developing fundamental technologies behind the Grid, which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy
• Grid Computing Info Centre
– aims to promote the development and advancement of technologies that provide seamless and scalable access to wide-area distributed resources
Distributed Information Systems
• Business-oriented
• Systems to make a number of separate network applications interoperable and build “enterprise-wide information systems”.
• Two types are discussed here:
– Transaction Processing Systems
– Enterprise Application Integration
Transaction Processing Systems
• Provide a highly structured client-server approach for database applications
• Transactions obey the ACID properties:
– Atomic: all or nothing at all
– Consistent: invariants are preserved (if consistent before, consistent after)
– Isolated: concurrent transactions don’t interfere with each other
– Durable: committed operations can’t be undone
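As a small illustration of the "all or nothing" and "invariants are preserved" properties, here is a minimal sketch using Python's built-in `sqlite3` module. The `account` table, the names and the amounts are made up for the example; the invariant (no negative balance) is enforced by a CHECK constraint, and a transfer that would violate it is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # invariant violated: the credit above was undone too

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
print(ok, failed, balances(conn))
```

The second transfer credits bob first and then fails on the debit; because both updates belong to one transaction, neither takes effect.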
Enterprise Application Integration
• Supports a less-structured approach (as compared to transaction-based systems)
• Application components are allowed to communicate directly
• Communication mechanisms to support this include
– Remote Procedure Call (RPC)
– Remote Method Invocation (RMI)
Enterprise Application Integration
Middleware as a communication facilitator in enterprise application integration
Distributed Pervasive Systems
• The first two types of systems are characterized by their stability: nodes and network connections are more or less fixed
• This type of system is likely to incorporate small, battery-powered, mobile devices
– Home systems
– Electronic health care systems – patient monitoring
– Sensor networks – data collection, surveillance
Electronic Health Care Systems
Monitoring a person in a pervasive electronic health care system, using (a) a local hub or (b) a continuous wireless connection.
Sensor Networks
Organizing a sensor network database, while storing and processing data only at the operator’s site
Sensor Networks
Organizing a sensor network database, while storing and processing data only at the sensors.
Distributed Systems vs. Parallel Systems
• A distributed system (DS) often refers to a system that is to be used by multiple (distributed) users
– e.g. e-commerce or business applications
• Generally refers to a cooperative work environment
• Security is much more of a concern
– Ignoring security is not an option, for example, in the design of a distributed database for e-commerce. By its very nature, this system must be accessible to the real world, and as a consequence must be designed with security in mind.
• A parallel system (PS) often has the connotation of a system that is designed to have only a single user or user process
– e.g. scientific applications
• Typically refers to an environment designed to provide the maximum parallelization and speed-up for a single task
• If the only goal of a supercomputer is to rapidly solve a complex task, it can be locked in a secure facility, physically and logically inaccessible: security problem solved.
Distributed System Challenges
• Resource Accessibility
• Security
• Concurrency
• Heterogeneity
• Transparency
• Openness
• Scalability
• Reliability
• Lack of Global Clock and Global State
Resource Accessibility
• Support user access to remote resources (printers, data files, web pages, CPU cycles) and the fair sharing of the resources
– making it convenient to share resources
Security
• Sharing, as always, introduces security issues
• Confidentiality
– avoiding the disclosure of the content of a message to a party distinct from the intended receiver
• Integrity
– avoiding the corruption of the transmitted contents by a third party
• Availability
– the capability of providing a service in all circumstances
Concurrency
• Resources can be shared by clients in a distributed system, therefore several clients may access a shared resource at the same time
• Not acceptable that each request be processed in turn, must be able to process requests concurrently
• For each object that represents a shared resource, its operations must be synchronized in such a way that its data remains consistent
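A minimal sketch of the last point, assuming a shared object inside one server process: the hypothetical `Account` class below serves several clients concurrently, and a lock synchronizes its read-modify-write operation so that no update is lost. The class and function names are illustrative only.

```python
import threading

class Account:
    """A shared resource whose operations are synchronized with a lock."""

    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Only one thread at a time may run the read-modify-write step.
        with self._lock:
            current = self.balance
            self.balance = current + amount

def run_clients(account, clients=8, deposits_each=10_000):
    """Simulate several clients depositing concurrently."""
    def client():
        for _ in range(deposits_each):
            account.deposit(1)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

final_balance = run_clients(Account())
print(final_balance)  # 80000
```

Requests are still processed concurrently; only the short critical section that touches the shared data is serialized.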
Heterogeneity - I
• Heterogeneity (variety and difference) applies to:
– Networks–differences are masked by the fact that all of the computers use the Internet protocols to communicate.
– Hardware–data types, such as integers, may be represented in different ways on different sorts of hardware (byte ordering: big-endian, little-endian)
– Operating Systems–do not provide the same application API to the Internet protocols.
– Programming languages–use different representations for characters and data structures, such as arrays and records.
– Developers–representation of primitive data items and data structures needs to be agreed upon (standards)
• Middleware
– Software layer that abstracts from the above providing a uniform computational model
– All middleware deals with the differences in operating systems and hardware.
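The byte-ordering point can be made concrete with Python's standard `struct` module. The sketch below packs the same 32-bit integer in both orders, shows what a receiver sees if it assumes the wrong one, and then decodes correctly by agreeing on a wire format (big-endian is chosen here only as an example convention).

```python
import struct

value = 0x12345678  # a 32-bit integer to send over the network

# '>' packs big-endian (network byte order); '<' packs little-endian.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# A receiver that assumes the wrong byte order decodes a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x78563412

# Agreeing on one wire format (here: big-endian) restores the value,
# regardless of the native byte order of either machine.
decoded = struct.unpack(">I", big)[0]
print(decoded == value)  # True
```

Middleware typically hides this step by marshalling data into an agreed representation on behalf of the application.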
Heterogeneity - II
• Mobile Code
– Code that can be sent from one computer to another and run at the destination (e.g. Java applets).
– Machine code suitable for running on one type of computer hardware is not suitable for running on another.
• Virtual Machines Approach
– Provides a way of making code executable on any hardware: the compiler for a particular language generates code for a virtual machine instead of a particular hardware.
Transparency
• A distributed system that appears to its users & applications to be a single computer system is said to be transparent.
– Users & applications should be able to access remote resources in the same way they access local resources.
• Aims to conceal the component-based structure of the system, and facilitate a perception of the system as a whole
Transparency Classes (1)
• Access Transparency
– Hides differences in data representation, different architectures and file-name conventions of machines
– Enables interoperability
• Location Transparency
– Hides location of resource i.e. the user can use the resource without being aware of its location
– The key is naming
– E.g. URLs, email, etc.
(Access + Location) Transparency = Network Transparency
Transparency Classes (2)
• Migration Transparency
– Hides from the user that the resource being used has moved to another location
• Relocation Transparency
– Hides from the user that the resource being used is being moved
– Enables mobile computing
• Persistence Transparency
– Hides whether a resource is in memory or on disk
Transparency Classes (3)
• Replication Transparency
– Hides that multiple copies of the resource exist (for reliability and/or availability)
• Concurrency Transparency
– Hides that the resource may be shared concurrently
• Failure Transparency
– Hides failure and (possible) recovery of the resource
– E.g. email is eventually delivered, even when servers or communication links fail.
• Scaling Transparency
– Allows system and applications to expand without need to change structure or application algorithms
• Performance Transparency
– Adaptation of the system to varying load situations without the user noticing it
Degrees of Transparency
• Performance
– e.g. multiple attempts to contact a remote server can slow down the system; should you report failure and let the user cancel the request?
• Convenience
– e.g. direct the print request to my local printer, not one on the next floor
• Too much emphasis on transparency may prevent the user from understanding system behavior
• Transparency is sometimes against application’s goals
– e.g. pervasive computing and location awareness
Openness - I
• Services should follow agreed-upon rules on component syntax & semantics for interoperability and portability
• Using interfaces, any process that needs a service should be able to communicate with a process that provides the service.
• Multiple implementations of the same service may be provided, as long as the interface is maintained
Openness - II
• Interoperability
– The ability of two different systems or applications to work together by relying on each other’s services as specified by a common standard
• Portability
– The ability of an application designed to run on distributed system A to run on distributed system B which implements the same interface, without modification
• Extensibility
– If a distributed system is open (implements standard interfaces) it should be possible to add and delete components without affecting the system as a whole.
  • e.g., replace the file system
Scalability – I
• A system is scalable if it remains effective when there is a significant increase in the number of resources and the number of users
• The design of a scalable distributed system poses the following challenges
– Controlling Cost of Physical Resources
  • For a system with n users to be scalable, the quantity of physical resources required to support them should be at most O(n), that is, proportional to n. E.g., if a single file server can support 20 users, then two such servers should be able to support 40 users.
– Controlling Performance Loss
  • Maximum performance loss should be no worse than O(log n) where n is size of data.
– Preventing Software Resources Running Out
  • IP addresses (32 bits in IPv4, 128 bits in IPv6)
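The three bounds can be checked with a little arithmetic. The sketch below is only illustrative: the 20-users-per-server figure comes from the file-server example, binary search over sorted data is used as one assumed instance of an O(log n) lookup, and the function names are hypothetical.

```python
import math

USERS_PER_SERVER = 20  # figure taken from the file-server example

def servers_needed(users, per_server=USERS_PER_SERVER):
    """Physical resources grow linearly, O(n), with the number of users."""
    return math.ceil(users / per_server)

def lookup_steps(items):
    """Comparisons for a binary search over `items` sorted entries: O(log n)."""
    return math.ceil(math.log2(items)) if items > 1 else 1

for users in (20, 40, 4000):
    print(users, "users ->", servers_needed(users), "servers")

for items in (1_000, 1_000_000):
    print(items, "items ->", lookup_steps(items), "lookup steps")

# Address space as a software resource that can run out.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128
print(ipv4_addresses)  # 4294967296
```

Growing the data a thousandfold (from a thousand to a million items) only doubles the lookup steps, which is the kind of performance loss the O(log n) bound allows.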
Scalability – II
• With respect to size
• With respect to geographical distribution
• With respect to the number of administrative organizations it spans
– Most systems account only, to a certain extent, for size scalability.
– Today, the challenge lies in geographical and administrative scalability.
Size Scalability
• The more users and resources a system has, the harder it is to support a centralized model.
• Scalability is affected when the system is based on
– Centralized server
  • one for all users
– Centralized data
  • a single database for all users
– Centralized algorithms
  • e.g. for routing: one site collects all information, processes it, distributes the results to all sites
Size Scalability
• A single centralized server, running on a single machine, can saturate if the workload becomes too heavy.
– Communication links around the server can limit performance, as well
• Centralized data storage is impractical for large databases
– If the Internet’s Domain Name System consisted of a single table, it would be virtually impossible to resolve a name in reasonable time
Size Scalability
• Centralized algorithms rely on a central coordinator that collects data from all sites in the network and then makes decisions.
– Complete knowledge
  • good
– Time and network traffic
  • bad
• Wherever possible, distributed algorithms are desirable.
Size Scalability
• Decentralized or Distributed Algorithms
– No machine has complete information about the system state
– Machines make decisions based only on local information
– Failure of a single machine doesn’t ruin the algorithm
– There is no assumption that a global clock exists.
Geographic Scalability
• Early distributed systems ran on LANs and relied on synchronous communication
– the requesting client blocks until it gets a response, which makes it hard to scale
Administrative Scalability
• Different domains may have different policies about resource usage, management, security, etc.
• Trust often stops at administrative boundaries
Scaling Techniques
• Scalability affects performance more than anything else.
• Three techniques to improve scalability:
– Hiding Communication Latencies
– Distribution
– Replication
Scalability – Amazon.com
• Werner Vogels’ talk “Order in the Chaos: Building the Amazon.com Platform”
• 1995: Started out with a single web service on a single server
• Today Amazon has about 150 web services on its homepage alone.
• 1 million merchant partners; 60 million customers
• 1999: A misstep during this exponential growth period was moving to a mainframe from distributed servers.
– It failed to meet scalability, reliability and performance goals; it was scrapped in 2000.
Hiding Communication Delays
• Key for geographic scalability
• Structure applications to use asynchronous communication (no blocking for replies)
– While waiting for one answer, do something else; create one thread to wait for the reply and let other threads continue to process or schedule another task
• Download part of the computation to the requesting platform to speed up processing
– E.g. Filling in forms to access a DB:
  • instead of sending a separate message for each field,
  • download the form/code and submit the finished version. JavaScript and Java applets support this approach.
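The "one thread waits for the reply while the client does something else" idea can be sketched with Python's standard `concurrent.futures` module. Everything remote is simulated here: `remote_call` is a hypothetical stand-in that just sleeps for an assumed 0.2 s round trip, and `local_work` represents whatever the client can usefully do in the meantime.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_call(request, latency=0.2):
    """Hypothetical stand-in for a request to a distant server."""
    time.sleep(latency)  # simulated network round trip
    return f"reply to {request}"

def local_work():
    """Work the client can do while the reply is in flight."""
    return sum(range(100_000))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_call, "query-1")  # returns immediately
    partial = local_work()                        # overlaps with the wait
    reply = future.result()                       # block only when the reply is needed
elapsed = time.perf_counter() - start

print(reply, partial, round(elapsed, 2))
```

The total time is roughly the round-trip latency rather than latency plus local work, because the two overlap.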
Distribution
• Instead of one centralized service, divide into parts and distribute them geographically
– Example: DNS namespace is organized as a tree of domains; each domain is divided into zones; names in each zone are handled by a different name server
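A toy model of this delegation, not the real DNS protocol: in the sketch below each hypothetical "name server" holds only the records of its own zone plus pointers to the servers of the zones it delegates, and a lookup walks down the tree. The zone data, server names and the address (taken from the documentation-only range 192.0.2.0/24) are all invented for illustration.

```python
# Hypothetical zone data: each server knows only its own zone.
ZONES = {
    "root": {"delegates": {"edu": "edu"}, "records": {}},
    "edu": {"delegates": {"example.edu": "example.edu"}, "records": {}},
    "example.edu": {
        "delegates": {},
        "records": {"www.example.edu": "192.0.2.10"},
    },
}

def resolve(name, zones=ZONES, root="root"):
    """Return (address, servers visited); address is None if unresolved."""
    server, path = root, []
    while True:
        zone = zones[server]
        path.append(server)
        if name in zone["records"]:
            return zone["records"][name], path
        # Follow the delegation whose domain is a suffix of the name.
        for domain, child in zone["delegates"].items():
            if name == domain or name.endswith("." + domain):
                server = child
                break
        else:
            return None, path  # no server is responsible for this name

address, visited = resolve("www.example.edu")
print(address, visited)
```

No single server holds the whole name table; each one answers for its zone or refers the query one level down.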
Distribution
An example of dividing the DNS name space into zones
Replication
• Replication: multiple identical copies of something
• Replication
– Increases availability
– Improves performance through load balancing
– May avoid latency by improving proximity of resource
Replication - Caching
• Caching is a form of replication
– Normally creates a (temporary) replica of something closer to the user
• User decides to cache, system decides to replicate
• Replication is more permanent
• Both lead to consistency problems
Replication - Caching
• Having multiple copies (cached or replicated) leads to inconsistencies:
– modifying one copy makes that copy different from the rest.
• Always keeping copies consistent and in a general way requires global synchronization on each modification.
– Global synchronization precludes large-scale solutions.
• If we can tolerate inconsistencies, we may reduce the need for global synchronization.
• Tolerating inconsistencies is application dependent.
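The trade-off can be seen in a toy cache. In this sketch the hypothetical `Cache` refetches from the origin only after its copy has been used `max_uses` times (a made-up expiry policy chosen for simplicity); in between, a write at the origin goes unnoticed, which is exactly the inconsistency the application has agreed to tolerate in exchange for fewer round trips.

```python
class Origin:
    """The authoritative copy; counts how often it is contacted."""

    def __init__(self, value):
        self.value = value
        self.reads = 0

    def read(self):
        self.reads += 1
        return self.value

    def write(self, value):
        self.value = value

class Cache:
    """Client-side copy, refreshed only every `max_uses` reads."""

    def __init__(self, origin, max_uses):
        self.origin = origin
        self.max_uses = max_uses
        self._copy = None
        self._uses_left = 0

    def read(self):
        if self._uses_left == 0:
            self._copy = self.origin.read()  # the only remote access
            self._uses_left = self.max_uses
        self._uses_left -= 1
        return self._copy

origin = Origin("v1")
cache = Cache(origin, max_uses=3)
first = cache.read()   # fetches "v1" from the origin
origin.write("v2")     # the origin changes; the cached copy does not
stale = cache.read()   # still "v1": a tolerated inconsistency
cache.read()           # third use of the cached copy
fresh = cache.read()   # copy expired, refetched: "v2"
print(first, stale, fresh, origin.reads)
```

Four reads cost only two contacts with the origin, at the price of one stale answer.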
Reliability – Failure Handling
• Techniques
– Failure Detection
  • message checksum
– Failure Masking
  • making a detected failure hidden or less severe
  • email retransmission
– Tolerating Failures
  • Web pages (informing users about failure)
– Failure Recovery
  • permanent data rolled back
– Redundancy (use of redundant components)
  • Duplication in routes, hardware
  • DNS: every name table replicated in at least two different servers
  • Databases: replicated in several servers
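Failure detection by message checksum can be sketched with the CRC-32 function from Python's standard `zlib` module. The framing used here (payload followed by a 4-byte checksum) is an assumption made for the example, not a particular protocol's format.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(message: bytes) -> bool:
    """Return True if the trailing checksum matches the payload."""
    payload, received = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

sent = frame(b"transfer 30 to bob")
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit in transit

print(verify(sent))       # True
print(verify(corrupted))  # False
```

Once corruption is detected, the failure can be masked, for example by asking the sender to retransmit.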
Reliability – Failure Handling
• Availability
– Measure of the proportion of time a system is available for use.
– DS provide a high degree of availability in the face of hardware faults.
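As a worked example of "proportion of time available", the sketch below computes availability from uptime and downtime and then estimates the effect of replication. The hours are made-up numbers, and the replication formula assumes that replicas fail independently, which real systems only approximate.

```python
def availability(uptime_hours, downtime_hours):
    """Proportion of time the system is available for use."""
    return uptime_hours / (uptime_hours + downtime_hours)

def replicated_availability(single, replicas):
    """Availability of `replicas` copies, assuming independent failures:
    the service is down only when every replica is down at once."""
    return 1 - (1 - single) ** replicas

single = availability(uptime_hours=990, downtime_hours=10)
pair = replicated_availability(single, replicas=2)
print(single, round(pair, 6))
```

With these numbers one server is available 99% of the time, while two independent replicas together reach about 99.99%.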
Lack of Global Clock & State
• There are limits on the precision with which processes in a distributed system can synchronize their clocks
• There is no single process in the distributed system that would have a knowledge of the current global state of the system
Fallacies of Distributed Computing
• Source: Peter Deutsch (The following false assumptions add to the challenges)
– The network is reliable
– Latency is zero
– Bandwidth is infinite
– The network is secure
– Topology doesn’t change
– There is one administrator
– Transport cost is zero
– The network is homogeneous
Summary
• Distributed computing brings transparent access to as much computer power and data as the user needs to accomplish any given task, and at the same time, achieves high performance and reliability objectives
• Despite the failure, uncertainty, and lack of specialized hardware support, we can build and effectively use systems that are an order of magnitude more powerful. In fact we can do this while providing a more available, more robust, more convenient solution.
• Middleware is a key facility for building distributed systems
• It’s difficult to design a good distributed system: there are a lot of problems in getting “good” characteristics, not the least of which is people.