Challenge Issues in Distributed Systems
ECE7650
Challenge Issues 1-1
Glory of the Internet
• 1960s: queuing theory and packet switching principles (ARPANET)
• 1970s: proprietary networks (Ethernet) and internetworking (TCP/IP)
• 1980s: network protocols (SMTP, FTP, etc.); new networks such as NSFNET and BITNET
• 1990s: killer applications (Web and e-commerce); commercialization
• 2000s: application boom (P2P, VoIP, social networks, cloud and storage services, etc.)
Reasons for Internet Success
Cerf and Kahn’s internetworking principles (1974):
• Minimalism, autonomy: no internal changes required to interconnect networks
• Best-effort service model
• Stateless routers
• Decentralized control
The design philosophy is to keep it simple. A key architectural feature is the “narrow-waisted hourglass model”, with a small, well-defined interface at the middle layer (IP).
Assumptions
Stationary hosts in a wired network
• Each host is assigned a topologically dependent IP address
• Routing is based on IP address
• But mobile and wireless communication has become pervasive
Friendly environment
• Hosts trust each other; little concern for security and privacy
• TCP/IP is non-secure by design; the “identity assumption” is no longer valid
• Accountability problem
Small scale and uniform edge
• Grew out of the early small-scale ARPANET experience
• No one could have imagined today’s hundreds of millions of hosts, the billions of cellular phones ready to be plugged in, sensor networks, etc.
Assumptions (cont’)
Simple applications
• An alternative reliable communication infrastructure based on Cerf and Kahn’s principles and the “narrow-waisted hourglass model”
• Clearly defined applications are supported by a well-defined functional interface
Goodwill and cooperation
• Best effort, store-and-forward, autonomous, distributed decisions in intra-domain as well as inter-domain routing (BGP)
• Reality is a battlefield of multiple players; competition and economic incentives must be taken into account
Ad Hoc Work-Around
To accommodate mobile hosts
• Mobile IP, but the mobile host keeps its home IP address, and triangle routing of packets from the correspondent host through the home network is inefficient
• TCP handles the high delay and loss rate of wireless networks by treating them as congestion, which hides their real cause
Hostile environment
• Firewalls, but they violate the end-to-end argument; application designers must take the possibility of a firewall into account
• IPsec? How can you prevent attacks or harassment by unsolicited traffic?
Large scale, diversified edge
• NAT relieves the address shortage, but whereas routers should process only up to layer 3, a NAT router must process layer 4
Ad Hoc Work-Around (cont’)
Meet various application requirements
• QoS-aware routers: IntServ, DiffServ, RSVP, etc. Hard to deploy widely (all routers along the path must support them)
Non-cooperative, competitive
• Service Level Agreement (SLA) enforcement?
• Big BGP problems in inter-domain routing: a single mistyped command at one ISP’s router disrupted connectivity across many neighbors
• Economic incentive? It is hard to reach consensus between competitors; sometimes standardization may cost a market advantage
Application-Level Mitigation
TCP/UDP/IP: “best-effort service”, with no guarantees on delay or loss
Today’s Internet multimedia applications use application-level techniques to mitigate (as best possible) the effects of delay and loss
But multimedia apps require QoS and a certain level of performance to be effective!
Any problem in computer science can be solved with another layer of indirection [except the problem of too many layers of indirection] (David Wheeler, PhD’51)
Distributed Systems Layer
[Figure: the Internet protocol stack (application, transport, network, link, physical; NIC/driver and OS/network module), with a middleware service layer serving distributed apps. Middleware services include communication (synchronous vs. asynchronous, group, reliable, and transactional communication, latency tolerance, etc.) and coordination.]
What is a Distributed System?
A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages [CDK].
• Autonomous: independent failures
• Concurrent program execution is the norm
• No global clock: coordination by exchanging messages
Examples
• Basic Internet services like the Web, email, FTP
• Streaming apps (audio, video)
• P2P file sharing (BitTorrent)
• Cloud computing and storage services
Middleware
Computer software that connects software components or people and their applications. The software consists of a set of services that allows multiple processes running on one or more machines to interact. Together, these services define a uniform computing model for use by the programmers of servers and distributed apps.
Challenge Issues
Heterogeneity
• Heterogeneous components must be able to interoperate
Distribution transparency
• Distribution should be hidden from the user as much as possible
Fault tolerance
• Failure of a component (partial failure) should not result in failure of the whole system
Scalability
• The system should work efficiently with an increasing number of users
• System performance should increase with the inclusion of additional resources
Challenge Issues (cont’)
Concurrency
• Shared access to resources must be possible
Openness
• Interfaces should be publicly available to ease adding new components
Security
• The system should only be used in the way intended
Heterogeneity
Variety of computers in a DS
• Networks, computer HW, OSs, programming languages, various implementations, etc.
• E.g. network protocols, data types
Middleware is a software layer that provides a programming abstraction and masks heterogeneity. E.g. CORBA and Java RMI
The virtual machine approach makes code executable on any hardware that runs the virtual machine. E.g. the JVM
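One concrete way middleware masks heterogeneity in data types is to agree on an external data representation. The minimal sketch below (an illustration only, not the actual CORBA or Java RMI wire format) marshals a record into big-endian network byte order with Python’s standard `struct` module, so hosts with different native byte orders decode the same values. The record layout and the `marshal`/`unmarshal` names are hypothetical.

```python
import struct

# Hypothetical record layout: 32-bit signed int + 64-bit IEEE 754 double.
# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so the encoding is identical whatever the sender's native order.
RECORD_FORMAT = "!id"

def marshal(record_id: int, value: float) -> bytes:
    """Encode a record into the agreed external representation."""
    return struct.pack(RECORD_FORMAT, record_id, value)

def unmarshal(data: bytes) -> tuple[int, float]:
    """Decode bytes produced by marshal() on any host."""
    return struct.unpack(RECORD_FORMAT, data)

if __name__ == "__main__":
    wire = marshal(1, 2.5)
    print(wire.hex())       # 000000014004000000000000 on every host
    print(unmarshal(wire))  # (1, 2.5)
```

Fixing the byte order in the format string, rather than relying on each machine’s native layout, is what lets a little-endian client and a big-endian server interoperate.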
Openness
The characteristic that determines whether the system can be extended or re-implemented in various ways without disruption to or duplication of existing services.
• HW extension: peripherals, memory, network interfaces
• SW extension: OS features, communication protocols, resource-sharing services, e.g. Unix utilities, browser protocol and content handlers
• Key interfaces are published or standardized (ISO, IEEE, etc.), or are industry de facto standards that bypass cumbersome official standardization procedures
• Any component implementation must conform to the published standard
Openness: Unix
Openness is achieved by specifying and documenting the key software interfaces. Unix features are fully accessible through system calls, so developers can:
• Add drivers
• Develop applications
• Include new features, e.g. IPC
Linux: the kernel is open too!
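To illustrate how a documented system-call interface lets ordinary programs use kernel features such as IPC, here is a minimal POSIX-only Python sketch that passes a message from a child process to its parent through a pipe. The `os` functions used are thin wrappers around the `pipe(2)`, `fork(2)`, `write(2)`, `read(2)` and `waitpid(2)` system calls; the helper name `pipe_roundtrip` is hypothetical.

```python
import os

def pipe_roundtrip(message: bytes) -> bytes:
    """Send `message` from a forked child to the parent over a Unix pipe."""
    read_fd, write_fd = os.pipe()          # pipe(2)
    pid = os.fork()                        # fork(2)
    if pid == 0:
        # Child: write the whole message, then exit without running
        # any of the parent's cleanup code.
        try:
            os.close(read_fd)
            view = memoryview(message)
            while view:                    # write(2) may be partial
                written = os.write(write_fd, view)
                view = view[written:]
        finally:
            os._exit(0)
    # Parent: read until the child closes its end (EOF).
    os.close(write_fd)
    chunks = []
    while chunk := os.read(read_fd, 4096):  # read(2)
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                      # reap the child, waitpid(2)
    return b"".join(chunks)

if __name__ == "__main__":
    print(pipe_roundtrip(b"hello via pipe(2)"))  # b'hello via pipe(2)'
```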
Openness: Web Browser
Openness is achieved through a set of helpers or content handlers (plug-ins). Different data formats are decoded using different tools, e.g. .html/.gif/.jpeg/.pdf
• Built-in content handler: extensible?
• Built-in protocol handler: extensible? (A protocol is a set of communication rules.)
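The extensibility question for content handlers can be made concrete with a registry that maps media types to handler functions. In this hypothetical Python sketch (`register_handler` and `handle` are illustrative names, not any browser’s real plug-in API), supporting a new format means registering one more handler, with no change to the dispatch code.

```python
from typing import Callable, Dict

# Registry from media type to handler; each handler "renders" raw bytes.
Handler = Callable[[bytes], str]
_handlers: Dict[str, Handler] = {}

def register_handler(media_type: str, handler: Handler) -> None:
    """Plug in (or replace) the handler for a media type."""
    _handlers[media_type] = handler

def handle(media_type: str, body: bytes) -> str:
    """Dispatch content to its registered handler, if any."""
    handler = _handlers.get(media_type)
    if handler is None:
        return f"no handler for {media_type}; offer to download"
    return handler(body)

# Built-in handlers shipped with the hypothetical browser.
register_handler("text/html", lambda body: "render HTML: " + body.decode("utf-8"))
register_handler("image/gif", lambda body: f"display GIF ({len(body)} bytes)")

if __name__ == "__main__":
    print(handle("application/pdf", b"%PDF-1.7"))  # no handler yet
    # A third party extends the browser by registering a new handler.
    register_handler("application/pdf", lambda body: f"open PDF viewer ({len(body)} bytes)")
    print(handle("application/pdf", b"%PDF-1.7"))  # open PDF viewer (8 bytes)
```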
Transparency
Concealment from the user and the application programmer of the separation of components, so that the system is perceived as a single coherent system
Eight forms of transparency (ANSA’89, ISO’92)
• Access transparency: enables local and remote resources to be accessed using identical operations
• Location transparency: enables resources to be accessed without knowledge of their location
• Concurrency transparency: enables several processes to operate concurrently using shared resources without interference between them
• Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers
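Access and location transparency can be illustrated with a remote procedure call, where a client-side proxy makes a remote operation look like a local method call. The sketch below uses Python’s standard `xmlrpc` modules as a stand-in for middleware such as Java RMI or CORBA; the `add` function and the loopback server are illustrative assumptions. Apart from constructing the proxy, the caller writes the same call for the local and the remote operation.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a: int, b: int) -> int:
    """An ordinary function that is also exported remotely."""
    return a + b

def start_server() -> SimpleXMLRPCServer:
    """Export add() on an OS-chosen localhost port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    remote = ServerProxy(f"http://{host}:{port}/")
    # Identical call syntax for the local and the remote operation.
    print(add(2, 3), remote.add(2, 3))  # 5 5
    server.shutdown()
    server.server_close()
```

Note that the location still leaks through the URL given to `ServerProxy`; a naming service would normally hide that as well.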
Transparency (Cont’)
Eight Forms of Transparency (cont’)
• Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components (e.g. email delivery)
  – Middleware generally converts the failures of networks and processes into programming-level exceptions
• Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs
• Performance transparency: allows the system to be reconfigured to improve performance as loads vary
• Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms
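The remark that middleware converts network and process failures into programming-level exceptions can be sketched as a wrapper that maps low-level errors (refused connections, resets, timeouts) onto one application-level exception. `RemoteCallError`, `invoke` and `unreachable_server` are hypothetical names; real middleware defines its own types (Java RMI, for instance, uses `java.rmi.RemoteException`).

```python
from typing import Any, Callable

class RemoteCallError(Exception):
    """Single, programming-level exception for any communication failure."""

def invoke(operation: Callable[..., Any], *args: Any) -> Any:
    """Run a remote operation, translating transport failures.

    OSError covers ConnectionRefusedError, ConnectionResetError and
    socket timeouts, which are all OSError subclasses in Python 3.
    """
    try:
        return operation(*args)
    except OSError as exc:
        raise RemoteCallError(f"remote call failed: {exc!r}") from exc

def unreachable_server(x: int) -> int:
    """Stand-in for a call to a server that has crashed."""
    raise ConnectionRefusedError("server is down")

if __name__ == "__main__":
    try:
        invoke(unreachable_server, 1)
    except RemoteCallError as err:
        # The application handles one exception type, whatever failed below.
        print("handled at application level:", err)
```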
[Figure: the forms of transparency (access, location, mobility, failure, replication, concurrency, performance, scaling); access and location transparency together are called network transparency.]
Different forms of transparency exist in a distributed system; full transparency is too costly, and in some situations impossible.
Scalability: High Perf./Availability
Distributed systems should operate effectively and efficiently at different scales of resources and users, in terms of size, geographical location, and administration.
Objectives:
• Control the cost of physical resources. E.g. if a single file server can support 20 users, can two servers support 40 users?
• Control the performance loss as the system grows: can performance stay (nearly) independent of resource size?
• Prevent software resources from running out, e.g. 32-bit IPv4 addresses vs. 128-bit IPv6 addresses
• The cost of scalability can’t be ignored: a scalable machine carries overhead (power, fans, ...)
• Over-compensating for future growth may be worse than adapting to a change when we are forced to
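The address-space example is simple arithmetic: a 32-bit address allows 2^32 (about 4.3 billion) distinct addresses, while a 128-bit address allows 2^128 (about 3.4 × 10^38). Python’s standard `ipaddress` module can confirm these counts:

```python
import ipaddress

# Size of the entire address space for each protocol version.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses

assert ipv4_space == 2 ** 32
assert ipv6_space == 2 ** 128

print(f"IPv4: {ipv4_space:,} addresses")    # IPv4: 4,294,967,296 addresses
print(f"IPv6: {ipv6_space:.3e} addresses")  # IPv6: 3.403e+38 addresses
```

Set against hundreds of millions of hosts and billions of phones, the 32-bit space is clearly the software resource that runs out first.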
Scalability (Cont’)
Objectives (cont’)
• Avoid performance bottlenecks: centralized vs. decentralized organization
Examples of scalability limitations:
Concept                  Example
Centralized services     A single server for all users
Centralized data         A single online telephone book
Centralized algorithms   Doing routing based on complete information
Scaling Techniques
• Hide communication latency: asynchronous communication
• Distribution: naming
• Replication: caching, which raises consistency issues
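Hiding communication latency with asynchronous communication means issuing requests without blocking, so several replies can be in flight at once. This minimal `asyncio` sketch simulates three remote requests with a hypothetical fixed delay of 0.2 s each (`fetch` and `LATENCY` are assumptions standing in for real network calls): issued one after another they take about three delays, issued concurrently about one.

```python
import asyncio
import time

LATENCY = 0.2  # hypothetical simulated round-trip delay per request, in seconds

async def fetch(item: str) -> str:
    """Simulated remote request; asyncio.sleep stands in for network delay."""
    await asyncio.sleep(LATENCY)
    return f"reply for {item}"

async def sequential(items: list[str]) -> list[str]:
    # Synchronous style: wait for each reply before sending the next request.
    return [await fetch(i) for i in items]

async def concurrent(items: list[str]) -> list[str]:
    # Asynchronous style: all requests are outstanding at the same time.
    return list(await asyncio.gather(*(fetch(i) for i in items)))

def timed(coro_fn, items: list[str]) -> tuple[list[str], float]:
    """Run one strategy and measure its wall-clock time."""
    start = time.perf_counter()
    replies = asyncio.run(coro_fn(items))
    return replies, time.perf_counter() - start

if __name__ == "__main__":
    items = ["a", "b", "c"]
    for fn in (sequential, concurrent):
        replies, elapsed = timed(fn, items)
        print(f"{fn.__name__}: {elapsed:.2f} s, {replies}")
```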
Scaling Techniques for Interactive Applications
[Figure 1.4: The difference between letting (a) a server or (b) a client check forms as they are being filled.]
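Letting the client check forms means simple validation runs locally as the user types, so only well-formed submissions travel to the server, saving round trips and server CPU. The Python sketch below illustrates such a client-side check; the field names and rules are hypothetical, and in a browser this logic would typically be written in JavaScript.

```python
import re

# Hypothetical validation rules for a form's fields.
RULES = {
    "name": lambda v: bool(v.strip()),
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: v.isdigit() and 0 < int(v) < 150,
}

def invalid_fields(form: dict[str, str]) -> list[str]:
    """Return the fields that fail validation; empty means OK to submit."""
    return [field for field, ok in RULES.items() if not ok(form.get(field, ""))]

if __name__ == "__main__":
    form = {"name": "Ada", "email": "ada@example", "age": "36"}
    problems = invalid_fields(form)
    if problems:
        print("fix before sending:", problems)  # no round trip to the server
    else:
        print("submit to server")
```

The server must still re-validate whatever it receives, since a client cannot be trusted; client-side checks only remove the common, honest mistakes from the server’s load.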
Scalable Naming
[Figure 1.5: An example of dividing the DNS name space into zones.]
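Dividing the name space into zones lets each name server be authoritative for only part of the tree and delegate the rest. This toy Python resolver (hypothetical zone data, no real DNS protocol) follows referrals from the root zone down to the zone that holds the answer, in the style of an iterative DNS lookup.

```python
# Hypothetical zone data (not real DNS records): each zone either holds the
# answer for a name or refers the resolver to an authoritative child zone.
ZONES = {
    ".": {"referrals": ["edu."], "records": {}},
    "edu.": {"referrals": ["example.edu."], "records": {}},
    "example.edu.": {"referrals": [], "records": {"www.example.edu.": "192.0.2.10"}},
}

def in_zone(name: str, zone: str) -> bool:
    """True if `name` lies inside `zone`, matching whole labels only."""
    return name == zone or name.endswith("." + zone)

def resolve(name: str) -> tuple[str, list[str]]:
    """Follow referrals from the root; return (address, zones consulted)."""
    zone, visited = ".", ["."]
    while True:
        data = ZONES[zone]
        if name in data["records"]:
            return data["records"][name], visited
        # Delegate to the child zone responsible for this part of the tree.
        child = next((z for z in data["referrals"] if in_zone(name, z)), None)
        if child is None:
            raise KeyError(f"{name} not found in zone {zone}")
        zone = child
        visited.append(zone)

if __name__ == "__main__":
    print(resolve("www.example.edu."))
    # ('192.0.2.10', ['.', 'edu.', 'example.edu.'])
```

No single server stores the whole name space, which is what lets the scheme grow by adding zones.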
Concurrency: High Perf./Availability
More than one client may want to access a shared resource at the same time; the requests need to be handled in parallel.
Server-side concurrency
• Server-side operations: database/data mining, CGI
• Servers on single-CPU machines (interleaving): multiprogramming
• Servers on symmetric multiprocessor machines: multiprogramming and multithreading
• Servers on networks of workstations: scalable server technology
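Multithreaded request handling on the server side can be sketched with Python’s standard `socketserver` module: `ThreadingTCPServer` serves each client connection in its own thread, so one slow client does not block the others. The upper-casing line protocol and the loopback address are illustrative assumptions; note that in CPython these threads give I/O concurrency, not parallel execution of Python bytecode, because of the global interpreter lock.

```python
import socket
import socketserver
import threading
from concurrent.futures import ThreadPoolExecutor

class UpperEchoHandler(socketserver.StreamRequestHandler):
    """Reads one line per connection and replies with it upper-cased."""
    def handle(self) -> None:
        line = self.rfile.readline()
        self.wfile.write(line.upper())

def start_server() -> socketserver.ThreadingTCPServer:
    # Port 0 asks the OS for a free ephemeral port.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperEchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def request(address, text: str) -> str:
    """One client: send a line, return the server's reply line."""
    with socket.create_connection(address) as sock, sock.makefile("rb") as reader:
        sock.sendall(text.encode() + b"\n")
        return reader.readline().decode().rstrip("\n")

if __name__ == "__main__":
    server = start_server()
    # Several clients at once, each served by its own server thread.
    with ThreadPoolExecutor(max_workers=3) as pool:
        replies = list(pool.map(lambda n: request(server.server_address, f"client {n}"), range(3)))
    print(replies)  # ['CLIENT 0', 'CLIENT 1', 'CLIENT 2']
    server.shutdown()
    server.server_close()
```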
Concurrency (cont.)
Clients share load with the server:
• Data compression/decompression
• Data encryption/decryption
• Input verification, decoration, calculation
• Java applets or JavaScript: the client-side version of JavaScript allows “executable content” to be included in web pages
Do it in parallel!
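Sharing load with the client can be as simple as compressing data on the client before upload, trading a little client CPU for less network traffic. A minimal sketch with the standard `zlib` module; the function names and payload are hypothetical, and the achievable ratio depends entirely on the data.

```python
import zlib

def client_prepare(payload: bytes) -> bytes:
    """Client side: spend local CPU to shrink the upload."""
    return zlib.compress(payload, level=6)

def server_receive(blob: bytes) -> bytes:
    """Server side: decompression restores the original bytes exactly."""
    return zlib.decompress(blob)

if __name__ == "__main__":
    payload = b"sensor=42;status=ok;" * 200  # hypothetical, highly repetitive data
    blob = client_prepare(payload)
    assert server_receive(blob) == payload
    print(f"{len(payload)} bytes -> {len(blob)} bytes on the wire")
```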
Failure Handling for High Availability
HW/SW failure is common; the challenge is how to deal with it. Failures in a distributed system are often partial, which makes failure handling even harder.
Service availability: the server’s ability to provide uninterrupted service over time, measured as the percentage of uptime. 99.9% availability equals about 8 hours 45 minutes of downtime per year.
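The downtime figure follows from defining availability as the fraction of uptime: a year has 365 × 24 = 8,760 hours, so 99.9% availability allows 0.001 × 8,760 = 8.76 hours, i.e. about 8 hours 45.6 minutes, of downtime. A small helper (ignoring leap years, an assumption) reproduces this for other availability levels:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

def downtime_hours(availability: float) -> float:
    """Allowed downtime per year for a given availability, e.g. 0.999."""
    return (1 - availability) * HOURS_PER_YEAR

def as_hours_minutes(hours: float) -> str:
    """Format fractional hours as 'H h M.m min'."""
    whole = int(hours)
    return f"{whole} h {(hours - whole) * 60:.1f} min"

if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999):
        print(f"{a:.2%} available: {as_hours_minutes(downtime_hours(a))} down per year")
```

Each extra “nine” cuts the allowed downtime by a factor of ten.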
Failure Handling
How to handle failures:
• Failure detection
  – Checksums are used to detect corrupted data in a message
  – How can a crashed remote server be detected?
• Failure masking, e.g. retransmit messages that are lost
• Recovery from failure: software is designed so that the state of permanent data can be recovered or “rolled back” after a server has crashed
• Tolerate failure by using redundant components
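Failure detection and failure masking can be combined in a few lines: the sender attaches a checksum, the receiver discards any message whose checksum does not match, and the sender retransmits until an intact copy gets through. In this sketch the lossy channel is simulated with a seeded random generator (a hypothetical stand-in for a real network), and CRC-32 from `zlib` serves as the checksum; a CRC detects accidental corruption, not deliberate tampering.

```python
import random
import zlib
from typing import Optional, Tuple

def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so corruption can be detected."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(message: bytes) -> Optional[bytes]:
    """Return the payload if the checksum matches, else None (corrupted)."""
    payload, crc = message[:-4], message[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None

def unreliable_channel(message: bytes, rng: random.Random) -> Optional[bytes]:
    """Simulated network: loses 30% of messages and corrupts another 30%."""
    roll = rng.random()
    if roll < 0.3:
        return None                       # message lost
    if roll < 0.6:
        corrupted = bytearray(message)
        corrupted[0] ^= 0xFF              # flip the bits of one byte
        return bytes(corrupted)
    return message                        # delivered intact

def send_reliably(payload: bytes, rng: random.Random, max_tries: int = 20) -> Tuple[bytes, int]:
    """Mask loss and corruption by retransmitting until a copy arrives intact.

    A real protocol learns of success from an acknowledgement; this
    simulation simply checks what the receiver got.
    """
    for attempt in range(1, max_tries + 1):
        received = unreliable_channel(frame(payload), rng)
        if received is not None and (data := check(received)) is not None:
            return data, attempt
    raise TimeoutError("peer unreachable: giving up after retries")

if __name__ == "__main__":
    data, tries = send_reliably(b"balance update", random.Random(42))
    print(f"delivered {data!r} after {tries} attempt(s)")
```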
Security
Security is a primary concern in an open distributed system. A secure system addresses three aspects:
• Confidentiality (privacy): protection against disclosure to unauthorized individuals
• Integrity: protection against alteration or corruption
• Availability: protection against interference with the means to access the resources
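Of the three aspects, integrity is the easiest to show in a few lines: a message authentication code lets the receiver detect any alteration made by someone who lacks the shared key. A minimal sketch with the standard `hmac` and `hashlib` modules; the key is a hypothetical placeholder, and key distribution (as well as confidentiality, which would require encryption) is out of scope here.

```python
import hashlib
import hmac

SHARED_KEY = b"hypothetical-shared-secret"  # placeholder; distribute securely in practice

def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(message, key), tag)

if __name__ == "__main__":
    msg = b"pay 10 to bob"
    tag = sign(msg)
    print(verify(msg, tag))                 # True
    print(verify(b"pay 1000 to bob", tag))  # False: alteration detected
```

Unlike the CRC used for failure detection, an HMAC cannot be recomputed by an attacker who modifies the message, because doing so requires the key.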
Challenge Issues: In Summary
• Heterogeneity
• Distribution transparency
• Fault tolerance
• Scalability
• Concurrency
• Openness
• Security