Challenge Issues in Distributed Systems ECE7650 Challenge Issues 1-1

Glory of the Internet

- 1960s: queuing theory and packet-switching principles (ARPANET)
- 1970s: proprietary networks (Ethernet) and internetworking (TCP/IP)
- 1980s: network protocols (SMTP, FTP, etc.), new networks such as NSFNET and BITNET
- 1990s: killer applications (the Web and e-commerce), commercialization
- 2000s: applications blooming (P2P, VoIP, social networks, cloud and storage services, etc.)

Reasons for Internet Success

- Cerf and Kahn’s internetworking principles (1974):
  - minimalism and autonomy: no internal changes required to interconnect networks
  - best-effort service model
  - stateless routers
  - decentralized control
- The design philosophy is to keep it simple
- A key architectural feature is the “narrow-waisted hourglass model”, with a small, well-defined interface (IP) at the middle layer

Assumptions

- Stationary hosts in a wired network
  - Each host is assigned a topologically dependent IP address
  - Routing is based on the IP address
  - But mobile and wireless communication has become pervasive
- Friendly environment
  - Hosts trust each other; little concern for security and privacy
  - TCP/IP is insecure by design
  - The “identity assumption” is no longer valid
  - Accountability problem
- Small scale and uniform edge
  - Grew out of the early small-scale ARPANET experience
  - No one could imagine today’s hundreds of millions of hosts, the billions of cellular phones ready to be plugged in, sensor networks, etc.

Assumptions (cont’)

- Simple applications
  - Alternative reliable communication infrastructure based on Cerf and Kahn’s principles and the “narrow-waisted hourglass model”
  - Clearly defined applications are supported by a well-defined functional interface
- Goodwill and cooperation
  - Best effort, store-and-forward, autonomous, distributed decisions in intra-domain as well as inter-domain routing (BGP)
  - Reality is a battlefield of multiple players; competition and economic incentives must be taken into account

Ad Hoc Work-Around

- To accommodate mobile hosts
  - Mobile IP, but the IP address still identifies the home network, so traffic from a correspondent host follows triangle routing through the home agent, which is inefficient
  - TCP treats the high delay and loss rate of wireless networks as congestion, hiding their real cause
- Hostile environment
  - Firewalls, but they violate the end-to-end argument; application designers must take the possibility of a firewall into account
  - IPsec? How can you prevent attacks/harassment by unsolicited traffic?
- Large scale, diversified edge
  - NAT relieves the shortage of addresses; routers should process only up to layer 3, but a NAT router needs to process layer 4 (port numbers)
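
A NAT router has to keep per-connection state keyed on transport-layer port numbers, which is why it cannot stay at layer 3. The Python sketch below of such a translation table assumes a single public address (203.0.113.1, from a documentation range) and a hypothetical sequential port allocator starting at 40000; it ignores entry timeouts, protocol types, and the header and checksum rewriting that real NAT implementations perform.

```python
class NatTable:
    """Toy NAT: maps (private IP, private port) pairs to ports on one public IP.

    The public address and the port range are hypothetical, for illustration only.
    """

    def __init__(self, public_ip="203.0.113.1", first_port=40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet; this needs the layer-4 source port."""
        key = (src_ip, src_port)
        if key not in self._outbound:
            port = self._next_port
            self._next_port += 1
            self._outbound[key] = port
            self._inbound[port] = key
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, dst_port):
        """Map a reply back to the private host; unmapped ports yield None (dropped)."""
        return self._inbound.get(dst_port)


nat = NatTable()
print(nat.translate_outbound("192.168.0.10", 5000))
print(nat.translate_outbound("192.168.0.11", 5000))  # same private port, different host
print(nat.translate_inbound(40001))
```

Two private hosts can use the same source port because the table distinguishes them by the public port it hands out. Packets arriving at an unmapped public port get `None`, i.e. they are dropped, which is also why hosts behind a NAT are hard to reach with unsolicited traffic.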

Ad Hoc Work-Around (cont’)

- Meet various application requirements
  - QoS-aware routers: IntServ, DiffServ, RSVP, etc.
  - Hard to deploy widely (all routers along the path must support them)
- Non-cooperative, competitive
  - Service Level Agreement (SLA) enforcement?
  - Big BGP problems in inter-domain routing; a single mistyped command at a router at one ISP caused disruption of connectivity across many neighbors
  - Economic incentive? Hard to reach consensus between competitors; sometimes standardization may lose the market advantage

Application-Level Mitigation

- TCP/UDP/IP: “best-effort service”, with no guarantees on delay or loss
- Today’s Internet multimedia applications use application-level techniques to mitigate (as best possible) the effects of delay and loss
- But multimedia apps require QoS and a certain level of performance to be effective!

Any problem in computer science can be solved with another layer of indirection [except the problem of too many layers of indirection] (David Wheeler, PhD’51)
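
One common application-level technique is a fixed playout delay at the receiver: each packet is played back at its send time plus a constant delay, which absorbs network jitter, and packets that miss their playout time are treated as lost. The sketch assumes synchronized sender and receiver clocks and times in seconds; the packet trace is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    seq: int           # sequence number assigned by the sender
    sent_at: float     # sender timestamp, seconds
    arrived_at: float  # receiver arrival time, seconds


def schedule_playout(packets: List[Packet], playout_delay: float) -> Tuple[List[int], List[int]]:
    """Split packets into (played, dropped) sequence numbers under a fixed playout delay.

    Packet p is scheduled for playout at p.sent_at + playout_delay; if it arrives
    after that moment it is too late to be useful and is discarded.
    Assumes sender and receiver clocks are synchronized (a simplification).
    """
    played, dropped = [], []
    for p in sorted(packets, key=lambda p: p.seq):
        if p.arrived_at <= p.sent_at + playout_delay:
            played.append(p.seq)
        else:
            dropped.append(p.seq)
    return played, dropped


# Hypothetical trace: packet 1 suffers a large network delay (jitter).
trace = [Packet(0, 0.00, 0.05), Packet(1, 0.02, 0.30), Packet(2, 0.04, 0.10)]
print(schedule_playout(trace, playout_delay=0.1))
print(schedule_playout(trace, playout_delay=0.3))
```

With a 0.1 s delay packet 1 misses its deadline and is dropped; with 0.3 s all three packets play, at the cost of higher end-to-end latency. Choosing that delay is the trade-off such applications tune, since the network itself gives no guarantee.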

Distributed systems layer

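Figure: distributed apps run on a middleware service layer, above the protocol stack (application, transport, network, link, physical) provided by the OS/net module and the NIC/driver. Middleware services include communication (sync vs. async, group, reliable, and transactional communication, latency tolerance, etc.) and coordination.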

What Is a Distributed System

- A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages [CDK]
  - Autonomous: independent failures
  - Concurrent program execution is the norm
  - No global clock: coordination by exchanging messages
- Examples
  - Basic Internet services such as the Web, email, FTP
  - Streaming apps (audio, video)
  - P2P file sharing (BitTorrent)
  - Cloud computing and storage services

Middleware

- Computer software that connects software components or people and their applications
- The software consists of a set of services that allows multiple processes running on one or more machines to interact
- The set of services together defines a uniform computing model for use by the programmers of servers and distributed apps

Challenge Issues

- Heterogeneity: heterogeneous components must be able to interoperate
- Distribution transparency: distribution should be hidden from the user as much as possible
- Fault tolerance: failure of a component (partial failure) should not result in failure of the whole system
- Scalability: the system should work efficiently with an increasing number of users, and system performance should increase with the inclusion of additional resources

Challenge Issues (cont’)

- Concurrency: shared access to resources must be possible
- Openness: interfaces should be publicly available to ease adding new components
- Security: the system should only be used in the way intended

Heterogeneity

- Variety of computers in a DS: networks, computer HW, OS, programming languages, various implementations, etc.
  - E.g. network protocols, data types
- Middleware is a software layer that provides a programming abstraction while masking heterogeneity, e.g. CORBA and Java RMI
- The virtual machine approach makes code executable on any hardware, e.g. the JVM
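
Masking heterogeneous data types usually comes down to an agreed external data representation. As a minimal sketch (not CORBA's CDR or Java serialization), Python's standard `struct` module can pack a record in network byte order with fixed field sizes, so hosts with different native endianness or word sizes decode the same bytes identically. The record layout and the `marshal`/`unmarshal` names are hypothetical.

```python
import struct

# Hypothetical wire format: unsigned 32-bit id, signed 16-bit temperature, 64-bit float reading.
# '!' selects network (big-endian) byte order with standard sizes and no padding,
# independent of the byte order and alignment of the host running the code.
RECORD_FORMAT = "!Ihd"


def marshal(record_id: int, temperature: int, reading: float) -> bytes:
    """Encode a record into the agreed, platform-independent byte layout."""
    return struct.pack(RECORD_FORMAT, record_id, temperature, reading)


def unmarshal(data: bytes) -> tuple:
    """Decode bytes produced by marshal() on any host."""
    return struct.unpack(RECORD_FORMAT, data)


wire = marshal(1, -5, 2.5)
print(len(wire), wire[:4])  # the id is sent most significant byte first
print(unmarshal(wire))
```

Both sides only need to agree on `RECORD_FORMAT`; neither needs to know the other's CPU architecture, which is the heterogeneity-masking role the slide assigns to middleware.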

Openness

- The characteristic that determines whether the system can be extended or re-implemented in various ways without disruption to or duplication of existing services
  - HW extension: peripherals, memory, network interfaces
  - SW extension: OS features, communication protocols, resource-sharing services
    - e.g. Unix utilities, browser protocol handlers
- Key interfaces are published or standardized (ISO, IEEE, etc.); industry de facto standards bypass cumbersome official standardization procedures
- Any component implementation must conform to the published standard

Openness: Unix

- Openness is achieved by specifying and documenting the key software interfaces
- Unix features are fully accessible through system calls, which let you:
  - add drivers
  - develop applications
  - include new features, e.g. IPC
- Linux: the kernel is open too!

Openness: Web Browser

- Openness is achieved through a set of helpers or content handlers (plug-ins)
- Different data formats are decoded using different tools, e.g. .html/.gif/.jpeg/.pdf
- Built-in content handler: extensible? Built-in protocol handler: extensible?
  - A protocol is a set of communication rules
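
The plug-in idea can be sketched as a registry that dispatches on MIME type: the browser core stays unchanged while new handlers are registered at run time. The class and method names below are hypothetical; real browsers expose this through their own extension mechanisms.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], str]


class ContentDispatcher:
    """Hypothetical browser core that renders content via pluggable handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, mime_type: str, handler: Handler) -> None:
        # Openness: a new format is supported by adding a handler,
        # not by modifying the dispatcher itself.
        self._handlers[mime_type] = handler

    def render(self, mime_type: str, body: bytes) -> str:
        handler = self._handlers.get(mime_type)
        if handler is None:
            return f"[no handler for {mime_type}]"
        return handler(body)


browser = ContentDispatcher()
# A "built-in" handler for HTML.
browser.register("text/html", lambda body: "HTML: " + body.decode("utf-8"))

print(browser.render("application/pdf", b"%PDF-1.7"))  # not supported yet
browser.register("application/pdf", lambda body: f"PDF viewer: {len(body)} bytes")
print(browser.render("application/pdf", b"%PDF-1.7"))  # supported after plug-in
```

The same pattern answers the slide's "extensible?" questions: content (or protocol) handling is extensible exactly when the dispatch table is reachable through a published interface such as `register`.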

Transparency

- Concealment from the user and the application programmer of the separation of components, so that the system is perceived as a coherent whole
- Eight forms of transparency (ANSA’89, ISO’92):
  - Access transparency: enables local and remote resources to be accessed using identical operations
  - Location transparency: enables resources to be accessed without knowledge of their location
  - Concurrency transparency: enables several processes to operate concurrently using shared resources without interference between them
  - Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers
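
Access and location transparency can be sketched together: a name service maps a logical name to a location, and the client offers a single `read()` regardless of where the resource lives. Everything here is hypothetical (the `NameService` and `ResourceClient` classes, the `.example` host names), and the "remote" fetch is simulated with an in-memory dictionary instead of a network call.

```python
from typing import Callable, Dict, Tuple


class NameService:
    """Hypothetical directory that maps logical names to (host, path) locations."""

    def __init__(self) -> None:
        self._table: Dict[str, Tuple[str, str]] = {}

    def bind(self, name: str, host: str, path: str) -> None:
        self._table[name] = (host, path)

    def resolve(self, name: str) -> Tuple[str, str]:
        return self._table[name]


class ResourceClient:
    """Offers one read() operation for local and remote resources alike."""

    def __init__(self, names: NameService, local_host: str,
                 local_store: Dict[str, bytes],
                 remote_fetch: Callable[[str, str], bytes]) -> None:
        self._names = names
        self._local_host = local_host
        self._local_store = local_store
        self._remote_fetch = remote_fetch

    def read(self, name: str) -> bytes:
        host, path = self._names.resolve(name)  # location transparency: caller passes a name only
        if host == self._local_host:            # access transparency: same call either way
            return self._local_store[path]
        return self._remote_fetch(host, path)


remote_files = {("fs2.example", "/logo.png"): b"PNG-bytes"}  # simulated remote server
names = NameService()
names.bind("report", "me", "/tmp/report.txt")
names.bind("logo", "fs2.example", "/logo.png")
client = ResourceClient(names, local_host="me",
                        local_store={"/tmp/report.txt": b"Q3 numbers"},
                        remote_fetch=lambda host, path: remote_files[(host, path)])

print(client.read("report"), client.read("logo"))
```

Because callers only hold names, the resource can also be moved and rebound to another host without changing client code, which is the mobility transparency listed among the remaining forms.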

Transparency (Cont’)

- Eight forms of transparency (cont’):
  - Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components (e.g. email delivery)
    - Middleware generally converts failures of networks and processes into programming-level exceptions
  - Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs
  - Performance transparency: allows the system to be reconfigured to improve performance as loads vary
  - Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms

Transparency

- Forms of transparency: access, location, mobility, failure, replication, concurrency, performance, and scaling transparency
- Access and location transparency are sometimes referred to together as network transparency
- Different forms of transparency in a distributed system; full transparency is too costly, and impossible in some situations

Scalability: High Perf./Availability

- A distributed system should operate effectively and efficiently at different scales of resources and users
  - Scale in size, geographical location, and administration
- Objectives:
  - Control the cost of physical resources, e.g. if a single file server can support 20 users, can two servers support 40 users?
  - Control the performance loss as resources and users grow
  - Prevent software resources from running out
    - E.g. 32-bit IPv4 addresses vs. 128-bit IPv6 addresses
  - The cost of scalability can’t be ignored: the overhead of a scalable machine (power, fans, ...)
  - Over-compensating for future growth may be worse than adapting to a change when we are forced to
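
The IPv4 example is easy to quantify: a 32-bit address field allows at most 2^32 addresses and a 128-bit field 2^128. A short check with Python's standard `ipaddress` module; note that these totals include reserved and special-purpose ranges, so the usable address space is smaller.

```python
import ipaddress

# Whole address spaces: every IPv4 and every IPv6 address.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: 2**32  = {ipv4_total:,} addresses")
print(f"IPv6: 2**128 = {ipv6_total:.3e} addresses")
# The ratio is an exact power of two; bit_length() - 1 recovers its exponent.
print(f"IPv6 space is 2**{(ipv6_total // ipv4_total).bit_length() - 1} times larger")
```

About 4.3 billion IPv4 addresses is fewer than one per person, which is why NAT became necessary before IPv6 was widely deployed.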

Scalability (Cont’)

- Objectives (cont’):
  - Avoid performance bottlenecks
    - Centralized vs. decentralized organization

  Concept                   Example
  Centralized services      A single server for all users
  Centralized data          A single on-line telephone book
  Centralized algorithms    Doing routing based on complete information
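
One way to avoid the "single on-line telephone book" bottleneck is to partition the data so that each of several servers holds only part of it. The sketch uses simple hash partitioning with hypothetical server names; it is one of several decentralization strategies, not one the slides prescribe.

```python
import hashlib
from collections import Counter

SERVERS = ["dir-0", "dir-1", "dir-2", "dir-3"]  # hypothetical directory servers


def server_for(name: str, servers=SERVERS) -> str:
    """Pick the server responsible for a name by hashing it.

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


names = [f"user{i}" for i in range(1000)]
load = Counter(server_for(n) for n in names)
print(load)  # each server holds roughly a quarter of the entries
```

Any client can compute where an entry lives without asking a central server. A known drawback of plain modulo hashing is that changing the number of servers remaps most names; consistent hashing reduces that movement.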

Scaling Techniques

- Hide communication latency: asynchronous communication
- Distribution: naming
- Replication: caching, consistency
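
Asynchronous communication hides latency by letting a client issue several requests without waiting for each reply in turn. A minimal sketch with Python's `asyncio`, where `asyncio.sleep` stands in for network latency (the `fetch` function and the 0.05 s delay are hypothetical):

```python
import asyncio
import time


async def fetch(name: str, latency: float = 0.05) -> str:
    """Simulated remote request; the sleep stands in for network round-trip time."""
    await asyncio.sleep(latency)
    return f"reply from {name}"


async def sequential() -> list:
    # Synchronous style: wait for each reply before sending the next request.
    return [await fetch(n) for n in ("a", "b", "c")]


async def overlapped() -> list:
    # Asynchronous style: send all requests, then wait; total time is about one round trip.
    return list(await asyncio.gather(*(fetch(n) for n in ("a", "b", "c"))))


def timed(strategy):
    """Run a strategy and return (elapsed seconds, replies)."""
    start = time.perf_counter()
    replies = asyncio.run(strategy())
    return time.perf_counter() - start, replies


for strategy in (sequential, overlapped):
    elapsed, replies = timed(strategy)
    print(f"{strategy.__name__}: {elapsed:.2f}s {replies}")
```

Both strategies return the same replies; only the waiting differs. The sequential version pays three round trips, while the overlapped one pays roughly one.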

Scaling Tech. for Interactive App

Figure 1.4: The difference between letting (a) a server or (b) a client check forms as they are being filled.
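
In case (b) the per-field round trip to the server disappears because validation runs next to the user. Below is a minimal sketch of such a check with hypothetical field names and deliberately simple rules (the email pattern is not RFC-complete). The server must still re-validate, since client-side code can be bypassed.

```python
import re


def validate_form(form: dict) -> dict:
    """Hypothetical client-side check run before anything is sent to the server.

    Returns a mapping of field -> error message; an empty dict means submit.
    """
    errors = {}
    if not form.get("name", "").strip():
        errors["name"] = "required"
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form.get("email", "")):
        errors["email"] = "not an email address"
    if not re.fullmatch(r"\d{5}", form.get("zip", "")):
        errors["zip"] = "expected 5 digits"
    return errors


print(validate_form({"name": "Ann", "email": "ann@example.com", "zip": "48202"}))
print(validate_form({"name": "", "email": "ann@", "zip": "48"}))
```

Only forms that pass locally generate network traffic, so the server sees fewer requests and the user sees errors without waiting for a reply.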

Scalable Naming

Figure 1.5: An example of dividing the DNS name space into zones.
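
Dividing the name space into zones means no single server has to know every name: a resolver starts at the root and follows delegations down the tree. The sketch simulates iterative resolution over a hypothetical, in-memory set of zone servers (the server names and the 192.0.2.10 documentation address are made up); it ignores caching, record types, and the real DNS wire protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical zone servers: server -> (delegations: child zone -> server, address records)
ZONE_SERVERS: Dict[str, Tuple[Dict[str, str], Dict[str, str]]] = {
    "root":     ({"nl": "ns.nl", "com": "ns.com"}, {}),
    "ns.nl":    ({"vu.nl": "ns.vu.nl"}, {}),
    "ns.com":   ({}, {}),
    "ns.vu.nl": ({}, {"www.cs.vu.nl": "192.0.2.10"}),
}


def in_zone(name: str, zone: str) -> bool:
    """True if name lies inside the given zone (label-wise suffix match)."""
    return name == zone or name.endswith("." + zone)


def resolve(name: str) -> Tuple[str, List[str]]:
    """Iteratively resolve name; return its address and the servers visited."""
    server, visited = "root", ["root"]
    while True:
        delegations, records = ZONE_SERVERS[server]
        if name in records:
            return records[name], visited
        # Follow the delegation for the child zone that contains the name.
        next_server = next((ns for zone, ns in delegations.items() if in_zone(name, zone)), None)
        if next_server is None:
            raise KeyError(f"cannot resolve {name!r}")
        server = next_server
        visited.append(server)


print(resolve("www.cs.vu.nl"))
```

Each server only stores its own zone plus pointers to its children, so adding names under one zone never touches the servers of other zones.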

Concurrency: High Perf./Availability

- More than one client may want to access a shared resource at the same time; the requests need to be handled in parallel
- Server-side concurrency
  - Server-side operations: database/mining, CGI
  - Servers on single-CPU machines (interleaving): multiprogramming
  - Servers on symmetric multi-CPU machines: multiprogramming and multithreading
  - Servers on networks of workstations: scalable server technology
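
A multithreaded server can be sketched with a thread pool: each request runs on a worker thread, and a lock protects the shared resource so concurrent updates do not interfere. The `handle_request` handler and the shared hit counter are hypothetical. In standard CPython builds the global interpreter lock lets only one thread execute Python code at a time, so this illustrates concurrent handling (useful for I/O-bound requests) rather than CPU parallelism; CPU-bound work would typically use processes instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class HitCounter:
    """Shared server-side resource; the lock serializes concurrent updates."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> int:
        with self._lock:
            self.value += 1
            return self.value


hits = HitCounter()


def handle_request(client_id: int) -> str:
    """Hypothetical request handler; each call runs on a pool worker thread."""
    hits.increment()
    return f"hello client {client_id}"


with ThreadPoolExecutor(max_workers=8) as pool:
    # map() dispatches requests to the workers and returns replies in request order.
    replies = list(pool.map(handle_request, range(100)))

print(hits.value, replies[:2])
```

Without the lock, `self.value += 1` is a read-modify-write that two threads could interleave, losing updates; the lock is what provides concurrency transparency for this shared resource.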

Concurrency (cont.)

- Clients share load with the server: data compression/decompression, data encryption/decryption, input verification, decoration, calculation
  - Java applets or JavaScript
  - The client-side version of JavaScript allows “executable content” to be included in web pages
- Do it in parallel!
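
As an example of clients sharing load, each client can compress its upload itself, spreading that CPU cost across many machines and reducing bytes on the network; the server only decompresses. A minimal sketch with Python's standard `zlib` (the function names and the repetitive payload are hypothetical):

```python
import zlib


def client_prepare(payload: bytes) -> bytes:
    """Runs on the client: spend local CPU to shrink the upload."""
    return zlib.compress(payload, level=6)


def server_receive(blob: bytes) -> bytes:
    """Runs on the server: restore the original bytes."""
    return zlib.decompress(blob)


payload = b"sensor=42;status=ok;" * 500  # repetitive data compresses well
blob = client_prepare(payload)
print(len(payload), "->", len(blob), "bytes on the wire")
assert server_receive(blob) == payload
```

How much this helps depends on the data: already-compressed formats such as JPEG gain little, so a client would normally compress only content types that benefit.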

Failure Handling for High Availability

- HW/SW failures are common; the challenge is how to deal with them
- Failures in a distributed system are often partial, which makes failure handling even harder
- Service availability: the server’s ability to provide uninterrupted service over time, measured as the percentage of uptime
  - 99.9% availability equals about 8 hours 45 minutes of downtime per year
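
The downtime figure follows from simple arithmetic: (1 − 0.999) × 365 × 24 × 60 = 525.6 minutes, i.e. 8 h 45.6 min per year. A small helper makes the same calculation for other "nines", assuming a 365-day year (leap years ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum downtime per year allowed by a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    m = downtime_minutes_per_year(pct)
    print(f"{pct}% -> {int(m // 60)} h {m % 60:.1f} min")
```

Each additional nine divides the allowed downtime by ten, which is why high-availability targets quickly require redundancy rather than just more reliable hardware.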

Failure Handling

- How to handle failures:
  - Failure detection
    - A checksum is used to detect corrupted data in a message
    - How to detect a remote crashed server?
  - Failure masking, e.g. retransmit messages that are lost
  - Recovery from failure
    - Software is designed so that the state of permanent data can be recovered or “rolled back” after a server has crashed
  - Tolerate failures by using redundant components
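
Detection and masking can be combined in one sketch: the sender appends a CRC-32 checksum, the receiver rejects corrupted messages, and the sender retransmits until an intact copy gets through or it gives up, at which point the failure surfaces as a programming-level exception, as middleware typically does. The lossy channel, its loss and corruption rates, and all function names are hypothetical; real protocols detect loss with acknowledgements and timeouts, which the sketch collapses into a `None` return.

```python
import random
import zlib


class RemoteCallError(Exception):
    """Raised when masking fails: the failure becomes a programming-level exception."""


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(message: bytes) -> bytes:
    """Verify and strip the checksum; raise ValueError on mismatch."""
    payload, checksum = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch")
    return payload


def unreliable_channel(message: bytes, rng: random.Random, loss=0.3, corrupt=0.2):
    """Simulated network: drops the message (None) or flips one bit in it."""
    r = rng.random()
    if r < loss:
        return None
    if r < loss + corrupt:
        damaged = bytearray(message)
        damaged[0] ^= 0x01
        return bytes(damaged)
    return message


def send_reliably(payload: bytes, rng: random.Random, max_attempts: int = 5) -> bytes:
    """Mask lost or corrupted messages by retransmitting; give up with an exception."""
    for _ in range(max_attempts):
        received = unreliable_channel(frame(payload), rng)
        if received is None:
            continue  # lost: a real sender would notice via an acknowledgement timeout
        try:
            return unframe(received)
        except ValueError:
            continue  # corrupted: detected by the checksum, so retransmit
    raise RemoteCallError(f"no intact delivery after {max_attempts} attempts")


print(send_reliably(b"hello", random.Random(1), max_attempts=50))
```

CRC-32 catches every single-bit error like the one injected here, but it protects only against accidental corruption; deliberate tampering needs a cryptographic check, which belongs to the security aspects below.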

Security

- Security is a primary concern in an open distributed system
- A secure system addresses three aspects:
  - Confidentiality (privacy): protection against disclosure to unauthorized individuals
  - Integrity: protection against alteration or corruption
  - Availability: protection against interference with the means to access the resources
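
For the integrity aspect, a common building block is a message authentication code. The sketch uses Python's standard `hmac` module with SHA-256 and assumes both parties already share a secret key (the key and message are hypothetical; key distribution is out of scope). An HMAC detects alteration but does not hide the message, so confidentiality would additionally need encryption.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest avoids leaking the match position via timing."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"shared-secret-key"  # hypothetical pre-shared key
msg = b"transfer 100 to alice"
tag = sign(key, msg)

print(verify(key, msg, tag))                       # untouched message
print(verify(key, b"transfer 900 to alice", tag))  # altered in transit
```

Unlike the CRC used for failure detection, an attacker who changes the message cannot compute a matching tag without the key.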

Challenge Issues: In Summary

- Heterogeneity
- Distribution transparency
- Fault tolerance
- Scalability
- Concurrency
- Openness
- Security
