413
Roland Wism¨ uller Betriebssysteme / verteilte Systeme Distributed Systems (1/13) i Roland Wism ¨ uller Universit ¨ at Siegen rolanda .d wismuellera @d uni-siegena .d de Tel.: 0271/740-4050, B¨ uro: H-B 8404 Stand: March 29, 2021 Distributed Systems Summer Term 2021

Distributed Systems - Uni Siegen

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) i

Roland Wismuller

Universitat Siegen

[email protected]

Tel.: 0271/740-4050, Buro: H-B 8404

Stand: March 29, 2021

Distributed Systems

Summer Term 2021

Page 2: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 2

Distributed SystemsSummer Term 2021

0 Organisation

Page 3: Distributed Systems - Uni Siegen

About Myself

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 3

➥ Studies in Computer Science, Techn. Univ. Munich

➥ Ph.D. in 1994, state doctorate in 2001

➥ Since 2004 Prof. for Operating Systems and Distributed Systems

➥ Research: Monitoring, Analysis und Control of parallel and

distributed Systems

➥ Mentor for Bachelor Studies in Computer Science with secondary

field Mathematics

➥ E-mail: [email protected]

➥ Tel.: 0271/740-4050

➥ Room: H-B 8404

Page 4: Distributed Systems - Uni Siegen

About the Chair ”‘Operating Systems / Distrib. Sys.”’

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 4

Andreas Hoffmannandreas.hoffmann@uni-...

0271/740-4047

H-B 8405

➥ E-assessment and e-labs

➥ IT security

➥ Web technologies

➥ Mobile applications

Damian Ludwigdamian.ludwig@uni-...

0271/740-2533

H-B 8402

➥ Capability systems

➥ Compilers

➥ Programming languages

Hawzhin Hozhabr Pourhawzhin.hozhabrpour@uni-...

0271/740-4038

H-B 8411

➥ Machine Learning

➥ Pattern recognition in car sensordata

➥ Anomaly detection

Page 5: Distributed Systems - Uni Siegen

Teaching

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 5

Lectures/Labs

➥ Rechnernetze I, 5 LP (every summer term)

➥ Rechnernetze Praktikum, 5 LP (every winter term)

➥ Rechnernetze II, 5 LP (every summer term)

➥ Betriebssysteme I, 5 LP (every winter term)

➥ Parallel Processing, 5 LP (every winter term)

➥ Distributed Systems, 5 LP (every summer term)

Page 6: Distributed Systems - Uni Siegen

Teaching ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 6

Project Groups

➥ e.g., recording and analyzing car sensor data

➥ e.g., outlier detection in car sensor data

Theses (Bachelor, Master)

➥ Topic areas: secure virtual machine, parallel computing, patternrecognition in sensor data, e-assessment, ...

Seminars

➥ Topic areas: IT security, programming languages, patternrecognition in sensor data, ...

➥ Procedure: block seminar

➥ 30 min. talk, 5000 word seminar paper

Page 7: Distributed Systems - Uni Siegen

About the Lecture

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 7

➥ Lecture:

➥ digital: screen casts at moodle

➥ Q&A: Mon., 12:00 - 12:30 (or longer, if needed) via zoom

➥ Exercises:

➥ 2 hours (digital)

➥ Tue., 10:15-11:45, via zoom, starting 20.04.

➥ this zoom meeting will be recorded!

➥ includes programming exercises using Java

➥ Links to zoom meetings: see moodle

Page 8: Distributed Systems - Uni Siegen

About the Lecture ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 8

Information, Slides and Announcements

➥ http://www.bs.informatik.uni-siegen.de/lehre/vs

➥ For printing: use print service of the Student Council!

➥ If necessary, updates/supplements shortly before the lecture

➥ look at the date!

➥ Exercise sheets will be put online as PDF

➥ please print and process them yourself!

Page 9: Distributed Systems - Uni Siegen

Examination

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 9

➥ Oral examination

➥ duration about 30 minutes

➥ Registration:

➥ first register at the campus management system (unisono)

➥ at least 1 week before the exam date

➥ then fix a date with my secretary

➥ at least 1 week before the exam date

➥ Mrs. Syska, regina.syska@uni-...

➥ cancellation is possible up to 7 days before the exam

➥ via unisono➥ please inform me, too!

Page 10: Distributed Systems - Uni Siegen

Contents of the Lecture

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 10

➥ Introduction

➥ Middleware

➥ Distributed programming with Java RMI

➥ Name services

➥ Process management

➥ Time and global state

➥ Coordination

➥ Replication and consistency

➥ Distributed file systems

➥ Fault tolerance

Page 11: Distributed Systems - Uni Siegen

Learning targets

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 11

➥ Understand the properties of distributed systems

➥ absence of a global state

➥ problems with synchronization and with consistency of

replicated data

➥ Understand the approaches to solve the problems

and be able to apply them to given challenges

➥ Distinguish architecture models for distributed systems as well as

different types and tasks of middleware

and be able to assess their usability for given problems

➥ Be able to develop simple distributed programs with Java RMI

Page 12: Distributed Systems - Uni Siegen

Literature

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 12

➥ Andrew S. Tanenbaum, Marten van Steen. Verteilte Systeme,Grundlagen und Paradigmen. Pearson Studium, 2003.(English: Distributed Systems: Principles and Paradigms, 2ndEdition. Pearson Education, 2016. Available online.)

➥ Ulrike Hammerschall. Verteilte Systeme und Anwendungen. Pear-son Studium, 2005.

➥ George Coulouris, Jean Dollimore, Tim Kindberg. Verteilte Sys-teme, Konzepte und Design, 3. Auflage. Pearson Studium, 2002.(English: Distributed Systems: Concepts and Design, 5th Edition.Pearson Education, 2012.)

➥ Andrew S. Tanenbaum. Moderne Betriebssysteme, 2. Auflage.Pearson Studium, 2003.

➥ William Stallings. Betriebssysteme – Prinzipien und Umsetzung,4. Auflage. Pearson Studium, 2003.

Page 13: Distributed Systems - Uni Siegen

Literature ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 13

➥ Jim Farley, William Crawford, David Flanagan. Java Enterprise in

a Nutshell. O’Reilly 2002.

➥ Cay S. Horstmann, Gary Cornell. Core Java 2, Band 2 –

Expertenwissen. Sun Microsystems Press / Addison Wesley,

2008.

➥ Robert Orfali, Dan Harkey. Client/Server-Programming with Java

and Corba. John Wiley & Sons, 1998.

➥ Torsten Langner. Verteilte Anwendungen mit Java. Markt +

Technik, 2002.

Page 14: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 14

Distributed SystemsSummer Term 2021

1 Introduction

Page 15: Distributed Systems - Uni Siegen

1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 15

Contents

➥ What makes a distributed system?

➥ Software architecture

➥ Architecture models

➥ Cluster

Literature

➥ Hammerschall: 1

➥ Tanenbaum, van Steen: 1

➥ Colouris, Dollimore, Kindberg: 1, 2

➥ Stallings: 13.4

Page 16: Distributed Systems - Uni Siegen

1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 16

1.1 What makes a distributed system?

work together to coordinate their actions by exchanging messages.In a distributed system, components located on different computers

G. Coulouris

A distributed system is one on which I can’t do any work because somemachine I’ve never heard of has crashed. L. Lamport

A distributed system is a set of independent computers that appear tothe user as a single, coherent system.

A. Tanenbaum

A distributed system is a collection of processors that neither sharemain memory nor a clock. A. Silberschatz

Page 17: Distributed Systems - Uni Siegen

1.1 What makes a distributed system? ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 17

➥ A distributed system is a system

➥ in which hardware and software components are based on

networked computers, and

➥ communicate and coordinate their actions only via the

exchange of messages.

➥ The boundaries of the distributed system are defined by a com-

mon application

➥ Best known example: Internet

➥ communication via the standardized Internet protocols

➥ IP and TCP / UDP (☞ lecture Computer Networks)

➥ users can use services / applications, regardless of the present

location

Page 18: Distributed Systems - Uni Siegen

1.1 What makes a distributed system? ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 18

What is a distributed application?

➥ Application that uses a distributed system to create a

self-contained functionality

➥ Application logic distributed among several, largely independent

components

➥ Components often executed on different machines

➥ Examples:

➥ simple internet applications (e.g. WWW, FTP, email)

➥ distributed information systems (e.g. flight booking)

➥ SW intensive, data centered, interactive, highly concurrent

➥ distributed embedded systems (e.g. in the car)

➥ distributed mobile applications (e.g. for handhelds)

Page 19: Distributed Systems - Uni Siegen

1.1 What makes a distributed system? ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 19

A typical distributed system

Desktop

Desktop

WWWserver DataPrint

server

LANInternet

LAN

serverMail

Appli−cationserver

serverbase

Page 20: Distributed Systems - Uni Siegen

1.1 What makes a distributed system? ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 20

Why distribution?

➥ Central, non-distributed applications are

➥ generally safer and more reliable

➥ generally more performant

➥ Main reason for distribution: sharing of resources

➥ Hardware resources (printer, scanner, ...)

➥ cost saving

➥ Data and information (file server, database, ...)

➥ information exchange, data consistency

➥ Functionality (centralization)

➥ error avoidance, reuse

Page 21: Distributed Systems - Uni Siegen

1.2 Characteristics of distributed systems

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 21

➥ Resources (e.g. computers, data, users, ...) are distributed

➥ sometimes worldwide

➥ Cooperation via message exchange

➥ Concurrency

➥ but: parallel processing of a single request is not the primary

goal

➥ No global clock (more precisely: no global time)

➥ Distributed status information

➥ no uniquely determined global state

➥ Partial errors are possible (independent failures)

Page 22: Distributed Systems - Uni Siegen

1.2 Characteristics of distributed systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 22

Parallel vs. distributed systems

➥ Parallel system:

➥ motivation: higher performance through parallel execution

➥ multiple tasks (processes/threads) working on one job

➥ tasks are fine-grained: frequent communication

➥ tasks work simultaneously (parallel)

➥ homogeneous hardware / OSs, regular network structure

➥ Distributed system:

➥ motivation: distributed resources (computers, data, users)

➥ multiple tasks (processes/threads) working on one or manyjobs

➥ tasks are coarse grained: communication less frequent

➥ tasks work synchronized (usually one after the other)

➥ inhomogeneous (processors, networks, OSs, ...)

Page 23: Distributed Systems - Uni Siegen

1.3 Challenges and Goals of Distributed Systems

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 23

➥ Heterogeneity: computer hardware, networks, OSs,programming languages, implementations by differentdevelopers, ...

➥ solution: middleware➥ software layer that hides heterogeneity by providing a

unified programming model

➥ e.g. CORBA: distributed objects, remote method invocation

➥ e.g. web services: remote procedure calls (services)

➥ Openness: easy extensibility (with new services)

➥ requirements:

➥ key interfaces are published / standardized

➥ uniform communication mechanisms / protocols

➥ components must conform to standards

[Coulouris, 1.4]

Page 24: Distributed Systems - Uni Siegen

1.3 Challenges and Goals of Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 24

➥ Security

➥ information: confidentiality, integrity, availability

➥ esp. with mobile code

➥ users: authentication, authorization

➥ Scalability: number of resources or users can grow without

negative impact on performance and cost

➥ Error handling (partial errors)

➥ error detection (e.g. checksums)

➥ error masking (e.g. retransmission of a message)

➥ error tolerance (e.g. browser: “server not available”)

➥ recovery (of data) after errors

➥ redundancy (of hardware and software)

Page 25: Distributed Systems - Uni Siegen

1.3 Challenges and Goals of Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 25

➥ Concurrency

➥ synchronization, consistency of replicated data

➥ Transparency

➥ access∼: local and remote accesses identical}

network∼➥ location∼: no need to know the location

➥ mobility∼: transparent relocation of resources

➥ replication∼: transparent replication of resources

➥ concurrency∼: shared use of resources without disruptions

➥ error∼: hiding errors due to component failure

➥ performance∼: performance is largely independent of theload

➥ scaling∼: system scales without negative impact on users

Page 26: Distributed Systems - Uni Siegen

1.4 Software Architecture

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 26

Types of Operating Systems for Distributed Systems

➥ Network operating system:

➥ traditional OS, extended by support for network applications

(API for sockets, RPC, ...)

➥ each computer has its own OS, but can use services of other

computers (file system, email, ssh, ...)

➥ the existence of the other computers is visible

➥ Distributed operating system:

➥ uniform OS for a network of computers

➥ transparent for the user

➥ requires cooperation of the OS kernels

➥ so far mainly research projects

Page 27: Distributed Systems - Uni Siegen

1.4 Software Architecture ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 27

Typical layers in a distributed system

Applications

Pla

tfo

rm(s

)

Services (generic or application specific)

Computer and network hardware

Middleware

(Network) Operating system

API

[Coulouris, 2.2.1]

Page 28: Distributed Systems - Uni Siegen

1.4 Software Architecture ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 28

Middleware

➥ Tasks:

➥ hiding of distribution and heterogeneity

➥ providing a common programming model / API

➥ provision of general services

➥ Functions e.g:

➥ communication services: remote method calls, group

communication, event notifications

➥ replication of shared data

➥ security services

➥ Examples: CORBA, EJB, .NET, Axis2 (Web Services), ...

(☞ Lecture Client/Server Programming)

Page 29: Distributed Systems - Uni Siegen

1.5 Architectural Models

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 29

➥ An architecture model characterizes:

➥ roles of an application component within the distributedapplication

➥ relationships between application components

➥ Role defined by the type of process the component is running in:

➥ client process

➥ short-lived (for the duration of use by the user)

➥ acts as initiator of interprocess communication (IPC)

➥ server process

➥ lives ’unlimited’➥ acts as a service provider for an IPC

➥ peer process

➥ short-lived (for the duration of use by the user)

➥ acts as initiator and service provider

Page 30: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 30

Peer-to-Peer Model

➥ Collaboration of peer processes for a distributed activity

➥ each process manages a local part of the resources

➥ distributed coordination and synchronization of actions at

application level

Coordination code

Application

Coordination code

Application

Coordination code

Application

➥ E.g.: file sharing applications, routers, video conferences, ...

Page 31: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 31

[Coulouris, 2.2.2]Client/Server Model

➥ Asymmetric model: Servers provide services that can be used by

(multiple) clients.

➥ servers usually manage resources (centralized)

Request

Reply

Client

Client

Server

Process Computer

RequestServer

Reply

Server can itselfact as a client

➥ Most common model for distributed applications (ca. 80 %)

Page 32: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 32

Client/Server Model ...

➥ Usually concurrent requests from several client processes to theserver process

Start

Server

Client End

ReplyRequest Time

➥ Examples: file server, web server, database server, DNS server,...

Page 33: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 32

Client/Server Model ...

➥ Usually concurrent requests from several client processes to theserver process

Client

Start

Server

Client End

ReplyRequest Time

➥ Examples: file server, web server, database server, DNS server,...

Page 34: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 33

Variants of the client/server model

Server

Server

ServerClient

Client

Cooperating servers

➥ Network of servers transparently processes a request

➥ Example: Domain Name Server (DNS)

➥ if server cannot determine address:request is transparentlyforwarded to another server

➥ Replicated servers

➥ replicas of server processesare provided

➥ transparent replicas (often in clusters)➥ requests are automatically distributed to the servers

➥ public replicas (e.g. mirror servers)

➥ goals: better performance, reliability

Page 35: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 34

Variants of the client/server model ...

Proxy

Client

Client

Server

Server

Proxy-Server / Caches

➥ proxy is a delegate forthe server

➥ task often is cachingof data / results

➥ e.g. web proxy

➥ Mobile code

➥ executable server code migrates to client on request

➥ code is executed by the client

➥ best-known example: JavaScript / Java applets in the WWW

➥ Mobile agents

➥ agent contains code and data, moves through the network andperforms actions on local resources

Page 36: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 35

n-Tier Architectures

➥ Refinements of Client/Server Architecture

➥ Models for distributing an application to the nodes of a distributed

system

➥ Mainly used in information systems

➥ Tier (german: Schicht / Stufe) denotes an independent process

space within a distributed application

➥ process space can, but does not have to, correspond to a

physical host

➥ several process spaces on one computer are possible

Page 37: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 36

The Tier Model

➥ Typical tasks in an information system:

➥ presentation – interface to the user

➥ application logic – actual functionality

➥ data storage – storage of data in a database

➥ The tier model determines:

➥ assignment of tasks to application components

➥ distribution of application components on tiers

➥ Architectures:

➥ 2-tier architectures

➥ 3-tier architectures

➥ 4-or-more-tier architectures

Page 38: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 37

2-Tier Architecture

➥ Client and server tier

➥ No own tier for the application logic

(distribution between clientand server tier varies)

Client tier

Server tier

Presentation

Data storage

Application logic

➥ Advantage: simple, high performance

➥ Disadvantage: difficult to maintain, poorly scalable

Page 39: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 38

3-Tier Architecture

Presentation

Application logic

Data storage

Client tier

Middle tier

Server tier

➥ Standard distribution model for simple web applications:

➥ client tier: web browser for display

➥ middle tier: web server with servlets / JSP / ASP

➥ server tier: database server

➥ Advantages: Application logic centrally administrable, scalable

Page 40: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 39

4-or-more-Tier Architectures

➥ Difference to 3-tier architecture:

➥ application logic distributed across multiple tiers

➥ Motivation:

➥ minimization of complexity (divide and conquer)

➥ better protection of individual application parts

➥ reusability of components

➥ Many distributed information systems have 4-or-more-tier

architectures

Page 41: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 40

Example: Typical Internet Application

Intranet

Internet

Web

Webclient

client

cationserver

serverData

Tier 1 Tier 3 Tier 4Tier 2

Appli−baseserver

Web

Page 42: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 40

Example: Typical Internet Application

Intranet

Internet

Web

Webclient

client

cationserver

serverData

Tier 1 Tier 3 Tier 4Tier 2

Appli−baseserver

Web

DMZ

Fire

wal

l

Fire

wal

l

serverWeb

Page 43: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 41

Thin and fat clients

➥ Characterizes complexity of the application component on the

client tier

➥ Ultra-thin client

➥ client tier only for presentation: pure display of dialogs

➥ presentation component: web browser

➥ only possible with 3-or-more-tier architectures

➥ Thin client

➥ client tier for presentation only: display of dialogs, preparation

of data for display

➥ Fat client

➥ parts of the application logic on the client tier

➥ usually with 2-tier architectures

Page 44: Distributed Systems - Uni Siegen

1.5 Architectural Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 42

Distinction from Enterprise Application Integration (EAI)

➥ EAI: integration of different applications

➥ communication, exchange of data

➥ Goals similar to distributed applications / middleware

➥ middleware is often used for EAI as well

➥ Differences:

➥ distributed applications: application components, high degree

of coupling, usually little heterogeneity

➥ EAI: complete applications, low degree of coupling, mostly

great heterogeneity (different technologies, systems,

programming languages, ...)

Page 45: Distributed Systems - Uni Siegen

1.6 Cluster

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 43

➥ Cluster: group of networked

computers that acts as a unified

computing resource

➥ i.e. multicomputer system

➥ nodes usually standard PCs

or blade server

➥ Application mainly as high

performance server

➥ Motivation:

➥ (step-by-step) scalability

➥ high availability

➥ good price/performance ratio

[Stallings, 13.4]

Page 46: Distributed Systems - Uni Siegen

1.6 Cluster ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 44

Uses for Clusters

➥ High availability (HA) clusters

➥ improved reliability

➥ when a node is faulty: services are migrated to other nodes

(failover)

➥ Load balancing cluster

➥ incoming requests are distributed to different nodes of the

cluster

➥ usually by a (redundant) central instance

➥ frequently with WWW or email servers

➥ High performance computing cluster

➥ cluster as parallel computer

Page 47: Distributed Systems - Uni Siegen

1.6 Cluster ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 45

Cluster configurations

➥ Passive standby (no actual cluster)

➥ processing of all requests by primary server

➥ secondary server takes over tasks (only) in case of failure

➥ Active standby

➥ all servers process requests

➥ enables load balancing and improved reliability

➥ problem: access to data of other / failed server

➥ alternatives:

➥ replication of data (a lot of communication)

➥ shared hard disk system (usually mirrored disks or RAID

system for fail-safe operation)

Page 48: Distributed Systems - Uni Siegen

1.6 Cluster ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 46

Active Standby Configurations

➥ Separate servers with data replication

➥ separate disks, data is continuously copied to secondary

servers

➥ Server with shared hard disks

➥ shared nothing cluster

➥ separate partitions for each server

➥ in case of server failure: reconfiguration of the partitions

➥ shared disc cluster

➥ simultaneous use by all servers

➥ requires lock manager software to lock files or records

Page 49: Distributed Systems - Uni Siegen

1.7 Summary

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 47

➥ Distributed system

➥ HW and SW components on networked computers

➥ no shared memory, no global time

➥ motivation: use of distributed resources

➥ Challenges

➥ heterogeneity, openness, security, scalability

➥ error handling, concurrency, transparency

➥ Software architecture: middleware

➥ Architectural models:

➥ peer-to-peer, client/server

➥ n-tier models

➥ Cluster: high availability, load balancing

Page 50: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 48

Distributed SystemsSummer Term 2021

2 Middleware

Page 51: Distributed Systems - Uni Siegen

2 Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 49

Content

➥ Communication in distributed systems

➥ Communication-oriented middleware

➥ Application-oriented middleware

Literature

➥ Hammerschall: Ch. 2, 6

➥ Tanenbaum, van Steen: Ch. 2

➥ Colouris, Dollimore, Kindberg: Ch. 4.4

Page 52: Distributed Systems - Uni Siegen

2 Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 50

Netw.

Distributed application (DA)

Distributed system (DS)

DADA

DS nodeDS node

component component

➥ DA uses DS for communication between its components

➥ DSs generally only offer simple communication services

➥ direct use: network programming

➥ Middleware offers more intelligent interfaces

➥ hides details of network programming

Page 53: Distributed Systems - Uni Siegen

2 Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 50

Netw.

DA

Middleware

DA

Middlewarecomponent

DS node DS node

Distributed system (DS)

Distributed application (DA)

component

Netw.

Distributed application (DA)

Distributed system (DS)

DADA

DS nodeDS node

component component

➥ DA uses DS for communication between its components

➥ DSs generally only offer simple communication services

➥ direct use: network programming

➥ Middleware offers more intelligent interfaces

➥ hides details of network programming

Page 54: Distributed Systems - Uni Siegen

2 Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 51

➥ Middleware is the interface between distributed application and

distributed system

➥ Goal: hide distribution aspects from application

➥ transparency (☞ 1.3)

➥ Middleware can also provide additional services for applications

➥ huge differences in existing middleware

➥ Distinction:

➥ communication-oriented middleware (☞ 2.2)

➥ (only) abstraction from network programming

➥ application-oriented middleware (☞ 2.3)

➥ besides communication, the focus is on support of

distributed applications

Page 55: Distributed Systems - Uni Siegen

2 Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 52

2.1 Communication in Distributed Systems

➥ Basis: interprocess communication (IPC)

➥ exchange of messages between processes (☞ BS I: 3.2)

➥ on the same or on different nodes

➥ e.g. via ports, mailboxes, streams, ...

➥ For distribution: network protocols (☞ RN I)

➥ relevant topics etc: addressing, reliability, guaranteed ordering,

timeouts, acknowledgements, marshalling

➥ Interface for network programming: sockets (☞ RN II)

➥ datagrams (UDP) and streams (TCP)

Page 56: Distributed Systems - Uni Siegen

2.1 Communication in Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 53

Synchronous Communication

➥ ReceiverSender

bloc

ked

reply

request

activ

e

Time

Sender and receiver block when

calling a send or receive operation

➥ receiver is waiting for a request

➥ sender is waiting for the reply

➥ Tight coupling between sender and

receivers

➥ advantage: easy to understand model

➥ disadvantage: strong dependency, especially in case of error

➥ Prerequisites:

➥ reliable and fast network connection

➥ receiver process is available

Page 57: Distributed Systems - Uni Siegen

2.1 Communication in Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 54

Asynchronous Communication

➥ Receiver

activ

e

Sender

activ

e

request

Time

Sender is not blocked, can continue

immediately after sending the message

➥ Incoming messages are buffered at the

receiver

➥ Answers are optional

➥ receiver can reply asynchronously to

the sender

➥ More complex implementation and use as with synchronous

communication, but usually more efficient

➥ Only loose coupling between the processes

➥ receiver does not have to be ready for reception

➥ less dependent in case of errors

Page 58: Distributed Systems - Uni Siegen

2.1 Communication in Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 55

Client/Server Communication

operation

replymessage

ServerClient

request

messagedeterminerequest

send answer

select object, if needed

execute

(wait)

(continue)

execute method

➥ Mostly synchronous: client blocked until response arrives

➥ Variants: asynchronous (non blocking), one way (without answer)

Page 59: Distributed Systems - Uni Siegen

2.1 Communication in Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 55

Client/Server Communication

getRequest()

sendReply()

do

Op

era

tion

() operation

replymessage

ServerClient

request

messagedeterminerequest

send answer

select object, if needed

execute

(wait)

(continue)

execute method

➥ Mostly synchronous: client blocked until response arrives

➥ Variants: asynchronous (non blocking), one way (without answer)

Page 60: Distributed Systems - Uni Siegen

2.1 Communication in Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 56

Client/Server Communication: Request/Response Protocol

➥ Typical operations:

➥ doOperation() – send request and wait for result

➥ getRequest() – wait for request

➥ sendReply() – send result

➥ Typical message structure:

messageTyperequestIDobjectReferencemethodIDarguments

request / reply ?unique ID of request (usually int)reference to remote object (if needed)method to be called (int / String)arguments (usually as Byte array)

➥ request ID + sender ID result in unique message ID

➥ e.g. to map an answer to its query

Page 61: Distributed Systems - Uni Siegen

2.1 Communication in Distributed Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 57

Client/Server Communication: Error Handling

➥ Request and/or response messages may be lost

➥ Client sets a timeout when sending a request

➥ after expiration, request is usually sent again

➥ after a few repetitions: termination with exception

➥ Server discards duplicate requests if request has already been /

is still being processed

➥ For lost response messages:

➥ idempotent operations can be executed again

➥ otherwise: save results of operations in a history

➥ for repeated request: only resend the result

➥ delete history entries when next request arrives; if

necessary confirmations for results can also be used

Page 62: Distributed Systems - Uni Siegen

2.2 Communication-oriented Middleware

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 58

➥ Focus: provision of a communication infrastructure for distributed

applications

➥ Tasks:

➥ communication

➥ dealing with heterogeneity

➥ error handling

Application

Communication oriented

Operating system / distributed system

middleware

Page 63: Distributed Systems - Uni Siegen

2.2.1 Tasks of the Middleware

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 59

Communication

➥ Provision of a middleware protocol

➥ Localization and identification of communication partners

➥ Integration with process and thread management

Transport protocol (e.g. TCP)

Middleware protocol

Application protocol

Lower layers of the protocol stack

Page 64: Distributed Systems - Uni Siegen

2.2.1 Tasks of the Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 60

Heterogeneity

➥ Problem with data transmission:

➥ heterogeneity in distributed systems

➥ Heterogeneous hardware and operating systems

➥ different byte order

➥ little endian vs. big endian

➥ different character encoding

➥ e.g.. ASCII / Unicode / UTF-8 / EBCDIC (IBM Mainframes)

➥ Heterogeneous programming languages

➥ different representation of simple and complex data types in

the main memory

Page 65: Distributed Systems - Uni Siegen

2.2.1 Tasks of the Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 61

Heterogeneity: Solutions (☞ RN I)

➥ Use of generic, standardized data formats

➥ known to all communication partners and middleware

➥ platform-specific formats for middleware (e.g. CDR for

CORBA) or external formats, e.g. XML

➥ Heterogeneity of hardware and operating system

➥ is handled transparently for the applications by the middleware

➥ Heterogeneity of programming languages

➥ applications need to convert data to higher-level format and

back (marshaling / unmarshaling)

➥ necessary code is usually generated automatically

➥ client stub / server skeleton

Page 66: Distributed Systems - Uni Siegen

2.2.1 Tasks of the Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 62

Error Handling

➥ Possible errors due to distribution

➥ incorrect transmission (incl. loss of messages)

➥ handled by the protocols of the distributed system:

➥ checksums, CRC➥ retransmission of packets (e.g. TCP)

➥ failure of components (network, hardware, software)

➥ handled by middleware or application:

➥ acceptance of the error➥ retransmission of messages

➥ replication of components (error avoidance)

➥ controlled termination of the application

Page 67: Distributed Systems - Uni Siegen

2.2 Communication-oriented Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 63

2.2.2 Programming Models

➥ Programming model defines two concepts:

➥ communication model

➥ synchronous vs. asynchronous

➥ programming paradigm

➥ object-oriented vs. procedural

➥ Three common programming models for middleware:

➥ message-oriented model (asynchronous / arbitrary)

➥ remote procedure call (synchronous / procedural)

➥ remote method invocation (synchronous / object-oriented)

Page 68: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 64

Message-Oriented Model

➥ Sender puts message in receiver’s queue

Sender

Message

Message queue

Message

Receiver

➥ Receiver accepts message as soon as he is ready

➥ Extensive decoupling of transmitter and receiver

➥ No method or procedure calls

➥ data is packed and sent by the application

➥ no automatic reply message

Page 69: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 65

Remote Procedure Call (RPC)

➥ Allows a client to call a procedure in a remote server process

...P(a) {

return b;}

y = P(x);Input parameters

processClient

processServer

Results

➥ Communication according to request / response principle

Remote Method Invocation (RMI)

➥ Allows an object to call methods of a remote object

➥ In principle very similar to RPC

Page 70: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 66

Common Basic Concepts of Remote Calls

➥ Client and server are decoupled by interface definition

➥ defines names of calls, parameters and return values

➥ Introduction of client stubs and server skeletons as an access

interface

➥ are automatically generated from interface definition

➥ IDL compiler (IDL = interface definition language)

➥ are responsible for marshaling / unmarshaling

as well as for the actual communication

➥ realize access and location transparency

Page 71: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 67

How Client Stub and Server Skeleton Work (RPC)

Client stub Server skeleton

P(a) {y=P(x)

...P(a) {

return b;}

; ;

Client process

return b;}

receive(m1);

client=sender(m1);

unpack argument xfrom message

y = P(x)

}

pack argument ainto message

send(Server, m1);

receive(Server, m2)

unpack result bfrom message

while (true) {

send(Client, m2);

pack result y

Server process

into message

Page 72: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 68

Basis of RMI: The Proxy Pattern

➥ Client works with a deputy object (proxy) of the actual server

object

➥ proxy and server object implement the same interface

➥ client only knows / uses this interface

Client Proxy Object

Interface<<interface>>

Page 73: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 69

Flow of a Remote Method Call

Proxy

Skeleton callsthe samemethod onthe object

Client−OS

Client

Network

Server−BS

Server

Skeleton

Server nodeClient node

Object

Status

Method

Same interfaceas real object

Interface

Client callsa method

Packed request is sent over the network(object ID, method name, parameters)

Page 74: Distributed Systems - Uni Siegen

2.2.2 Programming Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 70

Creation of a Client/Server Program

Server

Client

Compiler

Compiler

Client stubs

IDL compiler

Server skel.

RuntimeRPC/RMI

Serverprocedures

Client

library

Interfacedescription

program

➥ Applies in principle to all realizations of remote calls

Page 75: Distributed Systems - Uni Siegen

2.2 Communication-oriented Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 71

2.2.3 Middleware Technologies

➥ Realize (at least) one of the programming models

➥ rely on open standards / standardized interfaces

➥ Remote procedure call

➥ SUN RPC, DCE RPC, Web Services (☞ CSP: 7), ...

➥ Remote method invocation

➥ Java RMI (☞ 3), CORBA (☞ CSP: 3), ...

➥ Message-oriented middleware technologies

➥ MOM: message oriented middleware, messaging systems

➥ mainly for EAI

➥ Java Message Service, WebSphereMQ (MQSeries), ...

Page 76: Distributed Systems - Uni Siegen

2.2 Communication-oriented Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 72

2.2.4 Message Oriented Middleware (MOM)

➥ Middleware technology for the message-oriented model

➥ In addition to message exchange also other services, especially

queue management

interfaceAccess

interfaceAccess

Sender ReceiverMessage queues

Message queuemanager

Protocol stack

Middleware protocol (proprietary)

Page 77: Distributed Systems - Uni Siegen

2.2.4 Message Oriented Middleware (MOM) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 73

Message Queue Infrastructure

➥ Access to queues is only possible locally

➥ local: same computer or same subnet

➥ Transport of messages across subnet boundaries by queue

administrators (routers)

Manager Manager

Manager

Sender Receiver

ReceiverSender

Page 78: Distributed Systems - Uni Siegen

2.2.4 Message Oriented Middleware (MOM) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 74

Variants of message exchange

➥ Point-to-point communication

➥ communication between two defined processes

➥ simplest model: asynchronous communication

➥ enhancement: request/reply model

➥ enables synchronous communication via asynchronousmiddleware

➥ Broadcast communication

➥ Message is sent to all reachable receivers

➥ one implementation: publish/subscribe model

➥ publishers publish messages/news on a topic

➥ subscribers subscriber to certain topics

➥ mediation via a broker

Page 79: Distributed Systems - Uni Siegen

2.2.4 Message Oriented Middleware (MOM) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 75

Example: Java Message Service

➥ Part of the Java Enterprise Edition (Java EE)

➥ Unified Java interface for MOM services

➥ Distinguishes two roles:

➥ JMS provider: the respective MOM server

➥ JMS client: sender or receiver of messages

➥ JMS supports:

➥ asynchronous point-to-point communication

➥ request/reply model

➥ publish/subscribe model

➥ JMS defines corresponding access objects and methods

Page 80: Distributed Systems - Uni Siegen

2.2 Communication-oriented Middleware ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 76

2.2.5 Summary

➥ Tasks: Communication, dealing with heterogeneity, error handling

➥ Programming models:

➥ message-oriented model (asynchronous)

➥ basis: message queues

➥ refinements:➥ request/reply model (synchronous)

➥ publish/subscribe model (broadcast)

➥ remote procedure or method calls

➥ synchronous: request and response

➥ generated stubs for (un-)marshaling

Page 81: Distributed Systems - Uni Siegen

2.3 Application-oriented Middleware

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 77

➥ Based on communication-oriented middleware

➥ Extends it by:

➥ runtime environment

➥ services

➥ component model

Runtime environment ServicesServices

Component model

Communication infrastructure

Operating system / distributed system

componentApplication

component componentApplication Application

Page 82: Distributed Systems - Uni Siegen

2.3.1 Runtime environment

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 78

➥ Based on node operating systems of the distributed system

➥ Operating system (OS) manages processes, memory, I/O, ...

➥ provides basic functionality

➥ starting / stopping processes, scheduling, ...

➥ interprocess communication, synchronization, ...

➥ Runtime environment extends functionality of the OS:

➥ improved resource management

➥ e.g. concurrency, connection management

➥ improved availability

➥ improved security mechanisms

Page 83: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 79

Resource management

➥ Middleware goes beyond simple OS functionality

➥ e.g. independently managed main memory areas with

individual security criteria

➥ pooling of processes, threads, connections

➥ are created for stock and made available as required

➥ possible, since middleware is specific to certain classes of

applications

➥ Goal: improved performance, scalability and availability

Page 84: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 80

Concurrency

➥ Concurrency in this context:

➥ isolated parallel processing of requests

➥ Concurrency can be implemented via processes or threads

➥ threads (lightweight processes): concurrent activities within

processes

➥ threads in the same process share all resources

➥ advantages and disadvantages:

➥ processes: high resource requirements, not well scalable,

good protection, with low concurrency

➥ threads: well scalable, no mutual protection, with high

concurrency

Page 85: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 81

Concurrency ...

➥ Middleware takes over automatic generation / administration ofthreads in the case of concurrent orders, e.g.

➥ single-threaded

➥ only one thread, sequential processing

➥ thread-per-request

➥ a new thread is created for each request

➥ thread-per-session

➥ a new thread is created for each session (client)

➥ thread pool

➥ fixed number of threads, incoming requests are distributedautomatically

➥ saves thread generation costs➥ limits resource consumption

Page 86: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 82

Connection management

➥ Connection here means: endpoints of communication channels

➥ occur at tier boundaries (between process spaces)

➥ e.g. client/server interface, database access

➥ are assigned to a process/thread, if in the active state

➥ require resources (memory, processor time)

➥ opening and closing connections is costly

➥ To save resources: pooling of connections

➥ connections are initialized to stock and placed in pool

➥ each thread/process receives a connection on demand

➥ after use: return connection to pool

Page 87: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 83

Availability

➥ Requirement to the application,

but mainly implemented by the runtime environment

➥ Downtimes are caused by

➥ failure of a hardware or software component

➥ overload of a hardware or software component

➥ maintenance of a hardware or software component

➥ Frequent technology for ensuring availability: cluster

➥ replication of hardware and software

➥ cluster appears externally as one unit

➥ two types: fail-over cluster / load-balancing cluster

Page 88: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 84

Security

➥ Distributed applications are vulnerable due to their distribution

➥ Middleware supports different security models

➥ Security requirements:

➥ authentication:

➥ proves the identity of the user / a component

➥ e.g. by password query (for users) or cryptographic

techniques and certificates (for components)

➥ authorization:

➥ definition of access rights for users to specific services

➥ or more fine grained: methods and attributes

➥ requires secure authentication

Page 89: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 85

Security ...

➥ Security requirements ...:

➥ confidentiality

➥ information cannot be intercepted during transmission in

the network➥ technique: encryption

➥ integrity

➥ transmitted data cannot be changed without being noticed

➥ techniques: cryptographic checksum (message digest,

fingerprint), digital signature

➥ digital signature also ensures authenticity of the sender

Page 90: Distributed Systems - Uni Siegen

2.3.1 Runtime environment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 86

Security ...

➥ Security mechanisms:

➥ encryption

➥ symmetric (e.g. IDEA, AES)➥ same key for encryption and decryption

➥ asymmetric (public key algorithms, e.g. RSA)➥ public key for encryption➥ private key for decrypting

➥ digital signature

➥ ensures integrity of a message and authenticity of thesender as well as nonrepudiation

➥ certificate➥ certifies that public key and person (or component) belong

together

Page 91: Distributed Systems - Uni Siegen

2.3.2 Services

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 87

Name service (directory service) (☞ 4)

➥ Publication of available services

➥ in the intranet or Internet

➥ Assignment of names to references (addresses)

➥ name serves as a unique / unchangeable identifier

➥ the client can request the address of a service via its name

➥ address can change e.g. at restart

➥ goal: decoupling of client and server

➥ Examples: JNDI, RMI registry, CORBA interoperable naming

service, UDDI registry, LDAP server, ...

Page 92: Distributed Systems - Uni Siegen

2.3.2 Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 88

Session management

➥ In interactive systems: each instance of a client is assigned itsown session

➥ deleted when logging out or closing the client

➥ Session stores all relevant data (in main memory)

➥ e.g. identification of the user, browser type, ”‘shopping cart”’, ...

➥ data stored in the server or in the client

➥ transient data: deleted at the end of the session

➥ persistent data: is written to a data carrier (database) at theend of the session.

➥ Middleware implements/supports the assignment of requests tosessions (often transparent)

➥ e.g. cookies, HTTP-sessions, session beans, ...

Page 93: Distributed Systems - Uni Siegen

2.3.2 Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 89

Transaction management (☞ 7.4)

➥ Service for interactive, data-centric applications

➥ consistency / integrity of data is important

➥ this means that the entire (maybe distributed) dataset must

represent a valid state in itself

➥ Typical sequence in applications:

1. client requests data

2. client changes the data

3. client requests that the data be rewritten

➥ problem: steps 1-3 could be performed by two clients at the

same time

➥ Transaction management allows execution of a sequence ofactions as an atomic unit

Page 94: Distributed Systems - Uni Siegen

2.3.2 Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 90

Persistence service

➥ Persistence: all measures for the permanent storage of main

memory data

➥ Persistence service: intelligent interface to the database

➥ integrated in middleware or as an independent component

➥ most important service for data-centered applications besides

transaction management

➥ Most common type: object-relational mapper (OR-Mapper)

➥ maps objects in the main memory to tables in a relational

database

➥ mapping rules are defined by application developers

Page 95: Distributed Systems - Uni Siegen

2.3.2 Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 91

Persistence service ...

Var5Var4Var3Var2Var1Table B

Var1 Var2 Var3 Var4Table A

Var1Var2Var3

Object A Var1Var2

Object B

Var1Var2

Object C

OR mapper

Mai

n m

emor

yD

ata

base

Page 96: Distributed Systems - Uni Siegen

2.3.3 Component model

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 92

➥ Components: “large” objects for structuring applications

➥ A component model defines:

➥ the term “component”

➥ structure and properties of the components

➥ mandatory and optional interfaces

➥ interface contracts

➥ how do components interact with each other and with the

runtime environment?

➥ component runtime environment

➥ management of the life cycle of components

➥ implicit provision of services: component only specifies its

requirements (e.g. persistence)

Page 97: Distributed Systems - Uni Siegen

2.3.4 Middleware Technologies

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 93

➥ Object request broker (ORB)

➥ distributed objects, remote method calls

➥ variety of services, only basic runtime environment

➥ example: CORBA

➥ Application server

➥ focus: support of application logic (middle tier)

➥ services, runtime environment, and component model

➥ today only as part of a middleware platform

➥ Middleware platforms

➥ extension of application servers: support of all tiers

➥ distributed applications as well as EAI

➥ examples: Java EE/EJB, .NET/COM, CORBA 3.0/CCM

Page 98: Distributed Systems - Uni Siegen

2.3.5 Summary

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 94

Application-oriented middleware

➥ Runtime environment

➥ resource management, availability, security

➥ Services

➥ name service, session management, transaction

management, persistence service

➥ Component model

➥ defintion of components, interface contracts, runtime

environment

Page 99: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 95

Distributed SystemsSummer Term 2021

3 Distributed Programming with Java RMI

Page 100: Distributed Systems - Uni Siegen

3 Distributed Programming with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 96

Content

➥ Introduction

➥ Hello World with RMI

➥ RMI in detail

➥ classes and interfaces, stubs, name service, parameter

passing, factories, callbacks, ...

➥ Deployment: loading remote classes

➥ Java remote class loader and security manager

Page 101: Distributed Systems - Uni Siegen

3 Distributed Programming with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 97

Literature

➥ WWW documentation and tutorials from Oracle

➥ http://docs.oracle.com/javase/6/docs/api/index.html?

java/rmi/package-summary.html

➥ http://www.oracle.com/technetwork/java/javase/tech/

index-jsp-136424.html

➥ Hammerschall: Ch.. 5.2

➥ Farley, Crawford, Flanagan: Ch. 3

➥ Horstmann, Cornell: Ch. 5

➥ Orfali, Harkey: Ch. 13

➥ Peter Ziesche: Nebenlaufige & verteilte Programmierung,

W3L-Verlag, 2005. Ch. 8

Page 102: Distributed Systems - Uni Siegen

3.1 Introduction

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 98

➥ Java RMI is an integral part of Java

➥ allows use of remote objects

➥ Elements of Java RMI:

➥ remote object implementations

➥ client interfaces (stubs) to remote objects

➥ server skeletons for remote object implementations

➥ name service to locate objects in the network

➥ service for automatically creating (activating) objects

➥ communication protocol

➥ Java interfaces for the first five elements

➥ in the package java.rmi and its subpackages

Page 103: Distributed Systems - Uni Siegen

3.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 99

➥ Java RMI requires that all objects (i.e., client and server) are

programmed in Java.

➥ in contrast to, e.g., CORBA

➥ Advantage: seamless integration into the language

➥ use of remote objects is (almost!) identical to local objects

➥ including distributed garbage collection

➥ Integration of objects in other programming languages:

➥ “wrapping” in Java code via Java Native Interface (JNI)

➥ use of RMI/IIOP: interoperability with CORBA

➥ direct communication between RMI and CORBA objects

Page 104: Distributed Systems - Uni Siegen

3.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 100

Distributed Objects

local reference

JVM 1 JVM 2 JVM 3

Node 1 Node 2

➥ Remote references can be used just like local references

➥ Objects can occur in client and server roles

Page 105: Distributed Systems - Uni Siegen

3.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 100

Distributed Objects

remote referencelocal reference

JVM 1 JVM 2 JVM 3

Node 1 Node 2

➥ Remote references can be used just like local references

➥ Objects can occur in client and server roles

Page 106: Distributed Systems - Uni Siegen

3.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 101

3.1.1 RMI ArchitectureR

MI s

yste

m

Client Server

Remote Remote

SkeletonStub

Remote reference

Stub / skeletonlayer

layer

RMI transport layer

referencemanager

referencemanager

Page 107: Distributed Systems - Uni Siegen

3.1.1 RMI Architecture ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 102

Stub/Skeleton Layer

➥ Stub: local proxy object for the remote object

➥ Skeleton: receives calls and forwards them to the correct object

➥ Stub and skeleton classes are automatically generated from an

interface definition (Java interface)

➥ As of JDK 1.2: skeleton class is generic

➥ skeleton uses reflection mechanism of Java to call methods of

server object

➥ reflection allows you to query the method definitions of a class

and to generically call methods at runtime

➥ As of JDK 1.5: stub classes are created at runtime

➥ with the Java class Proxy

Page 108: Distributed Systems - Uni Siegen

3.1.1 RMI Architecture ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 103

Remote Reference Layer

➥ Defines call semantics of RMI

➥ in JDK 1.1: unicast only, point-to-point

➥ call is routed to exactly one existing object

➥ as of JDK 1.2 also activatable objects

➥ object will be (re-)activated first, if necessary

➥ new object, state is restored from hard disk

➥ also possible: multicast semantics

➥ proxy sends request to a set of objects and returns the first

response

➥ Also: connection management, distributed garbage collection

Page 109: Distributed Systems - Uni Siegen

3.1.1 RMI Architecture ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 104

Transport Layer

➥ Connections between JVMs

➥ basis: TCP/IP streams

➥ Proprietary protocol: Java Remote Method Protocol (JRMP)

➥ allows tunneling the connection via HTTP (due to firewalls)

➥ allows you to define your own socket factory, e.g. to use

Transport Layer Security (TLS or SSL)

➥ As of JKD 1.3 also RMI-IIOP

➥ uses IIOP (Internet Inter-ORB Protocol) from CORBA

➥ thus: direct interoperability with CORBA objects

Page 110: Distributed Systems - Uni Siegen

3.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 105

3.1.2 RMI Services

➥ Name service: RMI Registry

➥ registers remote references to RMI objects under freely

selectable unique names

➥ a client can then get the corresponding reference for a name

➥ technical: registry sends serialized proxy object (client

stub) to the client.

➥ the location of the required class files may also be

transferred (see 3.4.1)

➥ RMI can also be used with other naming services, e.g. via

JNDI (Java Naming and Directory Interface)

Page 111: Distributed Systems - Uni Siegen

3.1.2 RMI Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 106

➥ Object Activation Service

➥ usually: remote reference to RMI object is only valid as long as

the object exists

➥ if the server or the server JVM crashes: object references

become invalid➥ references change on restart!

➥ RMI Activation Service introduced with JDK 1.2

➥ starts server objects on request of a client

➥ server object must register an activation method with the

RMI Activation Daemon

Page 112: Distributed Systems - Uni Siegen

3.1.2 RMI Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 107

➥ Distributed Garbage Collection

➥ automatic garbage collection of Java also works for remote

objects

➥ server-side JVM manages a list of remote references to

objects

➥ references are “leased” for a certain time

➥ reference counter of the object is decremented, if

➥ client deletes the reference (e.g., end of the lifetime of the

reference variable), or

➥ client does not renew the lease in time➥ reason: remote reference layer cannot explicitly “log off”

an object, if the client crashes➥ default setting: 10 min.

Page 113: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 108

Structure:

Client JVM Server JVM

class HelloClient {

s = h.sayHello();

Client class

...

Hello h;...

...

Server classclass HelloServer

String sayHello() {return "Hello World";

}...

Interface

interface Hello {String sayHello();

}

implements Hello {

Page 114: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 109

Development Process:

1. Design the interface for the server object

2. Implement the server class

3. Develop the server application to include the server object

4. Develop the client application with calls to the server object

5. Compile and start the system

Page 115: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 110

Designing the Interface for the Server Object

➥ Specified as normal Java interface

➥ Must extend java.rmi.Remote

➥ no inheritance of operations, only marking as remote interface

➥ Each method must declare to raise the exception

java.rmi.RemoteException (or a base class of it)

➥ base class for all errors that may occur

➥ in the client, during transmission, in the server

➥ No restrictions compared to local interfaces

➥ but: semantic differences (parameter passing!)

Page 116: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 111

Hello-World Interface

RemoteExceptionindicates error in theremote object orduring communication

Marker interface,contains no methods,marks interface asRMI interface

public interface Hello extends Remote {

String sayHello() throws RemoteException;

import java.rmi.RemoteException;

import java.rmi.Remote;

}

Page 117: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 112

Implementing the Server Class

➥ A class that is to be usable remotely must:

➥ implement a given remote interface

➥ usually extend java.rmi.server.UnicastRemoteObject

➥ defines call semantics: point-to-point

➥ have a constructor that declares to throw a RemoteException

➥ creation of object must be done in a try-catch block

➥ Methods usually do not need to specify throws RemoteException

➥ because they don’t throw the exception themselves

Page 118: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 113

Hello-World Server (1)

Remote method

import java.rmi.*;

import java.rmi.server.UnicastRemoteObject;

public class HelloServer extends UnicastRemoteObject

implements Hello {

super();

}

return "Hello World!";

}

public HelloServer() throws RemoteException {

public String sayHello() {

Page 119: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 114

Development of the Server Application to Include the ServerObject

➥ Tasks:

➥ creating a server object

➥ registering the object with the name service

➥ under a specified public name

➥ Typically not a new class, but main method of the server class

Page 120: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 115

Hello-World Server (2)

server object

Create the Register the server object

under the name "Hello−Server"

with the name server (RMI registry,local host, port 1099)

public static void main(String args[]) {

try {

HelloServer obj = new HelloServer();

catch (Exception e) {

System.out.println("Error: " + e.getMessage());

e.printStackTrace();

}

}

}

Naming.rebind("rmi://localhost/Hello−Server", obj);

}

Page 121: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 116

Development of the Client Application with Calls to the ServerObject

➥ Client must first use the name service to get a reference to the

server object from the name service

➥ type cast to the correct type required

➥ Then: any method can be called

➥ no syntactical differences to local calls

➥ Note: client can get remote references in other ways as well

➥ e.g. as return value of a remote method

Page 122: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 117

Hello-World Client

Get object referencefrom name server

Call the methodon the remote object

public static void main(String args[]) {public class HelloClient {

try {

import java.rmi.*;

(Hello)Naming.lookup("rmi://bspc02/Hello−Server");

String message = obj.sayHello();System.out.println(message);}catch (Exception e) {...

}}}

Hello obj =

Page 123: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 118

Compiling and Starting the System

➥ Compiling the Java sources

➥ source files: Hello.java, HelloServer.java,

HelloClient.java

➥ invocation: javac *.java

➥ creates Hello.class, HelloServer.class,HelloClient.class

➥ Creating the client stub (proxy object)

➥ for clients up to JDK 1.4:

➥ invocation: rmic -v1.2 HelloServer

➥ creates HelloServer Stub.class

➥ as of JDK 1.5: client creates proxy class at runtime

➥ using java.lang.reflect.Proxy

Page 124: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 119

Compiling and Starting the System ...

Client side Server sideup

to J

DK

1.4

HelloServer.java

HelloClient.class Hello.class Hello.class HelloServer.class

javac javac

HelloClient.java Hello.java

HelloServer_Stub.class

rmic

Page 125: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 120

Compiling and Starting the System ...

➥ Starting the naming service

➥ invocation: rmiregistry [port]

➥ for security reasons, objects can only be registered on the

local host

➥ i.e. RMI registry must run on server computer

➥ standard port: 1099

➥ Starting the server

➥ invocation: java HelloServer

➥ Starting the client

➥ invocation: java HelloClient

Page 126: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

HelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 127: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

Name and

(serialized)

Naming.rebind(...)

objectstubHelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 128: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

Hello server

Name and

(serialized)

Naming.rebind(...)

objectstubHelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 129: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

Naming.lookup(...)

NameHello server

HelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 130: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

(serialized)objectstub

Naming.lookup(...)

NameHello server

HelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 131: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

objectStub

(serialized)objectstub

Naming.lookup(...)

NameHello server

HelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 132: Distributed Systems - Uni Siegen

3.2 Hello World with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 121

Execution of the Example

obj.sayHello()

objectStub

Hello server

HelloClient

rmiregistryTestFoo

Remoteobject

HelloServer

Clientcomputer

Servercomputer

Page 133: Distributed Systems - Uni Siegen

3.3 RMI in Detail

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 122

3.3.1 Classes and Interfaces

uses

java.lang java.io

java.rmi.serverjava.rmi

HelloServer

RemoteServer

RemoteStub

HelloClient

RemoteObject

Object IOException

Remote<<interface>>

UnicastRemoteObject

RemoteException

Naming

Page 134: Distributed Systems - Uni Siegen

3.3.1 Classes and Interfaces ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 123

Interface Remote

➥ Every remote object must implement this interface

➥ Does not provide methods, serves only as a marker

Class RemoteException

➥ Superclass for all exceptions that can be triggered by the RMI

system, for example, with

➥ communication errors (server not reachable, ...)

➥ (un-)marshalling errors

➥ protocol errors

➥ Each remote method must specify RemoteException (or a base

class of it) in the throws clause

Page 135: Distributed Systems - Uni Siegen

3.3.1 Classes and Interfaces ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 124

Class RemoteObject

➥ Base class for all remote objects

➥ Redefines the methods equals, hashCode, and toString

➥ toStub() returns a reference to the stub object

➥ getRef() returns remote reference (= Java class)

➥ used by the stub to call methods

Class RemoteServer

➥ Base class for all server implementations

➥ UnicastRemoteObject, Activatable

➥ Method getClientHost(): host address of the client of the

current RMI call

➥ setLog() and getLog(): logging of RMI calls

Page 136: Distributed Systems - Uni Siegen

3.3.1 Classes and Interfaces ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 125

Class UnicastRemoteObject

➥ Implements remote object with the following properties:

➥ references to the object are only valid as long as server

process (JVM) is still running

➥ client call is routed to exactly one object (via TCP connection),

no replication

➥ Constructor allows definition of port and socket factories

➥ so that e.g. connections via TLS/SSL can be realized

➥ Static method exportObject() makes object available via RMI

➥ Static method unexportObject() cancels availability

Class RemoteStub

➥ Base class for all client stubs

Page 137: Distributed Systems - Uni Siegen

3.3.1 Classes and Interfaces ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 126

Class Naming

➥ Allows easy access to RMI registry

➥ Important methods:

➥ bind() / rebind(): registers object under given name

➥ lookup(): get object reference to a name

➥ Names are given in URL format

➥ also define the host and port of the RMI registry.

➥ structure of the URL:

Port of the RMI registryName of the registered Object

Host of RMI registryProtocol (always rmi)

rmi:// bspc02:1234/Hello

Page 138: Distributed Systems - Uni Siegen

3.3 RMI in Detail ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 127

3.3.2 Special Features of Remote Classes

➥ Comparison of remote objects

➥ Comparison with == refers only to the stub objects

➥ Result is false, even if both stubs refer to the same remote

object

➥ comparison with equals() returns true if both stubs refer to

the same remote object

JVM1

stub1

stub2 stub1.equals(stub2)

stub1 != stub2

JVM2

objectRemote

Page 139: Distributed Systems - Uni Siegen

3.3.2 Special Features of Remote Classes ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 128

➥ Method hashCode()

➥ used by container classes HashMap, HashSet and others

➥ Hash code is calculated only from the object identifier of the

remote object

➥ same remote object ⇒ same hash code

➥ but the content of the object is ignored

➥ consistent with behavior of equals()

➥ Cloning objects

➥ cloning of the remote object is not possible by calling clone()

on the stub

➥ cloning of stubs neither necessary nor meaningful

Page 140: Distributed Systems - Uni Siegen

3.3 RMI in Detail ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 129

3.3.3 Parameter Passing

➥ Parameters passed to remote methods

➥ either via call-by-value

➥ or via call-by-reference

➥ The mechanism used depends on the type of the parameter

➥ Final decision may only be made at runtime!

➥ The return of the result follows the same rules as for parameter

passing

Page 141: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 130

Parameter Passing for Local Methods

➥ Java supports two kinds of types:

➥ value types: simple data types

➥ boolean, byte, char, short, int, long, float, double

➥ are passed to local methods by value

➥ that is, the method receives a copy of the value

➥ reference types: classes (incl. String and arrays)

➥ are passed to local methods by reference

➥ that is, the method works on the original object

and can also change object if required

Page 142: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 131

Parameter Passing for Remote Methods

➥ Value types: are always passed by value

➥ Reference types: dependent on the concrete object

➥ object can be serialized: call-by-value

➥ object belongs to a class that implements the Remote interface:

call-by-reference

➥ neither: error (java.rmi.MarshalException)

➥ both: ??! (this case is to be avoided!)

➥ decision is made only at runtime

Page 143: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 132

Call-by-Value (Serializable Objects)

➥ Class must implement interface java.io.Serializable

➥ Serializable objects can be transferred over a network

➥ only the data is transferred, the code (class file) must be

available at the receiver!

➥ Default serialization of Java:

➥ all attributes of the object are serialized and transferred

➥ recursive procedure!

➥ prerequisite: all attributes and all base classes can be

serialized

➥ Application specific serialization is possible:

➥ implement the methods writeObject and readObject

Page 144: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 133

Passing a Serializable Object

param

Original

Clientobject param Stub

objectServerobject

Independentcopy

<<create>>

op(param)

Skele−ton

Networkconnection

op(param)

m()

paramserialize

paramdeserialize

Page 145: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 134

Call-by-Reference (Remote Objects)

➥ Class of the parameter object must implement an interface that

extends Remote

➥ parameter type must be this interface

➥ class is typically derived from UnicastRemoteObject

➥ A serialized stub object is transferred

➥ from JDK 1.5: stub class is created dynamically

➥ (up to JDK 1.4 stub class must be generated by rmic and be

available at the server)

➥ If the server calls methods on the parameter object:

➥ calls are routed to the original object using RMI

Page 146: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 135

Passing a Remote Object

paramstub

Stubobject

paramClientobject

paramstub

Serverobject

<<create>>

op(paramStub)

Skele−ton

Networkconnection

op(param)

toStub(param)

paramStub

m()

serializeparamStub

paramStubdeserialize

Page 147: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 136

Examples

➥ See WWW:

➥ Hello-World with call-by-value parameter

➥ Hello-World with call-by-reference parameter

Page 148: Distributed Systems - Uni Siegen

3.3.3 Parameter Passing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 137

Arrays and Container Objects

➥ Arrays and container objects (from the Java Collection

Framework, java.util) can be serialized

➥ i.e., they will be reinstantiated at the receiver

➥ To the elements of the array / container the same rules apply as

to simple parameters

➥ for mixed content: elements are passed by value or by

reference depending on their actual class

Page 149: Distributed Systems - Uni Siegen

3.3 RMI in Detail ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 138

3.3.4 Remote Object References as Results

➥ Frequently: via RMI registry, the client receives a reference to a

remote object, which provides references to other objects

➥ the remote object may also create these objects on demand

(this is called factory object or factory class)

➥ Example: server for bank accounts

➥ registration of all account objects with RMI registry not useful

➥ instead: registration of a manager object that returns the

reference to the account object for a given account number

➥ if necessary, it can create a new object (from a database)

➥ Note: RMI does not allow remote object creation

➥ client cannot create objects on a remote host

Page 150: Distributed Systems - Uni Siegen

3.3 RMI in Detail ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 139

3.3.5 Client Callbacks

➥ Frequently: server wants to make calls in the client

➥ e.g. progress bar, queries, ...

➥ For this: client object must be an RMI object

➥ pass this reference to the server method

➥ In some cases, you cannot inherit from UnicastRemoteObject,

e.g. for applets

➥ then: export the object using

UnicastRemoteObject.exportObject(obj,0);

➥ attention: when calling exportObject(obj), no dynamic stub

is created, even with JDK 1.5 and later

➥ Example code: see WWW (Hello-World with callback)

Page 151: Distributed Systems - Uni Siegen

3.3 RMI in Detail ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 140

3.3.6 RMI and Threads

➥ RMI does not specify how many threads are provided on the

server side for method calls

➥ only one thread, one thread per call, ..

➥ This means that several server methods can be active at the

same time

➥ requires correct synchronization (synchronized)!

➥ Client-side locking of a remote object using a synchronized block

is not possible

➥ only local stub is locked

➥ a lock must be implemented using methods of the remote

object if necessary

Page 152: Distributed Systems - Uni Siegen

3.4 Deployment

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 141

➥ Deployment: distribution, transfer and installation of the

components of a distributed application

➥ specifically for RMI: which class file has to go where?

➥ Server, RMI registry and client need the class files for:

➥ the remote interface of the server

➥ all classes or interfaces that are used in the server interface

(recursively)

➥ (up to JDK 1.4 also the stub classes for all used remote

interfaces)

Page 153: Distributed Systems - Uni Siegen

3.4 Deployment ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 142

➥ Client and server additionally need the class files for:

➥ their own implementation

➥ all classes of serializable objects that they receive

➥ as a parameter or a result of method calls

➥ (up to JDK 1.4 also the stub classes for all remote objects they

receive)

➥ Problems with static installation of class files for serialized

objects (and stubs):

➥ dependency between client and server

➥ method parameters, result objects

➥ change of classes requires new installation

➥ nullifies an advantage of distributed applications

Page 154: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 143

Class loader

➥ Class loaders are used for loading classes (and interfaces) at

runtime

➥ more exactly: for loading class files

➥ Each class is loaded only once

➥ Class loaders are Java objects themselves

➥ base class: java.lang.ClassLoader

➥ RMI uses its own class loader

➥ java.rmi.RMIClassLoader

Page 155: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 144

Remote Loading of Classes

➥ RMIClassLoader allows to load classes also from remote

computers

➥ via HTTP (web server) or FTP

➥ URL is defined via codebase property when the JVM is started

➥ Allows central storage of the necessary files

➥ “automatic” deployment

➥ Restrictions:

➥ all classes named in the client code must be available locally

➥ client must define its own security manager

Page 156: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 145

Example

main()

<<interface>>BankServer

getDate(): ...getAmount(): ...getText(): ...

UnicastRemoteObject

BankServerImpl

<<interface>>

Serializable

Entry<<interface>>

BankClient

EntryImpl AccountImpl

getAccount(...): Account

{Remote}

getStatement(...): Entry

{Remote}Account

➥ The class files for BankServer, Account and Entry must beavailable locally at the client (BankClient)

➥ EntryImpl can be remotely loaded by client

Page 157: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 146

Local and Remote Loadable Classes

➥ Local loading (via CLASSPATH) must be possible (for client and

server) for:

➥ all classes (and interfaces) named in the client/server code,

all classes mentioned by name in those classes, ..

➥ i.e. everything that is needed to compile the code

➥ Remotely loadable:

➥ stub classes of remote objects

➥ subclasses accessed only via polymorphism

➥ i.e. the code only uses a superclass or interface

➥ The RMI registry can load all required classes remotely

Page 158: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 147

Example: Hello-World with Callback and Result Object

➥ Interfaces (see WWW):

public interface Hello extends Remote{

HelloObj getHello(AskUser ask) throws RemoteException;}

public interface AskUser extends Remote{

boolean ask(String question) throws RemoteException;}

public interface HelloObj{

void sayIt();}

Page 159: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 148

Example: How are the Classes Loaded?

➥ Interfaces Hello.class, AskUser.class, HelloObj.class

➥ must be available locally at the client

➥ can be loaded remotely by the RMI registry

➥ Implementation HelloObjImpl.class of HelloObj

➥ can be loaded remotely by the client

➥ is not required by RMI registry

➥ Stub classes for the two remote interfaces

➥ are usually generated dynamically (not loaded) as of JDK 1.5

➥ but can also be loaded remotely

Page 160: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 149

Example: Necessary Changes in the Client

➥ Using the RMI security manager:

public static void main(String args[]) {

System.setSecurityManager(new RMISecurityManager());

➥ Definition of security policy: policy file:

grant {permission java.net.SocketPermission "myserver:1024-",

"connect,accept";

permission java.net.SocketPermission "www.bsvs.de:80",

"connect";};

➥ grants local classes (client!) the following permissions:

➥ connection to/from myserver on non-privileged ports:➥ RMI registry (1099), server and callback object (dyn.)

➥ connection to the web server (port 80)

Page 161: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 150

Example: Deployment

➥ All classes to be loaded remotely are packed into one archive

➥ The archive is made available via a web server

➥ Start the server with a codebase, e.g:

➥ java -Djava.rmi.server.codebase="http://www.bsvs.de/

jars/HelloServer.jar" HelloServer

➥ the codebase property specifies the URL to the JVM underwhich the classes are to be loaded

➥ server passes codebase to RMI registry when registering theserver object

➥ RMI registry passes codebase to client

➥ Start of the client with specification of the policy file, e.g:

➥ java -Djava.security.policy=policy HelloClient

Page 162: Distributed Systems - Uni Siegen

3.4.1 Remote Class Loading in Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 151

Procedure for Transferring Objects

locally]loaded[Class

true

true

false

false

false

[Class in

true

false

codebase

Determineclass loader

object

ClassNotFoundException

object

Useloadedclass

true Load classfrom

codebase*

Send object Receive serialized object

Serialize

Send

[Codebaseavailable]

[Classavailablelocally]

RAM]

Loadclasslocally

Deserialize

* must be specifiedexplicitly as of

JDK 1.7

Page 163: Distributed Systems - Uni Siegen

3.4.2 Java Security Manager

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 152

➥ JVM can be equipped with a security manager if required

➥ for Java applications: System.setSecurityManager()

➥ for Java applets: by default

➥ Security manager checks, among other things, whether the

application is allowed to

➥ access a local file,

➥ establish a network connection,

➥ stop the JVM,

➥ create a class loader,

➥ read AWT events, ...

➥ Permissions are specified in a security policy

➥ if the specifications are violated: exception

Page 164: Distributed Systems - Uni Siegen

3.4.2 Java Security Manager ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 153

Security Policy

➥ Assigns permissions to codes from specific sources

➥ Code source can be described by two properties:

➥ code location: URL where the code was loaded from

➥ certificates (for signed code)

➥ Permissions allow access to certain resources

➥ permissions are modeled by objects, but are usually specified

in the policy file

➥ e.g. FilePermission p =

new FilePermission("/tmp/*", "read,write");

➥ or permission java.io.FilePermission "/tmp/*",

"read,write";

Page 165: Distributed Systems - Uni Siegen

3.4.2 Java Security Manager ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 154

Hierarchy of Permission Classes (JDK 1.2)

Permission

BasicPermission FilePermission SocketPermissionAllPermission

Property

Permission

Net

Permission

Reflected

Permission

Runtime

Permission

Logging

Permission

AWT

Permission

Security

Permission

Permission

Serializable

Permission

Auth

Audio

Permission

SQL

Permission

Use onlyfor testing!!!

Page 166: Distributed Systems - Uni Siegen

3.4.2 Java Security Manager ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 155

Policy File

grant {permission java.net.SocketPermission "www.bsvs.de:80","connect";

};

grant codebase "file:" {permission java.io.FilePermission "/home/tom/-","read, write";

permission java.io.FilePermission "/bin/*", "execute";};

grant codebase "http://www.bsvs.de/jars/HelloServer.jar" {permission java.net.SocketPermission "localhost:1024-","listen, accept, connect";

};

Page 167: Distributed Systems - Uni Siegen

3.4.2 Java Security Manager ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 156

Policy File ...

➥ All classes are allowed to:

➥ establish connections to www.bsvs.de, port 80

➥ Locally loaded classes may:

➥ read and write files in /home/tom or (recursively) a

subdirectory of it

➥ execute files in the /bin directory

➥ Classes loaded from http://www.bsvs.de/jars/

HelloServer.jar are allowed to:

➥ accept / establish network connections on / to the local

computer via non-privileged ports (1024 or higher)

Page 168: Distributed Systems - Uni Siegen

3.4.2 Java Security Manager ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 157

Further Documentation

➥ General information on policy files::

http://docs.oracle.com/javase/8/docs/technotes/guides/

security/PolicyFiles.html

➥ Overview of the permission classes::

http://docs.oracle.com/javase/8/docs/technotes/guides/

security/permissions.html

➥ Java API documentation::

http://docs.oracle.com/javase/8/docs/api/

Page 169: Distributed Systems - Uni Siegen

3 Distributed Programming with Java RMI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 158

3.5 Summary

➥ RMI allows access to remote objects

➥ transparent, via proxy objects

➥ proxy classes are generated automatically

➥ usually at runtime

➥ Parameter passing semantics

➥ by value, if parameter object can be serialized

➥ by reference, if parameter object is an RMI object

➥ Classes can also be loaded remotely (security manager!)

➥ Name service: RMI registry

➥ Security: RMI over SSL is possible, but not ideal

Page 170: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 159

Distributed SystemsSummer Term 2021

4 Name Services

Page 171: Distributed Systems - Uni Siegen

4 Name Services ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 160

Content

➥ Basics

➥ Example: JNDI

Literature

➥ Tanenbaum, van Steen: Ch. 4.1

➥ Farley, Crawford, Flanagan: Ch. 7

➥ http://docs.oracle.com/javase/tutorial/jndi/overview

Page 172: Distributed Systems - Uni Siegen

4.1 Basics

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 161

Names, Addresses and IDs

➥ Name: character or bit sequence that refers to a unit

➥ unit: e.g. computer, printer, file, user, website, ...

➥ Address: name of the entry point of a unit

➥ entry point allows access to the unit

➥ several entry points per unit are possible

➥ entry point may change over time

➥ A position-independent name identifies a unit independently

from its entry point

➥ ID: name with the following properties:

➥ ID refers to at most one unit, unit has at most one ID

➥ ID always refers to the same unit (not reused)

Page 173: Distributed Systems - Uni Siegen

4.1 Basics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 162

Namespaces

➥ represented by directed, labelled graph

"/home/steen/mbox"

"/keys""/home/steen/keys"elke

maxsteen

home keys

mbox.twmrc

n0

n1

n2 n3 n4

n5

keys

n6 n7

➥ leaf node: named unit,

with information / status

if required

➥ inner node: directory node

➥ edges are labeled with names

➥ Units are named by paths in the graph:

Start node: < Label-1, Label-2, ... >

➥ absolute path: starting from root (of namespace)

➥ relative path: starting from any node

➥ Example: names in the UNIX file system

Page 174: Distributed Systems - Uni Siegen

4.1 Basics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 163

Aliasing and Linking

➥ Alias: alternative name for the same unit

➥ Possibilities for the realization of aliases:

➥ allow several absolute pathnames for one unit

➥ e.g. hard link in Unix

➥ a (special) leaf node stores pathname of the unit

➥ e.g. symbolic link in Unix

➥ Transparent linking of different namespaces:

➥ a (special) directory node stores the ID of a directory node in

another namespace

➥ e.g. mounted file system in Unix

Page 175: Distributed Systems - Uni Siegen

4.1 Basics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 164

Name Resolution

➥ Finding the node (or information) that corresponds to a name

➥ start at the start node

➥ look up first label in directory table

⇒ ID of the next node

➥ etc., until the path is completely processed

➥ Conclusion mechanism: determination of the start node

➥ usually implicit

➥ Global names: resolution independent of specific context

➥ Local names: resolution is context-dependent

➥ e.g. pathname relative to working directory in Unix

Page 176: Distributed Systems - Uni Siegen

4.1 Basics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 165

Implementation of Naming Services

➥ Typical operations:

➥ bind(name, address, attributes)

➥ lookup(name, attributes) ⇒ address, attributes

➥ unbind(name, address)

➥ In distributed systems:

➥ namespace is stored distributed (usually hierarchically)

➥ for high availability: additionally replicated storage

➥ Name resolution can be iterative or recursive

➥ iterative: Server responds with address of next server

➥ recursive: server requests even at next server

➥ Example: Domain Name System (☞ RN I, 11.1)

Page 177: Distributed Systems - Uni Siegen

4.2 Example: JNDI

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 166

➥ JNDI: Java Naming and Directory Interface

➥ API for access to different name and directory services

➥ directory service also stores attributes of objects

RMI CORBA LDAP DNS

JNDI naming manager

Serviceprovider

JNDI SPI

JNDI API

RMI

Java application

registryCORBAnamingservice

LDAPserver server

DNS

Page 178: Distributed Systems - Uni Siegen

4.2 Example: JNDI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 167

➥ JDNI supports compound namespaces

➥ managed by various name or directory services

systemFile

Context(directory)

Initialcontext

LDAP

User objects

DNS

➥ Directories are called “contexts”

➥ objects are bound to names within a context

Page 179: Distributed Systems - Uni Siegen

4.2 Example: JNDI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 168

The Interface javax.naming.Context for Naming Contexts

➥ Important methods:

➥ bind(), rebind() : bind objects to names

➥ bind() throws exception if name already exists

➥ unbind() : remove names

➥ rename() : rename

➥ lookup() : resolve name to object

➥ listBindings() : list of all bindings

➥ createSubcontext() : create sub-context

➥ destroySubcontext() : delete sub-context

Page 180: Distributed Systems - Uni Siegen

4.2 Example: JNDI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 169

The Interface javax.naming.Context for Naming Contexts ...

➥ Implementation class InitialContext

➥ for initial context (depending on the concrete name service)

➥ Context iC = new InitialContext(properties);

➥ configuration via Properties object (Hashtable), among

others:

➥ "java.naming.factory.initial"

➥ factory for InitialContext

➥ "java.naming.provider.url"

➥ contact information for service provider

➥ "java.naming.security.principal" and

"java.naming.security.credentials"

➥ user name and password for authentication

Page 181: Distributed Systems - Uni Siegen

4.2 Example: JNDI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 170

Example: Accessing the RMI Registry

import javax.naming.*;...

Properties props = new Properties();props.put("java.naming.factory.initial","com.sun.jndi.rmi.registry.RegistryContextFactory");

props.put("java.naming.provider.url","rmi://localhost:1099");

Context ctx = new InitialContext(props);

obj = (Hello)ctx.lookup("Hello-Server");

message = obj.sayHello();

Page 182: Distributed Systems - Uni Siegen

4.2 Example: JNDI ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 171

Example: Accessing a Local File System

import javax.naming.*;...

Properties props = new Properties();props.put("java.naming.factory.initial",

"com.sun.jndi.fscontext.RefFSContextFactory");Context ctx = new InitialContext(props);

for (int i=0; i<args.length-1; i++)ctx = (Context)ctx.lookup(args[i]);

NamingEnumeration<Binding> list= ctx.listBindings(args[args.length-1]);

while (list.hasMore()) {Binding b = list.next();System.out.println(b.getName()+": "+b.getClassName());

}

Page 183: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 172

Distributed SystemsSummer Term 2021

5 Process Management

Page 184: Distributed Systems - Uni Siegen

5 Process Management ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 173

Contents

➥ Distributed process scheduling

➥ Code migration

Literature

➥ Tanenbaum, van Steen: Ch. 3

➥ Stallings: Ch. 14.1

Page 185: Distributed Systems - Uni Siegen

5 Process Management ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 174

5.1 Distributed Process Scheduling

➥ Typical: middleware component that

➥ decides on which node a process is executed

➥ and probably migrates processes between nodes

➥ Gloals:

➥ balance the load between nodes

➥ maximize the system performance (average response time)

➥ also: minimize the communication between nodes

➥ meet special hardware / resource requirements

➥ Load: typically the length of the process queue (ready queue)

➥ sometimes resource consumption and communication volume

are considered, too

Page 186: Distributed Systems - Uni Siegen

5.1 Distributed Process Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 175

Approaches to distributed scheduling

➥ Static scheduling

➥ mapping of processes to nodes is defined before execution

➥ NP-complete, therefore heuristic methods

➥ Dynamic load balancing, two variants:

➥ execution location of a process is defined during creation and

is not changed later

➥ execution location of a process can be changed at runtime

(several times, if necessary)

➥ preemptive dynamic load balancing, process migration

Page 187: Distributed Systems - Uni Siegen

5.1 Distributed Process Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 176

5.1.1 Static Scheduling

➥ Procedure dependent on the structure / the modelling of a job

➥ jobs always consist of several processes

➥ differences in communication structure

➥ Examples:

➥ communicating processes: graph partitioning

➥ non-communicating tasks with dependencies: list scheduling

Page 188: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 177

Scheduling through graph partitioning

➥ Given: process system with

➥ CPU / memory requirements

➥ specification of communication load

between each pair of processes

usually represented as a graph

64

81

43

2

3 23

551

2

4 2G

E

A B C

F

IH

D

➥ Wanted: partitioning of the graph in such a way that

➥ CPU and memory requirements are met for each node

➥ partitions are about the same size (load balancing)

➥ weighted sum of cut edges is minimal

➥ i.e. as little communication as possible between nodes

➥ NP-complete, therefore many heuristic procedures

Page 189: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 177

Scheduling through graph partitioning

➥ Given: process system with

➥ CPU / memory requirements

➥ specification of communication load

between each pair of processes

usually represented as a graph

= 30

1 2 3

64

81

43

2

3 23

551

2

4 2G

E

A B C

F

IH

D

➥ Wanted: partitioning of the graph in such a way that

➥ CPU and memory requirements are met for each node

➥ partitions are about the same size (load balancing)

➥ weighted sum of cut edges is minimal

➥ i.e. as little communication as possible between nodes

➥ NP-complete, therefore many heuristic procedures

Page 190: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 177

Scheduling through graph partitioning

➥ Given: process system with

➥ CPU / memory requirements

➥ specification of communication load

between each pair of processes

usually represented as a graph

= 28

1 2 3

64

81

43

2

3 23

551

2

4 2G

E

A B C

F

IH

D

➥ Wanted: partitioning of the graph in such a way that

➥ CPU and memory requirements are met for each node

➥ partitions are about the same size (load balancing)

➥ weighted sum of cut edges is minimal

➥ i.e. as little communication as possible between nodes

➥ NP-complete, therefore many heuristic procedures

Page 191: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 178

List scheduling

➥ Tasks with dependencies, but without communication during

execution

➥ tasks work on results of other tasks

D 6 E 6 F 4

G 4

A 6 B 5 C 4

1

211

333

41

Modelling

➥ program represented as a DAG

➥ nodes: tasks with execution times

➥ edges: communication with transfer

time

Page 192: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 179

Method

➥ Create prioritized list of all tasks

➥ many different heuristics to determine the priorities, e.g.

according to:

➥ length of the longest path (without communication) from the

node to the end of the DAG (High Level First with EstimatedTime, HLFET).

➥ earliest possible start time (Earliest Task First, ETF)

➥ Process the list as follows:

➥ assign the first task to the node that allows the earliest start

time

➥ remove the task from the list

➥ Creation and processing of the list can also be interleaved

Page 193: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

➥ Assumption: local communication does not cost any time

Page 194: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6+5+4 = 15Static level (without comm.):

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

➥ Assumption: local communication does not cost any time

Page 195: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6+5+4 = 15Static level (without comm.):

6+6+4 = 16

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

➥ Assumption: local communication does not cost any time

Page 196: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413

➥ Assumption: local communication does not cost any time

Page 197: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDEBCA

➥ Assumption: local communication does not cost any time

Page 198: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDEBCA

➥ Assumption: local communication does not cost any time

Page 199: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDEBC

➥ Assumption: local communication does not cost any time

Page 200: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDEB

➥ Assumption: local communication does not cost any time

Page 201: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDE

➥ Assumption: local communication does not cost any time

Page 202: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDE

➥ Assumption: local communication does not cost any time

Page 203: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDE

➥ Assumption: local communication does not cost any time

Page 204: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4 6E

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFDE

➥ Assumption: local communication does not cost any time

Page 205: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFD

➥ Assumption: local communication does not cost any time

Page 206: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E D 5

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFD

➥ Assumption: local communication does not cost any time

Page 207: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

D 5

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFD

➥ Assumption: local communication does not cost any time

Page 208: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

D 5

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GFD

➥ Assumption: local communication does not cost any time

Page 209: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

D 5

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: GF

➥ Assumption: local communication does not cost any time

Page 210: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

D 5

4F

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List: G

➥ Assumption: local communication does not cost any time

Page 211: Distributed Systems - Uni Siegen

5.1.1 Static Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 180

Example: List Scheduling with HLFET

6A

C 4

B 4

6E

D 5

4F

G 4

Schedule with 3 nodes:

0 2 4 6 8 10 12 14 16

1

2

3

A 6 C 4

D 5 F 4

G 4

E 6

B 4

2

1 3

1 1

41

5 3

16

4

9 10 8

1413 List:

➥ Assumption: local communication does not cost any time

Page 212: Distributed Systems - Uni Siegen

5.1 Distributed Process Scheduling ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 181

5.1.2 Dynamic Load Balancing

➥ Components of a load balancing system

➥ Information policy – when is load balancing triggered?

➥ on demand, periodically, in case of state changes, ...

➥ Transfer policy – under which condition is load shifted?

➥ often: decision with the help of threshold values

➥ Location policy – how is the receiver (or sender) found?

➥ polling of some nodes, broadcast, ...

➥ Selection policy – which tasks are moved?

➥ new tasks, long tasks, location-independent tasks, ...

Page 213: Distributed Systems - Uni Siegen

5.1.2 Dynamic Load Balancing ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 182

Typical approaches to dynamic load balancing

➥ Sender initiated load balancing

➥ new process usually start on the local node

➥ if node is overloaded: determine load of other nodes and startprocess on low-loaded node

➥ e.g. ask randomly selected nodes for their load, sendprocess if load ≤ threshold, otherwise: next node

➥ disadvantage: additional work for already overloaded node!

➥ Receiver initiated load balancing

➥ when scheduling a process: check whether the node has stillenough work (processes)

➥ if not: ask other nodes for work

➥ Similar also for preemptive dynamic load balancing

Page 214: Distributed Systems - Uni Siegen

5 Process Management ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 183

[Tanenbaum/Steen, 3.4]5.2 Code Migration

➥ In distributed systems, in addition to data also programs aretransfered between nodes

➥ partly also during their execution

➥ Motivation: performance and flexibility

➥ preemptive dynamic load balancing

➥ optimization of communication (move code to data or highly

interactive code to client)

➥ increased availability (migration before system maintenance)

➥ use of special HW or SW resources

➥ use / evacuation of unused workstation computers

➥ avoid code installation on client machines (dynamic loading of

code from server)

Page 215: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 184

Models for Code Migration

➥ Conceptual model: a process consists of three “segments”:

➥ code segment

➥ the executable program code of the process

➥ execution segment

➥ complete execution status of the process➥ virtual address space (data, heap, stack)➥ processor register (incl. instruction counter)

➥ process / thread control block

➥ resource segment

➥ contains references to external resources required by the

process➥ e.g. files, devices, other processes, mailboxes, ...

Page 216: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 185

Models for Code Migration ...

➥ Weak mobility

➥ only the code segment is transferred

➥ including initialization data if necessary

➥ program is always started from initial state

➥ examples: Java applets, loading remote classes in Java

➥ Strong mobility

➥ code and execution segment are transferred

➥ migration of a process in execution

➥ examples: process migration, agents

➥ Sender- or receiver-initiated migration

Page 217: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 186

Code Migration Issues and Solutions

➥ Security: target computer executes unknown code (e.g. applet)

➥ restricted environment (sandbox)

➥ signed code

➥ Heterogeneity: code and execution segment depend on CPU andoperating system

➥ use of virtual machines (e.g. JVM, XEN)

➥ migration points at which state can be stored and read in aportable way (possibly supported by compiler)

➥ Access to (local) resources

➥ remote access with a global reference

➥ move or copy the resource

➥ new binding to resource of the same type

Page 218: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 187

[Stallings, 14.1]Process migration

➥ Migration of a process that is already running

➥ triggered by OS or the process itself

➥ mostly for dynamic load balancing

➥ Sometimes combined with checkpoint /restart function

➥ instead of transferring the status of the process, it can also be

stored persistently

➥ Design goals of migration procedures:

➥ low communication effort

➥ only short blocking of the migrated process

➥ no dependency on source computer after migration

Page 219: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 188

Process Flow of a Process Migration

➥ Creating a new process on the target system

➥ Transfer the code and execution segment (process address

space, process control block), initialization of the target process

➥ required: identical CPU and OS or virtual machine

➥ Update all connections to other processes

➥ communication links, signals, ...

➥ during migration: buffering at source

➥ then: forwarding to target computer

➥ Delete the original process

➥ if necessary, retain a “shadow process” for redirected system

calls, e.g. file accesses

Page 220: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 189

Transferring the process address space

➥ Eager (all): transfer the entire address space

➥ no traces of the process remain on source nodes

➥ very expensive for large address space (especially if not all

pages are used)

➥ often together with checkpoint/restart function

➥ Precopy : process continues to run on source node during

transfer

➥ to minimize time in which the process is blocked

➥ pages modified while the migration is in progress must be sent

again

Page 221: Distributed Systems - Uni Siegen

5.2 Code Migration ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 190

Transferring the process address space ...

➥ Eager (dirty): transfer only modified pages that are in mainmemory

➥ all other pages are only transferred when accessed

➥ integration with virtual memory management

➥ motivation: quickly “flush” main memory of the source node

➥ source node may remain involved until the end of the process

➥ Copy-on-reference: transfer each page only when accessed

➥ variation of eager (dirty)

➥ lowest initial costs

➥ Flushing: move all pages to disk before migration

➥ after that: copy-on-reference

➥ advantage: main memory of the source node is relieved

Page 222: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 191

Distributed SystemsSummer Term 2021

6 Time and Global State

Page 223: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 192

➥ Synchronization of physical clocks

➥ Lamport’s happended before relation

➥ Logical clocks

➥ Global state

Literature

➥ Tanenbaum, van Steen: Kap. 5.1-5.3

➥ Colouris, Dollimore, Kindberg: Kap. 10

➥ Stallings: Kap 14.2

Page 224: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 193

What is the difference between a distributed system and asingle/multiprocessor system?

➥ Single or multiprocessor system:

➥ concurrent processes: pseudo-parallel by time sharing or

truely parallel

➥ global time: all events in the processes can be ordered

unambiguously in terms of time

➥ global state: at any time a unique state of the system can be

determined

➥ Distributed system

➥ true parallelism

➥ no global time

➥ no unique global state

Page 225: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 194

Concurrency vs. (true) parallelism

A B C Dsequential

A B C D A A AB BD D DCconcurrent

AB

CD

One time line, processes can be interrupted by others

parallel

Each node / process has its owntime line! Events in differentprocesses can truely happensimultaneously.

One time line, processes are not interrupted.

at any time: interleaved execution.

Example: 4 processes

Page 226: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 195

Global Time

➥ In a single/multiprocessor system

➥ each event can (at least theoretically) be assigned a unique

time stamp of the same local clock

➥ for multiprocessor systems: synchronization at the shared

memory

➥ In distributed systems:

➥ many local clocks (one per node)

➥ exact synchronization of clocks is (on principle!) not possible

➥ ⇒ the sequence of events on different nodes can not (always)

be determined uniquely

➥ (cf. special theory of relativity)

Page 227: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 196

An effect of distribution

➥ Preliminary remark: events in distributed systems

Process 1

Process 2 Time

receive the message

send a message local event

local events

➥ Scenario: two processes observe two other processes

Observer A

Observer B

Process 1

Process 2z

yx

z x y

x y z

Page 228: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 197

An effect of distribution ...

➥ The observers may see the events in different order!

➥ Problem e.g., if the observers are replicated databases and the

events are database updates

➥ replicas are no longer consistent!

➥ Even from time stamps of (local) clocks it is not possible to

determine the order of events in a meaningful way

➥ Hence, in such cases:

➥ events with timestamps of logical clocks (☞ 6.3)

➥ logical clocks allow conclusions to be made about causal order

Page 229: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 198

[Coulouris, 10.3]6.1 Synchronizing Physical Clocks

➥ Physical clock shows ’real’ time

➥ based on UTC (Universal Time Coordinated)

➥ Each computer has its own (physical) clock

➥ quartz oscillator with counter in HW and if necessary in SW

➥ Clocks usually differ from each other (offset)

➥ Offset changes over time: clock drift

➥ typ. 10−6 for quartz crystals, 10−13 for atomic clocks

➥ Goal of clock synchronization:

➥ keep the offset of the clocks under a given limit

➥ clock skew: maximum allowed deviation

Page 230: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 199

Cristian’s Method

➥ Assumption: A and B want to synchronize their clocks with eachother

➥ B can also be a time server (e.g. with GPS clock)

➥ Protocol:

1. A sendsrequest to B

A

B

t0

➥ A must take the

runtime of the

reply message

into account

➥ estimate: runtime

= half the round

trip time

= (t1− t0)/2

Page 231: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 199

Cristian’s Method

➥ Assumption: A and B want to synchronize their clocks with eachother

➥ B can also be a time server (e.g. with GPS clock)

➥ Protocol:

t1

(t)

t

2. B reads time t andsends it to A

1. A sendsrequest to B

A

B

t0

➥ A must take the

runtime of the

reply message

into account

➥ estimate: runtime

= half the round

trip time

= (t1− t0)/2

Page 232: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 199

Cristian’s Method

➥ Assumption: A and B want to synchronize their clocks with eachother

➥ B can also be a time server (e.g. with GPS clock)

➥ Protocol:

3. A sets its clockto t + (t1−t0)/2

t1

(t)

t

2. B reads time t andsends it to A

1. A sendsrequest to B

A

B

t0

➥ A must take the

runtime of the

reply message

into account

➥ estimate: runtime

= half the round

trip time

= (t1− t0)/2

Page 233: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 199

Cristian’s Method

➥ Assumption: A and B want to synchronize their clocks with eachother

➥ B can also be a time server (e.g. with GPS clock)

➥ Protocol:

(interrupt−)latency

t1

(t)

t

3. A sets its clockto t + (t1−t0)/2

2. B reads time t andsends it to A

1. A sendsrequest to B

A

B

t0

➥ A must take the

runtime of the

reply message

into account

➥ estimate: runtime

= half the round

trip time

= (t1− t0)/2

Page 234: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 199

Cristian’s Method

➥ Assumption: A and B want to synchronize their clocks with eachother

➥ B can also be a time server (e.g. with GPS clock)

➥ Protocol:

transit timedifferent

t1

(t)

t

3. A sets its clockto t + (t1−t0)/2

2. B reads time t andsends it to A

1. A sendsrequest to B

A

B

t0

➥ A must take the

runtime of the

reply message

into account

➥ estimate: runtime

= half the round

trip time

= (t1− t0)/2

Page 235: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 200

Cristian’s Method: Discussion

➥ Problem: runtimes of both messages may be different

➥ systematic differences (different paths / latencies)

➥ statistical fluctuations of the transit time

➥ Accuracy estimate, if minimum transit time (min) is known:

➥ B can have determined t at the earliest at time t0 + min, at

the latest at time t1 − min (measured with A’s clock)

➥ thus accuracy ± ((t1 − t0)/2 − min)

➥ To improve accuracy:

➥ execute the message exchange multiple times

➥ use the one with minimum round trip time

Page 236: Distributed Systems - Uni Siegen

6.1 Synchronizing Physical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 201

Changing the clock

➥ Turning back is problematic

➥ order / uniqueness of time stamps

➥ Non-monotonous “jumping” of the time also problematic

➥ Therefore: clock is generally adjusted slowly

➥ runs faster / slower, until clock skew has been compensated

Further protocols

➥ Berkeley algorithm: server calculates mean value of all clocks

➥ NTP (Network Time Protocol): hierarchy of time servers in the

Internet with periodic synchronization

➥ IEEE 1588: clock synchronization for automation systems

Page 237: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 202

[Coulouris, 10.4]6.2 Lamport’s Happened-Before Relation

➥ In two cases, the order of events can also be determined without

a global clock:

➥ if the events are in the same process, local clock is sufficient

➥ the sending of a message is always before its reception

➥ Definition of the happened-before causality relation → (causality

relation)

➥ if events a, b are in the same process i and ti(a) < ti(b)(ti: time stamp with i’s clock), then a → b

➥ if a is the sending of a message and b its receipt, then a → b

➥ if a → b and b → c, then also a → c (transitivity)

➥ a → b means, that b may causally depend on a

Page 238: Distributed Systems - Uni Siegen

6.2 Lamport’s Happened-Before Relation ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 203

Examples

Process 1

Process 2

Process 3

Process 4

hf

c

d

k

a

e

b

g

j

i

l

➥ Among others, we have here:

➥ b → i and a → h (events in the same process)

➥ c → d and e → f (sending / receiving a message)

➥ c → k and a → i (transitivity)

➥ g 6→ l and l 6→ g: l and g are concurrent (nebenlaufig)

Page 239: Distributed Systems - Uni Siegen

6 Time and Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 204

[Coulouris, 10.4]6.3 Logical Clocks

➥ Physical clocks cannot be synchronized exactly

➥ therefore: unsuitable for determining the order in which eventsoccurred

➥ Logical clocks

➥ refer to the causal order of events (happened-before relation)

➥ no fixed relationship to real time

➥ In the following:

➥ Lamport timestamps

➥ are consistent with the happened-before relation

➥ vector timestamps

➥ allow sorting of events according to causality (i.e.happened-before relation)

Page 240: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 205

Lamport Timestamps

➥ Lamport timestamps are natural numbers

➥ Each process i has a local counter Li, that is updated as follows:

➥ at (more precisely: before) each local event: Li = Li + 1

➥ in each message, the time stamp Li of the send event is also

sent

➥ at receipt of a message with time stamp t:Li = max(Li, t + 1)

➥ Lamport time stamps are consistent with the causality:

➥ a → b ⇒ L(a) < L(b), where L is the Lamport timestamp

in the respective process

➥ but the reversal does not apply!

Page 241: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 206

Lamport Timestamps: Example

3

1

1

2

1

1 21

Process 1

Process 2

Process 3

Process 4l

c

d f h

k

a

e

b

g

j

i

1

2 3 4

64

5

➥ Among others, we have here:

➥ c → k and L(c) < L(k)

➥ g 6→ j and L(g) 6< L(j)

➥ g 6→ l, but still L(g) < L(l)

Page 242: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 206

Lamport Timestamps: Example

3

1

1

2

1

1 21

Process 1

Process 2

Process 3

Process 4l

c

d f h

k

a

e

b

g

j

i

1

2 3 4

64

5L = max(2, 1+1)3

➥ Among others, we have here:

➥ c → k and L(c) < L(k)

➥ g 6→ j and L(g) 6< L(j)

➥ g 6→ l, but still L(g) < L(l)

Page 243: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 206

Lamport Timestamps: Example

3

1

1

2

1

1 21

Process 1

Process 2

Process 3

Process 4l

c

d f h

k

a

e

b

g

j

i

1

2 3 4

64

5L = max(3, 1+1)3

➥ Among others, we have here:

➥ c → k and L(c) < L(k)

➥ g 6→ j and L(g) 6< L(j)

➥ g 6→ l, but still L(g) < L(l)

Page 244: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 206

Lamport Timestamps: Example

3

1

1

2

1

1 21

Process 1

Process 2

Process 3

Process 4l

c

d f h

k

a

e

b

g

j

i

1

2 3 4

64

5

L = max(2, 4+1)1

➥ Among others, we have here:

➥ c → k and L(c) < L(k)

➥ g 6→ j and L(g) 6< L(j)

➥ g 6→ l, but still L(g) < L(l)

Page 245: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 206

Lamport Timestamps: Example

3

1

1

2

1

1 21

Process 1

Process 2

Process 3

Process 4l

c

d f h

k

a

e

b

g

j

i

1

2 3 4

64

5

➥ Among others, we have here:

➥ c → k and L(c) < L(k)

➥ g 6→ j and L(g) 6< L(j)

➥ g 6→ l, but still L(g) < L(l)

Page 246: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 207

Vector Timestamps

➥ Objective: timestamps that characterize causality

➥ a → b ⇔ V (a) < V (b), where V is the vector timestamp

in the respective process

➥ A vector clock in a system with N processes is a vector of Nintegers

➥ each process has its own vector Vi

➥ Vi[i]: number of events that have occurred so far in process i

➥ Vi[j], j 6= i: number of events in process j, of which i knows

➥ i.e. by which it could have been causally influenced

Page 247: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 208

Vector Timestamps ...

➥ Update of Vi in process i:

➥ before any local event: Vi[i] = Vi[i] + 1

➥ Vi is included in every message sent

➥ when receiving a message with timestamp t:Vi[j] = max(Vi[j], t[j]) for all j = 1, 2, . . . , N

➥ Comparison of vector timestamps:

➥ V = V ′ ⇔ V [j] = V ′[j] for all j = 1, 2, . . . , N

➥ V ≤ V ′ ⇔ V [j] ≤ V ′[j] for all j = 1, 2, . . . , N

➥ V < V ′ ⇔ V ≤ V ′ ∧ V 6= V ′

➥ the relation < defines a partial order

Page 248: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 209

Vector Timestamps: Example

Process 1

Process 2

Process 3

Process 4

f h

k

a

e

b

g

j

i

l

c

d

(1,0,0,0)

(0,0,1,0)

(0,0,0,1) (0,0,0,2) (0,0,0,3)

(0,1,2,0)

(0,1,3,1) (0,1,4,1)

(0,1,0,0) (0,2,0,0)

(2,1,4,1) (3,1,4,1)

➥ Among others, we have here:

➥ c → k and V (c) < V (k)

➥ g 6→ l and V (g) 6< V (l), as well as l 6→ g and V (l) 6< V (g)

➥ V (l) and V (g) not comparable ⇔ l and g concurrent

Page 249: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 209

Vector Timestamps: Example

Process 1

Process 2

Process 3

Process 4

f h

k

a

e

b

g

j

i

l

c

d

(1,0,0,0)

(0,0,1,0)

(0,0,0,1) (0,0,0,2) (0,0,0,3)

(0,1,2,0)

(0,1,3,1) (0,1,4,1)

(0,1,0,0) (0,2,0,0)

(2,1,4,1) (3,1,4,1)2V = max((0,0,2,0), (0,1,0,0))

➥ Among others, we have here:

➥ c → k and V (c) < V (k)

➥ g 6→ l and V (g) 6< V (l), as well as l 6→ g and V (l) 6< V (g)

➥ V (l) and V (g) not comparable ⇔ l and g concurrent

Page 250: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 209

Vector Timestamps: Example

Process 1

Process 2

Process 3

Process 4

f h

k

a

e

b

g

j

i

l

c

d

(1,0,0,0)

(0,0,1,0)

(0,0,0,1) (0,0,0,2) (0,0,0,3)

(0,1,2,0)

(0,1,3,1) (0,1,4,1)

(0,1,0,0) (0,2,0,0)

(2,1,4,1) (3,1,4,1)2V = max((0,1,3,0), (0,0,0,1))

➥ Among others, we have here:

➥ c → k and V (c) < V (k)

➥ g 6→ l and V (g) 6< V (l), as well as l 6→ g and V (l) 6< V (g)

➥ V (l) and V (g) not comparable ⇔ l and g concurrent

Page 251: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 209

Vector Timestamps: Example

Process 1

Process 2

Process 3

Process 4

f h

k

a

e

b

g

j

i

l

c

d

(1,0,0,0)

(0,0,1,0)

(0,0,0,1) (0,0,0,2) (0,0,0,3)

(0,1,2,0)

(0,1,3,1) (0,1,4,1)

(0,1,0,0) (0,2,0,0)

(2,1,4,1) (3,1,4,1)

0V = max((2,0,0,0), (0,1,4,1))

➥ Among others, we have here:

➥ c → k and V (c) < V (k)

➥ g 6→ l and V (g) 6< V (l), as well as l 6→ g and V (l) 6< V (g)

➥ V (l) and V (g) not comparable ⇔ l and g concurrent

Page 252: Distributed Systems - Uni Siegen

6.3 Logical Clocks ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 209

Vector Timestamps: Example

Process 1

Process 2

Process 3

Process 4

f h

k

a

e

b

g

j

i

l

c

d

(1,0,0,0)

(0,0,1,0)

(0,0,0,1) (0,0,0,2) (0,0,0,3)

(0,1,2,0)

(0,1,3,1) (0,1,4,1)

(0,1,0,0) (0,2,0,0)

(2,1,4,1) (3,1,4,1)

➥ Among others, we have here:

➥ c → k and V (c) < V (k)

➥ g 6→ l and V (g) 6< V (l), as well as l 6→ g and V (l) 6< V (g)

➥ V (l) and V (g) not comparable ⇔ l and g concurrent

Page 253: Distributed Systems - Uni Siegen

6.4 Global State

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 210

A Motivating Example

➥ Scenario: peer-to-peer application, processes send requests to

each other

➥ Question: when can the application terminate?

➥ Answer: when no process is processing a request

Page 254: Distributed Systems - Uni Siegen

6.4 Global State

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 210

A Motivating Example

➥ Scenario: peer-to-peer application, processes send requests to

each other

➥ Question: when can the application terminate?

➥ Wrong answer: when no process is processing a request

➥ reason: requests can still be on the way in messages!

idle idle

RequestProcess 1 Process 2

➥ Other applications: distributed garbage collection, distributed

deadlock detection, ...

Page 255: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 211

➥ How can we determine the overall state of a distributed process

system?

➥ naıvely: union of the states of all processes (wrong!)

➥ Two aspects have to be considered:

➥ messages that are still in transit

➥ must be included in the state

➥ lack of global time

➥ a global state at time t cannot be defined!

➥ process states always refer to local (and thus different)

times➥ question: condition on local times? ⇒ consistent cuts

Page 256: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 212

Consistent Cuts

➥ Objective: build a meaningful global state from local states (whichare not determined simultaneously)

➥ Processes are modeled by sequences of events:

Process 1

Process 2

Process 3

➥ Cut: consider a prefix of the event sequence in each process

➥ Consistent cut:

➥ if the cut contains the reception of a message, it also containsthe sending of this message

Page 257: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 212

Consistent Cuts

➥ Objective: build a meaningful global state from local states (whichare not determined simultaneously)

➥ Processes are modeled by sequences of events:

Process 1

Process 2

Process 3

Cut Cut Cut Cut

➥ Cut: consider a prefix of the event sequence in each process

➥ Consistent cut:

➥ if the cut contains the reception of a message, it also containsthe sending of this message

Page 258: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 212

Consistent Cuts

➥ Objective: build a meaningful global state from local states (whichare not determined simultaneously)

➥ Processes are modeled by sequences of events:

Process 1

Process 2

Process 3

Inconsistent cutConsistent cuts

➥ Cut: consider a prefix of the event sequence in each process

➥ Consistent cut:

➥ if the cut contains the reception of a message, it also containsthe sending of this message

Page 259: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 213

The Snapshot Algorithm of Chandy and Lamport

➥ Determines online a “snapshot” of the global state

➥ i.e.: a consistent cut

➥ The global state consists of:

➥ the local states of all processes

➥ the status of all communication connections

➥ i.e. the messages in transmission

➥ Assumptions / properties:

➥ reliable message channels with sequence retention

➥ process graph is strongly connected

➥ each process can trigger a snapshot at any time

➥ the processes are not blocked during the algorithm

Page 260: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 214

The Snapshot Algorithm of Chandy and Lamport ...

➥ When a process wants to initiate a snapshot:

➥ process first saves its local state

➥ then it sends a marker message over each outgoing channel

➥ When a process receives a marker message:

➥ if it has not yet saved its local state:

➥ it saves its local state

➥ and sends a marker over each outgoing channel

➥ else:

➥ for the channel where the marker was received, it saves all

messages that have been received since the local state

was saved➥ i.e., it records the status of the channel

Page 261: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 215

The Snapshot Algorithm of Chandy and Lamport ...

➥ The algorithm terminates when each process has received a

marker message on each channel

➥ the determined consistent section is then (initially) stored in a

distributed way

Page 262: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

e

b

dc

a

P2

P1

P3

Page 263: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

e

b

dc

a

P2

P1

P3

M

1. P1 initiates a snapshot, saves its state, and sends markers

M

Page 264: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

e

b

dc

a

P2

P1

P3

2. P3 receives a marker from P1, saves its state, and sends markers

M

MM

1. P1 initiates a snapshot, saves its state, and sends markers

Page 265: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

e

b

dc

P2

P1

P3

3. P2 receives and processes a2. P3 receives a marker from P1, saves its state, and sends markers

M

MM

1. P1 initiates a snapshot, saves its state, and sends markers

Page 266: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

e

b

dc

P2

P1

P3

P2 receives the marker from P1, saves its state, and sends markers3. P2 receives and processes a

M

M

2. P3 receives a marker from P1, saves its state, and sends markers

M

M

1. P1 initiates a snapshot, saves its state, and sends markers

Page 267: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

P2

P1

P3

4. P1, P2, P3 save the incoming messages, until all markers are received

e

c

b

d

P2 receives the marker from P1, saves its state, and sends markers3. P2 receives and processes a

M

M

2. P3 receives a marker from P1, saves its state, and sends markers

M

M

1. P1 initiates a snapshot, saves its state, and sends markers

Page 268: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 216

Example for the algorithmof Chandy/Lamport

P2

P1

P3

4. P1, P2, P3 save the incoming messages, until all markers are received

e

c

b

d

P2 receives the marker from P1, saves its state, and sends markers3. P2 receives and processes a2. P3 receives a marker from P1, saves its state, and sends markers1. P1 initiates a snapshot, saves its state, and sends markers

Page 269: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 217

Sequence in the Example and Selected Cut

P1

P2

P3

db

e

a

c

displayed initial state

➥ The cut consists of the local states of P1, P2, P3 and the

messages b, c, d, e

Page 270: Distributed Systems - Uni Siegen

6.4 Global State ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 217

Sequence in the Example and Selected Cut

P1

P2

P3

db

e

a

c

consistent cut determined by the algorithm

➥ The cut consists of the local states of P1, P2, P3 and the

messages b, c, d, e

Page 271: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 218

Distributed SystemsSummer Term 2021

7 Coordination

Page 272: Distributed Systems - Uni Siegen

7 Coordination ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 219

Contents

➥ Election algorithms

➥ Mutual exclusion

➥ Group communication (multicast)

➥ Transactions

Literature

➥ Tanenbaum, van Steen: Kap. 5.4-5.6

➥ Colouris, Dollimore, Kindberg: Kap. 11, 12

➥ Stallings: Kap 14.3

Page 273: Distributed Systems - Uni Siegen

7 Coordination ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 220

[Tanenbaum/Steen, 5.4]7.1 Election Algorithms

➥ In many distributed algorithms one arbitrary process must play anexceptional role

➥ e.g. central coordinator, initiator, ...

➥ Question: how to choose this process unambiguously?

➥ processes must be distinguishable, e.g. via a unique ID.

➥ then select e.g. the process with the highest ID

➥ Prerequisites / requirements:

➥ election can be initiated by multiple processes simultaneously

➥ e.g. after failure or recovery of a process

➥ after the election all processes must have the same result

➥ each process knows the IDs of all other processes, but doesnot know whether they are running or not

Page 274: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 221

The Bully Algorithm

➥ A process P holds an election as follows:

➥ P sends an ELECTION message to all processes with a larger

ID

➥ if none of the processes reacts, P wins the election

➥ if a process responds: P loses the election

➥ When a process receives an ELECTION message:

➥ (message comes from a process with a lower ID)

➥ return an OK message

➥ hold an election of your own

➥ At some point, there is only one process left

➥ this wins the election and sends the result to all others

Page 275: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 222

Bully Algorithm: Example

1

64

2 5

30

7

Previous Coordinatorhas crashed

Page 276: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 222

Bully Algorithm: Example

Process 4 holds an election1

64

2 5

30

7

Previous Coordinatorhas crashed

ELECTIONELECTION

ELECTION

Page 277: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 222

Bully Algorithm: Example

Processes 5 and 6 reply,Process 4 terminates its election

Process 4 holds an election1

64

2 5

30

7

Previous Coordinatorhas crashed

OK

OK

Page 278: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 222

Bully Algorithm: Example

Processes 5 and 6 simultaneouslyhold an election

Processes 5 and 6 reply,Process 4 terminates its election

Process 4 holds an election1

64

2 5

30

7

Previous Coordinatorhas crashed

ELECTION

ELE

CTI

ON

ELECTIO

N

Page 279: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 222

Bully Algorithm: Example

Process 6 replies to 5Process 5 terminates its election

Processes 5 and 6 simultaneouslyhold an election

Processes 5 and 6 reply,Process 4 terminates its election

Process 4 holds an election1

64

2 5

30

7

Previous Coordinatorhas crashed

OK

Page 280: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 222

Bully Algorithm: Example

Noone replied to the election ofprocess 6, thus, this process winsthe election and communicates theresult to all others

Process 6 replies to 5Process 5 terminates its election

Processes 5 and 6 simultaneouslyhold an election

Processes 5 and 6 reply,Process 4 terminates its election

Process 4 holds an election1

64

2 5

30

7

Previous Coordinatorhas crashed

COORDINATOR

Page 281: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 223

A Ring Algorithm

➥ Assumption: processes form a logical ring, i.e. each process

knows its successors in the ring

➥ Messages are sent along the ring as follows:

➥ a process tries to send the message to its direct successor

➥ if this process is not active, the message will be sent to the

next process in the ring, etc.

➥ ELECTION messages contain a list of process IDs

Page 282: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 224

A Ring Algorithm ...

➥ A process that initiates the election sends an ELECTION

message with its own ID along the ring

➥ When an ELECTION message is received by a process:

➥ if its own ID is not in the list of IDs:

➥ append the own ID to the list

➥ continue sending message along the ring

➥ else (message came back to the initiator):

➥ determine highest ID in the list

➥ send this ID in a COORDINATOR message along the ring

Page 283: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 225

Ring Algorithm: Example

3

51

2 4

60

7

Previous coordinatorhas crashed

Page 284: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 225

Ring Algorithm: Example

3

51

2 4

60

7

Previous coordinatorhas crashed

[2]

[5]

Processes 2 and 5 concurrently initiate an election

Page 285: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 225

Ring Algorithm: Example

3

51

2 4

60

7

Previous coordinatorhas crashed

no answer

[2,3][2]

[5]

Processes 2 and 5 concurrently initiate an election

Page 286: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 225

Ring Algorithm: Example

3

51

2 4

60

7

Previous coordinatorhas crashed

[5,6]

[2,3,4]

no answer

[2,3][2]

[5]

Processes 2 and 5 concurrently initiate an election

Page 287: Distributed Systems - Uni Siegen

7.1 Election Algorithms ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 225

Ring Algorithm: Example

3

51

2 4

60

7

Previous coordinatorhas crashed

[5,6,0] [2,3,4,5]Eventually both processes gettheir ELECTION messagesback and send a COORDINATOR message(with identical contents!)

[5,6]

[2,3,4]

no answer

[2,3][2]

[5]

Processes 2 and 5 concurrently initiate an election

Page 288: Distributed Systems - Uni Siegen

7 Coordination ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 226

7.2 Mutual Exclusion

➥ Here mainly: use / allocation of exclusive resources

➥ Requirements:

➥ safety: only one process can use the resource at any one time

➥ liveness: any process that requests the resource will

eventually get it

➥ fairness: access to resources in ’FIFO’ order

➥ Solution approaches:

➥ centralized server

➥ distributed algorithm with Lamport clock

➥ token ring algorithm

Page 289: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 227

Centralized server

➥ An special coordinator process manages the resource and a

queue for waiting processes

➥ determined e.g. via an election algorithm

➥ Resource is requested by sending a message to the coordinator

➥ if resource is free: coordinator answers with OK

➥ otherwise: coordinator does not answer

➥ requesting process is blocked (waiting for reply)

➥ Resource is released by sending a message to the coordinator

➥ if processes wait: coordinator sends an OK to one of them

➥ Problem: processes cannot detect failure of the coordinator

➥ this could be done using negative replies and polling

Page 290: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 228

A Distributed Algorithm (Ricart / Agrawala)

➥ Idea: a process that wants to have a resource asks all other

processes for their OK

➥ a process replies with OK, if

➥ it does not want the resource, or

➥ it wants the resource, but the other process has requested

it “earlier”

➥ Requires total order of request events

➥ order must be consistent with causality

➥ realizable e.g. via a time stamp (Lamport time, process ID)

with lexicographic order

➥ in the example of slide 206 this results in the event order:

b, c, a, e, g, d, j, f, l, h, i, k

Page 291: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 229

A Distributed Algorithm (Ricart / Agrawala) ...

➥ To request a resource, a process sends the following message to

all other processes:

➥ resource ID

➥ time stamp T of the request

➥ pair: (current Lamport time, own process ID)

(the message must be delivered reliably)

➥ The process then waits until it receives an OK message from all

other processes

➥ After that it can use the resource (exclusively)

Page 292: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 230

A Distributed Algorithm (Ricart / Agrawala) ...

➥ Each process responds to request messages as follows:

➥ resource is not used and not requested by the process:

➥ return OK message

➥ resource is used by the process:

➥ do not send a reply

➥ put the request in a queue

➥ Resource id not used, but requested by the process:

➥ if T (incoming message) < T (own request):

➥ return OK message

➥ or else:➥ do not send a reply

➥ put the request in a queue

Page 293: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 231

A Distributed Algorithm (Ricart / Agrawala) ...

➥ When a process releases the resource:

➥ send an OK message to all processes in the queue

➥ delete the queue

Page 294: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

32

1

the resourceBoth P1 and P3 want

Page 295: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

1. P1 sends request to all others

0Nok =Treq = 8,1

8,1 8,1

32

1

the resourceBoth P1 and P3 want

Page 296: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

2. P3 sends request to all others

1. P1 sends request to all others

0Treq = 12,3Nok =

0Nok =Treq = 8,1

12,3

12,3

8,1 8,1

32

1

the resourceBoth P1 and P3 want

Page 297: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

since it doesn,t want the resource3. P2 sends OK to P1 and P3,

2. P3 sends request to all others

1. P1 sends request to all others

1

1

Treq = 12,3Nok =

Nok =Treq = 8,1

OK

OK 12,3

8,1

32

1

the resourceBoth P1 and P3 want

Page 298: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

4. P1 doesn’t send an OK to P3,since (12,3) > (8,1).P1 adds P3 to its queue

since it doesn,t want the resource3. P2 sends OK to P1 and P3,

2. P3 sends request to all others

1. P1 sends request to all others

Queue: P3

1

1

Treq = 12,3Nok =

Nok =Treq = 8,1

8,1

32

1

the resourceBoth P1 and P3 want

Page 299: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

(8,1) < (12,3)5. P3 sends OK to P1, since

4. P1 doesn’t send an OK to P3,since (12,3) > (8,1).P1 adds P3 to its queue

since it doesn,t want the resource3. P2 sends OK to P1 and P3,

2. P3 sends request to all others

1. P1 sends request to all others

Queue: P32

1Treq = 12,3Nok =

Nok =Treq = 8,1

OK

32

1

the resourceBoth P1 and P3 want

Page 300: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

and uses the resource6. P1 received all OKs

(8,1) < (12,3)5. P3 sends OK to P1, since

4. P1 doesn’t send an OK to P3,since (12,3) > (8,1).P1 adds P3 to its queue

since it doesn,t want the resource3. P2 sends OK to P1 and P3,

2. P3 sends request to all others

1. P1 sends request to all others

Queue: P32

1Treq = 12,3Nok =

Nok =Treq = 8,1

P1 ownsthe resource

32

1

the resourceBoth P1 and P3 want

Page 301: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 232

Example for the Algorithm of Ricart / Agrawala

and sends an OK to P37. P1 releases the resource

and uses the resource6. P1 received all OKs

(8,1) < (12,3)5. P3 sends OK to P1, since

4. P1 doesn’t send an OK to P3,since (12,3) > (8,1).P1 adds P3 to its queue

since it doesn,t want the resource3. P2 sends OK to P1 and P3,

2. P3 sends request to all others

1. P1 sends request to all others

2Treq = 12,3Nok =

OK

the resourceP1 releases

32

1

the resourceBoth P1 and P3 want

Page 302: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 233

A Token Ring Algorithm

➥ The processes form a logical ring

➥ A token circles in the ring

➥ authorization for (exclusive) use of the resource

➥ token is initially generated by one of the processes

➥ On arrival of the token: process checks whether it wants theresource

➥ if so:

➥ use the resource➥ after releasing the resource:

➥ pass token to successor in the ring

➥ else:

➥ pass token immediately to successor in the ring

Page 303: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 234

Comparison of algorithms

➥ Centralized server:

➥ server is single point of failure and may be a performance

bottleneck

➥ clients cannot distinguish (without additional measures)

between server failure and occupied resource

➥ only little communication necessary

➥ Distributed algorithm:

➥ failure of any node is problematic

➥ any node can become a performance bottleneck

➥ high communication effort

➥ just a proof that a distributed, symmetrical algorithm is possible

Page 304: Distributed Systems - Uni Siegen

7.2 Mutual Exclusion ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 235

Comparison of algorithms ...

➥ Token ring algorithm:

➥ problem: loss of the token (detection, re-creation)

➥ failure of nodes is problematic

➥ communication, even if resource is not used

Algorithm Messages per Delay before Problems

allocation allocation

centralized 3 2 server failure

distributed 2(n − 1) 2(n − 1) failure of any process

token ring 1 ...∞ 0 ... n − 1 lost token,

failure of any process

Page 305: Distributed Systems - Uni Siegen

7 Coordination ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 236

[Coulouris, 4.5, 11.4]7.3 Group Communication (Multicast)

➥ In distributed systems, communication with a group of processes

(multicast) is often also important, e.g. for:

➥ fault tolerance based on replicated services

➥ service realized by group of servers

➥ all servers receive and process the requests

➥ finding of services (especially discovery / name services)

➥ multicast is a possible approach for this

➥ better performance through replicated data

➥ changes must be sent to all copies

➥ sending event notifications

➥ all subscribers receive the event

Page 306: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 237

Questions / Problems

➥ Addressing the recipients

➥ explicit list of all recipients

➥ addressing a process group

➥ static / dynamic groups

➥ Reliability

➥ reasonable guarantees that messages will reach their

recipients

➥ Order

➥ adequate guarantees as to the order in which multicast

messages arrive at the various recipients

Page 307: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 238

Reliability

➥ Unreliable multicast:

➥ some processes may not receive the message (e.g. due to

packet loss)

➥ Reliable multicast:

➥ apart from network and process failures, the message is

delivered to all processes in the group

➥ Atomic multicast:

➥ the message is (under all circumstances) received either by allprocesses of the group or by none of them

➥ required if all processes in the group must be kept consistent

(e.g., operations on replicated data)

Page 308: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 239

Order

➥ Unordered

➥ receiving order is undefined

and can be different in different

processes

➥ FIFO order

➥ messages from the same sender

are received by all processes in

FIFO order

➥ i.e. introduction of sequence

numbers local to the sender

1 2 3 4

1 2 3 4

Page 309: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 239

Order

➥ Unordered

➥ receiving order is undefined

and can be different in different

processes

➥ FIFO order

➥ messages from the same sender

are received by all processes in

FIFO order

➥ i.e. introduction of sequence

numbers local to the sender

1 2 3 4

1 2 3 4

Page 310: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 240

Order ...

➥ Causal order

➥ if message m′ can causally

depend on m (m → m′), then all

processes receive m before m′

➥ i.e. introduction of vector time

stamps

➥ Total order

➥ all messages are received by all

processes in the same order

➥ i.e. introduction of globalsequence numbers

1 2 3 4

1 2 3 4

Page 311: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 240

Order ...

➥ Causal order

➥ if message m′ can causally

depend on m (m → m′), then all

processes receive m before m′

➥ i.e. introduction of vector time

stamps

➥ Total order

➥ all messages are received by all

processes in the same order

➥ i.e. introduction of globalsequence numbers

1 2 3 4

1 2 3 4

Page 312: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 240

Order ...

➥ Causal order

➥ if message m′ can causally

depend on m (m → m′), then all

processes receive m before m′

➥ i.e. introduction of vector time

stamps

➥ Total order

➥ all messages are received by all

processes in the same order

➥ i.e. introduction of globalsequence numbers

1 2 3 4

1 2 3 4

Page 313: Distributed Systems - Uni Siegen

7.3 Group Communication (Multicast) ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 240

Order ...

➥ Causal order

➥ if message m′ can causally

depend on m (m → m′), then all

processes receive m before m′

➥ i.e. introduction of vector time

stamps

➥ Total order

➥ all messages are received by all

processes in the same order

➥ i.e. introduction of globalsequence numbers

1 2 3 4

1 2 3 4

Page 314: Distributed Systems - Uni Siegen

7 Coordination ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 241

7.4 Transactions

➥ Combining a sequence of atomic actions into a single unit

➥ atomic actions: read, change, write data

➥ Example: seat reservation

reserved

Transaction

Atomic actions

free seatsQuery list of

Choose a seatMark seat as

➥ Used not only in database systems

Page 315: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 242

Properties of Transactions: ACID

➥ Atomicity

➥ all-or-nothing principle: either all atomic actions are executed(correctly) or none at all

➥ Consistency

➥ a transaction always transfers a consistent state back toconsistent state

➥ Isolation

➥ concurrent transactions do not affect each other; the result isthe same as with sequential execution

➥ Durability

➥ at the (successful) end of the transaction all changes arestored permanently

Page 316: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 243

Atomicity

Rollback

BeginTransaction

Crash

BeginTransaction

Transaction

Commit

Transaction

All changes arestored permanently

are undoneAll changes

Page 317: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 244

Isolation

or

Two concurrent Permitted serializationstransactions

ZYX

CBACBA ZYX

ZYX CBA

➥ The result of the concurrent transactions corresponds to one of

the two serializations

Page 318: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 245

Isolation Levels

➥ Complete isolation of (database) transactions often is too

restrictive / too little performant

➥ Therefore: SQL99 standard defines four isolation levels

➥ Goal: avoidance of unwanted phenomena

➥ dirty reads: a transaction can read data of another

transaction before they have been committed

➥ unrepeatable reads: when reading repeatedly, a transaction

can see commited changes of other transactions

➥ phantom reads: when reading repeatedly, a transaction can

see that other transactions have added or deleted records

Page 319: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 246

Isolation Level According to ANSI/ISO-SQL99

Read Uncommitted

Read Committed

Repeatable Read

Serializable

PhantomReads

UnrepeatableReadsReads

DirtyPhenomenon

Isolationlevel

possible possible possible

possible

possible

possiblenot possible

not possible

not possible

not possible

not possible not possible

➥ Serializable corresponds to complete isolation

Page 320: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 247

Nested Transactions

➥ Within a transaction, several subtransactions take place

➥ Higher-level transaction can run successfully to completion, even

if subtransaction was terminated with an error

➥ Abort of the higher-level transaction results in aborting all

subtransactions

➥ Example: booking of flight and hotel

➥ booking of the flight should be maintained, even if hotel

booking (in the first attempt) fails

➥ Nested transaction are supported by only a few transaction

services

Page 321: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 248

Crash

Crash

Book flight

Flat transaction

Nested transactions

Begin Abort

Begin

Call subtransaction

Commit

Book hotel

Call subtransactionCall subtransaction

Book hotel

commit

Book flight

Abort

Book hotel

Begin BeginBegin Tentativecommit

Tentative

Page 322: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 249

Distributed Transactions

➥ So far: data is stored at exactly one location

➥ Distributed transactions: data is stored distributed

➥ Realization of transactions on the individual data resources

(databases) is no longer sufficient

➥ distributed transaction management becomes necessary

➥ There is a generally accepted Open Group model for the

management of distributed transactions

➥ is implemented by most transaction services

➥ most important feature: 2-Phase-Commit

Page 323: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

Application

Resource manager(RM)

Transaktion manager(TM)

Management of distributed transactionsacross resource boundaries

Transaction management within theindividual data resources

Page 324: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

1. Application requests start of a new transaction.TM internally initializes a new transaction.

Application

Resource manager(RM)

Transaktion manager(TM)

1begin

Page 325: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

2. Transaction is active.Application can use the resources.

Application

Resource manager(RM)

Transaktion manager(TM)

2

1begin

Page 326: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

3. Each RM used by the applilcationregisters with the TM for the transactio.n

Application

Resource manager(RM)

Transaktion manager(TM)

3 join

2

1begin

Page 327: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

4. Aplication requires to commit or abortthe transaction.

Application

Resource manager(RM)

Transaktion manager(TM)

commit /rollback

43 join

2

1begin

Page 328: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

5. TM requires RM to commit the changes:2−phase commit

Application

Resource manager(RM)

Transaktion manager(TM)

prepare5

commit /rollback

43 join

2

1begin

Page 329: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 250

Model for Managing Distributed Transactions

5. TM requires RM to commit the changes:2−phase commit

Application

Resource manager(RM)

Transaktion manager(TM)

commitprepare5

commit /rollback

43 join

2

1begin

Page 330: Distributed Systems - Uni Siegen

7.4 Transactions ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 251

2-Phase Commit

➥ Phase 1 (voting phase)

➥ TM asks all involved RM, if the commit would be successful

(“prepare”)

➥ each RM that answers “yes” prepares for the commit

➥ Phase 2 (finalization)

➥ if all RMs answered with “yes”:

➥ TM sends commit command to all RMs

➥ RM ultimately commits the data and sends an

acknowlegement to TM

➥ else:

➥ TM sends an abort command to all RMs

Page 331: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 252

Distributed SystemsSummer Term 2021

8 Replication and Consistency

Page 332: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 253

Contents

➥ Introduction, motivation

➥ Data-centered consistency models

➥ Client-centered consistency models

➥ Distribution protocols

➥ Consistency protocols

Literature

➥ Tanenbaum, van Steen: Kap. 6

Page 333: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 254

8.1 Introduction and Motivation

➥ Replication: several (identical) copies of data objects are stored

in the distributed system

➥ processes can access an arbitrary copy

➥ Reasons for the replication:

➥ increase in availability and reliability

➥ if a replica is not available, use another one

➥ reading multiple replicas with majority vote

➥ increase in read performance

➥ for large systems: concurrent read access can be serviced

by different replicas

➥ with systems spread over a large area: access request is

sent to a replica in the vicinity

Page 334: Distributed Systems - Uni Siegen

8.1 Introduction and Motivation ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 255

Central Problem of Replication: Consistency

➥ When data is changed, all replicas must be kept consistent

➥ Simplest option: all updates are done via totally ordered atomic

multicast

➥ high overhead when frequent updates occur

➥ in some replicas these may actually never be read

➥ totally ordered atomic multicast is very expensive with many /

widely dispersed replicas

➥ Strict consistency maintenance of replicas always deteriorates

performance and scalability

➥ Solution: weakened consistency requirements

➥ often only very weak demands, e.g. News, Web, ...

Page 335: Distributed Systems - Uni Siegen

8.1 Introduction and Motivation ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 256

Consistency Models

➥ A consistency model determines the order in which the write

operations (updates) of the processes are “seen” by the other

processes

➥ Intuitive expectation: a read operation always returns the result of

the last write operation (strict consistency)

➥ problem: there is no global time

➥ pointless to speak of the “last” write operation

➥ therefore: other consistency models necessary

➥ Data-centric consistency models: view of the data storage

➥ Client-centric consistency models: view of one process

➥ assumption: (essentially) no update by multiple processes

Page 336: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 257

8.2 Data Centric Consistency Models

➥ Model of a distributed data store:

Process(Client)

Process(Client)

Process(Client)

Distributed data storage

Local copy

Write andread accessees

➥ logical, shared data memory

➥ physically distributed and replicated across multiple nodes

Page 337: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 258

Sequential Consistency

➥ A data store is sequentially consistent if the result of eachprogram execution is as if:

➥ the (read/write) operations of all processes are executed in a(random) sequential order,

➥ in which the operations of each individual process appear inthe order specified by the program.

➥ P1 P2 Pn

Switch can beshifted arbitrarilyafter eachoperation

Operations inProgram order

Data store

I.e. the execution of theoperations of the individualprocesses can beinterleaved arbitrarily

➥ Independent of time orclocks

➥ All processes see the accesses in the same order

Page 338: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 259

Sequential Consistency: Examples

W(x)a

W(x)b

R(x)b

R(x)b

R(x)a

R(x)a

P1:

P4:

P3:

P2:

W(x)a

W(x)b

R(x)b R(x)a

P1:

P4:

P3:

P2:

R(x)bR(x)a

Allowed sequence: Forbidden Sequence:

➥ Notation:

➥ W(x)a : the value ’a’ is written into the variable ’x’

➥ R(x)a : variable ’x’ will be read, result is ’a’

➥ A possible sequential order of the left sequence:

➥ W2(x)b, R3(x)b, R4(x)b, W1(x)a, R3(x)a, R4(x)a

Page 339: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 259

Sequential Consistency: Examples

W(x)b

R(x)b

R(x)b

W(x)a

R(x)a

R(x)a

P1:

P4:

P3:

P2:

W(x)a

W(x)b

R(x)b R(x)a

P1:

P4:

P3:

P2:

R(x)bR(x)a

Allowed sequence: Forbidden Sequence:

➥ Notation:

➥ W(x)a : the value ’a’ is written into the variable ’x’

➥ R(x)a : variable ’x’ will be read, result is ’a’

➥ A possible sequential order of the left sequence:

➥ W2(x)b, R3(x)b, R4(x)b, W1(x)a, R3(x)a, R4(x)a

Page 340: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 260

Linearizability

➥ Stronger than sequential consistency

➥ Assumption: the nodes (processes) have synchronized clocks

➥ i.e. an approximation of a global time

➥ Operations have time stamps based on these clocks

➥ In comparison with sequential consistency additionally required:

➥ the sequential order of operations is consistent with theirtimestamps

➥ Complex implementation

➥ Used for formal verification of concurrent algorithms

Page 341: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 261

Causal Consistency

➥ Weakening of sequential consistency

➥ (Only) write operations that are potentially causally dependent

must be visible to all processes in the same order

P1:

P4:

P3:

P2:

Not causally consistent:

W(x)a

R(x)a W(x)b

R(x)b R(x)a

R(x)a R(x)b

P1:

P4:

P3:

P2:

Causally, but not seq. consistent:

W(x)a

R(x)a

R(x)a

R(x)a

W(x)b

W(x)c

R(x)c R(x)b

R(x)b R(x)c

Page 342: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 262

Weak Consistency

➥ In practice: access to shared resources is coordinated viasynchronization variables (SV)

➥ Then: weaker consistency requirements are sufficient:

➥ accesses to SVs are sequentially consistent

➥ an operation on a SV is not allowed until all previous writeaccesses to data have been completed everywhere

➥ no operation on data is allowed before all previous operationson SVs have been completed

P1: W(x)a S

Allowed event sequence: Invalid event sequence:

P1:

P2:

W(x)a W(x)b S

R(x)aS

W(x)b

R(x)b R(x)a SR(x)a R(x)b SP3:

P4:

P2: S R(x)b

Page 343: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 263

Release Consistency (Freigabe-Konsistenz)

➥ Idea as with weak consistency, but distinction between acquire

and release operations (mutual exclusion!)

➥ before an operation on the data is performed all acquire-

operations of the process must be completed

➥ before the end of a release operation all operations of the

process on the data must be completed

➥ acquire / release operations of a process are seen everywhere

in the same order

P1:

Allowed event sequence:

P2:P3:

W(x)bW(x)aacq(L) rel(L)acq(L) R(x)b rel(L)

R(x)a

Page 344: Distributed Systems - Uni Siegen

8.2 Data Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 264

Comparison of models

Strict Absolute time sequence of all shared accesses

(physically not useful!)

Linearization All processes see all accesses in the same order.

Accesses are sorted by a (non-unique) global

timestamp.

Sequential All processes see all accesses in the same order.

Accesses sre not sorted by time.

Causal All processes see causally linked accesses in the

same order.

Weak Data is only reliably consistent after a synchro-

nization has been performed.

Release Data is made consistent when leaving the critical

region.

Page 345: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 265

8.3 Client Centric Consistency Models

➥ In practice:

➥ clients are usually independent from each other

➥ changes to the data are mostly rare

➥ because of partitioning often no write/write conflicts

➥ e.g., DNS, WWW (Caches), ...

➥ Eventual consistency: all replicas will eventually becomeconsistent if no updates take place for a long time

➥ Problem if a client changes the replica it is accessing

➥ updates may not have arrived there yet

➥ client detects inconsistent behavior

➥ Solution: client-centric consistency models

➥ guarantee consistency for an individual client

➥ but not for concurrent accesses by multiple clients

Page 346: Distributed Systems - Uni Siegen

8.3 Client Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 266

Illustration of the problem

data base

replicated

Distributed and

to another replicaand (transparently) creates a connectionThe client moves to another location

Wide area network

Read and writeoperations

Mobile computer

Replicas must retainclient centric consistency

Page 347: Distributed Systems - Uni Siegen

8.3 Client Centric Consistency Models ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 267

Monotonic Read

➥ Example for a client centric consistency model

➥ more: see Tanenbaum / van Steen, Ch. 6.3

➥ Rule: When a process reads the value of a variable x, everysubsequent read operation for x returns the same or a morerecent value

➥ Example: access to a mailbox at different locations

WS(x )1

WS(x ;x )1 2

R(x )1

R(x )2

WS(x )1

WS(x )2

R(x )1

R(x )2 WS(x ;x )1 2

With monotonic read

L1:

L2:

L1:

L2:

Without monotonic read:

L1/L2: local copies

WS(...) set of write operations

Write operations to x in L1

are now executed on x in L2

Page 348: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 268

8.4 Distribution Protocols

➥ Question: where, when and by whom are replicas placed?

➥ permanent replicas

➥ server initiated replicas

➥ client initiated replicas

➥ Question: how are updates distributed (regardless of consistency

protocol, ☞ 8.5)?

➥ sending invalidations, status or operations

➥ pull or push protocols

➥ unicast or multicast

Page 349: Distributed Systems - Uni Siegen

8.4 Distribution Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 269

Placing the Replicas

Client initiatedreplicas

Server initiatedreplicas

Permanentreplicas

Server initiated replicas

Client initiated replicas

Clients

➥ All three types can occur simultaneously

Page 350: Distributed Systems - Uni Siegen

8.4 Distribution Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 270

Permanent Replicas

➥ Initial set of replicas, static, mostly small

➥ Examples:

➥ replicated web site (transparent to client)

➥ mirroring (Mirroring, client deliberately chooses a replica)

Server Initiated Replicas

➥ Server creates additional replicas on demand (Push-Cache)

➥ e.g., for web hosting services

➥ Difficult: deciding when and where replicas will be created

➥ usually access counter for each file, additional information

about the origin of the requests (→ nearest server)

Page 351: Distributed Systems - Uni Siegen

8.4 Distribution Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 271

Client initiated Replicas

➥ Other term: Client Cache

➥ Client cache locally stores (frequently) used data

➥ Goal: improving access time

➥ Management of the cache is completely left to the client

➥ server doesn’t care about consistency

➥ Data is usually kept in the cache for a limited time only

➥ prevents use of extremely obsolete data

➥ Cache usually placed on client machines, or shared cache for

multiple clients in their proximity

➥ e.g., Web proxy caches

Page 352: Distributed Systems - Uni Siegen

8.4 Distribution Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 272

Forwarding Updates: What’s Being Sent?

➥ The new value of the data object

➥ good with high read/update ratio

➥ The update operation (active replication)

➥ saves bandwidth (operation with parameters is usually small)

➥ but more computing power required

➥ Just a notification (invalidation protocols)

➥ notification makes the copy of the data object invalid

➥ on next access a new copy will be requested

➥ requires very little network bandwidth

➥ good at low read/update ratio

Page 353: Distributed Systems - Uni Siegen

8.4 Distribution Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 273

Pull and Push Protocols

➥ Push: updates are distributed on the initiative of the server thatmade the change

➥ replicas don’t have to request updates

➥ common in permanent and server-initiated replicas

➥ when a relatively high degree of consistency is required

➥ at high read/update ratio

➥ problem: server must know all replicas

➥ Pull: replicas actively request data updates

➥ common with client caches

➥ at low read/update ratio

➥ disadvantage: higher response time for cache access

➥ Leases: mixed form: first push for some time, then pull later

Page 354: Distributed Systems - Uni Siegen

8.4 Distribution Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 274

Unicast vs. Multicast

➥ Unicast: send update individually to each replica server

➥ Multicast: send one message and leave the distribution to the

network (e.g. IP multicast)

➥ often much more efficient

➥ especially in LANs: hardware broadcast possible

➥ Multicast is useful for push protocols

➥ Unicast is better with pull protocols

➥ only a single client/server requests an update

Page 355: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 275

8.5 Consistency Protocols

➥ Describe how replica servers coordinate with each other to

implement a specific consistency model

➥ Here specifically considered:

➥ consistency models that serialize operations globally

➥ e.g., sequential, weak and release consistency

➥ Two basic approaches:

➥ primary-based (primarbasierte) protocols

➥ write operations are always performed on a special copy

(primary copy)

➥ replicated-write protocols

➥ write operations go to multiple copies

Page 356: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 276

Primary-Based Protocols

➥ Read operations are possible on arbitrary (local) copies

➥ Write operations must be handled by the primary copy

➥ e.g., to realize a sequential consistency:

➥ the primary copy updates all other copies and waits foracknowledgements, only then it replies to the client

➥ problem: performance

➥ Remote-write protocols

➥ the writer forwards the operation to a fixed primary copy

➥ Local-write protocols

➥ writer must become primary copy before it can do the update

➥ i.e., the primary copy is migrated between servers

➥ good model also for mobile users

Page 357: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 277

Remote Write Protocol: Workflow (Sequential Consistency)

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

serverBackup

xx

Primaryserver for x

Client Client

Page 358: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 277

Remote Write Protocol: Workflow (Sequential Consistency)

(1) Write request

write(x)Data storage

read(x) val(x)

serverBackup

x

serverBackup

x

serverBackup

xx

Primaryserver for x

Client Client

Page 359: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 277

Remote Write Protocol: Workflow (Sequential Consistency)

write(x)

is forwarded to primary server(1) Write request

write(x)Data storage

read(x) val(x)

serverBackup

x

serverBackup

x

serverBackup

xx

Primaryserver for x

Client Client

Page 360: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 277

Remote Write Protocol: Workflow (Sequential Consistency)

update(x)update(x)

update(x)

(2) Primary server updates all backups

write(x)

is forwarded to primary server(1) Write request

write(x)Data storage

read(x) val(x)

serverBackup

x

serverBackup

x

serverBackup

xx

Primaryserver for x

Client Client

Page 361: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 277

Remote Write Protocol: Workflow (Sequential Consistency)

update ACK

update ACKupdate ACK

and waits for acknowledgements

update(x)update(x)

update(x)

(2) Primary server updates all backups

write(x)

is forwarded to primary server(1) Write request

write(x)Data storage

read(x) val(x)

serverBackup

x

serverBackup

x

serverBackup

xx

Primaryserver for x

Client Client

Page 362: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 277

Remote Write Protocol: Workflow (Sequential Consistency)

write ACK

(3) Acknowledge the end of the write operation

write ACKupdate ACK

update ACKupdate ACK

and waits for acknowledgements

update(x)update(x)

update(x)

(2) Primary server updates all backups

write(x)

is forwarded to primary server(1) Write request

write(x)Data storage

read(x) val(x)

serverBackup

x

serverBackup

x

serverBackup

xx

Primaryserver for x

Client Client

Page 363: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

x

Primaryserver for x server

Backup

x

Page 364: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

x

Primaryserver for x server

Backup

x

Page 365: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

requestprimary

Move primary copy to new server(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

x

Primaryserver for x server

Backup

x

Page 366: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

serverBackup

x x

Primaryserver for x

ACK

requestprimary

Move primary copy to new server(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

Page 367: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

(2) Acknowledge the end of the write operation

acq ACK

serverBackup

x x

Primaryserver for x

ACK

requestprimary

Move primary copy to new server(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

Page 368: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

write ACK

(3) Write operations are executed (only) on the local server

write(x)

(2) Acknowledge the end of the write operation

acq ACK

serverBackup

x x

Primaryserver for x

ACK

requestprimary

Move primary copy to new server(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

Page 369: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

relwrite ACK

(3) Write operations are executed (only) on the local server

write(x)

(2) Acknowledge the end of the write operation

acq ACK

update(x)update(x)

update(x)

(4) New primary server updates backups

serverBackup

x x

Primaryserver for x

ACK

requestprimary

Move primary copy to new server(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

Page 370: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 278

Local Write Protocol: Workflow (Release Consistency)

rel ACKrelwrite ACK

(3) Write operations are executed (only) on the local server

write(x)

(2) Acknowledge the end of the write operation

acq ACK

update ACKupdate ACK

update ACK

and waits for acknowledgements

update(x)update(x)

update(x)

(4) New primary server updates backups

serverBackup

x x

Primaryserver for x

ACK

requestprimary

Move primary copy to new server(1) Acquire lock;

acq

Data storageread(x) val(x)

serverBackup

x

serverBackup

x

ClientClient

Page 371: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 279

Replicated Write Protocols

➥ Allow execution of write operations on (multiple) arbitrary replicas

➥ In the following, two approaches:

➥ active replication

➥ update operations are passed on to all copies

➥ requirement: globally unique sequence of operations

➥ using totally ordered multicast

➥ or via central sequencer process

➥ quorum-based protocols

➥ only a portion of the replicas needs to be modified

➥ however, also multiple copies need to be read

Page 372: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 280

Problem With Replicated Object Calls

➥ What happens when a replicated object calls another?

Replicated object

the same call three timesthe method call

All replicas seethe same call

Client replicates Object C receives

A B2

B1

B3

C

➥ Solution: middelware that is aware of replication

➥ coordinator of B makes sure that only one call is sent to C and

its result is distributed to all replicas of B

Page 373: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 281

Quorum-based Protocols

➥ Clients need the permission of multiple servers for writing and for

reading

➥ When writing: send the request to (at least) NW copies

➥ their servers must agree to the change

➥ data gets a new version number when changed

➥ condition: NW > N/2 (N = total number of copies)

➥ prevents write/write conflicts

➥ When reading: send the request to (at least) NR copies

➥ client selects the latest version (highest version number)

➥ condition: NR + NW > N

➥ ensures that in any case the latest version is read

Page 374: Distributed Systems - Uni Siegen

8.5 Consistency Protocols ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 282

Quorum-based Protocols: Examples

(N < N/2)W

N = 6 N = 7 N = 7 N = 6 N = 1 N = 12R W R RW W

correct correctWrite/write conflictsare possible

Read quorum

Write quorum

E F G H

A B C D

I J K L

E F G H

A B C D

I J K L

E F G H

A B C D

I J K L

Page 375: Distributed Systems - Uni Siegen

8 Replication and Consistency ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 283

8.6 Summary

➥ Replication due to availability and performance

➥ Problem: consistency of copies

➥ strictest model: sequential consistency

➥ waekenings: causal consistency, weak ∼, release ∼

➥ client-centric consistency models

➥ Implementation of replication and consistency:

➥ replication scheme: static, server initiated, client initiated

➥ distribution protocols

➥ type of update, push / pull, unicast / multicast

➥ consistency protocols

➥ primary based / replicated write protocols

Page 376: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 284

Distributed SystemsSummer Term 2021

9 Distributed File Systems

Page 377: Distributed Systems - Uni Siegen

9 Distributed File Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 285

Contents

➥ General

➥ Case study: NFS

Literature

➥ Tanenbaum, van Steen: Ch. 10

➥ Colouris, Dollimore, Kindberg: Ch. 8

Page 378: Distributed Systems - Uni Siegen

9 Distributed File Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 286

[Coulouris, 8.1-8.3]9.1 General

➥ Objective: support the sharing of information (files) in an intranet

➥ in the Internet: WWW

➥ Allows applications to access remote files in the same way as

local files

➥ similar (or even better) performance and reliability

➥ Allows operation of diskless nodes

➥ Examples:

➥ NFS (standard in the UNIX area)

➥ AFS (goal: scalability), CIFS (Windows), CODA, xFS, ...

Page 379: Distributed Systems - Uni Siegen

9.1 General ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 287

Requirements

➥ Transparency: access, location, mobility, performance and scaling

transparency

➥ Concurrent file updates (e.g., locks)

➥ File replication (often: local caching)

➥ Heterogeneity of hardware and operating system

➥ Fault tolerance (especially in case of server failure)

➥ often: at-least-once semantics + idempotent operations

➥ advantageous: stateless server (easy reboot)

➥ Consistency (☞ 8)

➥ Security (access control, authentication, encryption)

➥ Efficiency

Page 380: Distributed Systems - Uni Siegen

9.1 General ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 288

Model Architecture of a Distributed File System

Server computer

Client module

Net−work

RPC

Directory service

Flat file service

interface

Client computer

Application

program

Application

program

➥ Tasks of the client module:

➥ emulation of the file interface of the local OS

➥ if necessary caching of files or file sections

Page 381: Distributed Systems - Uni Siegen

9.1 General ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 289

Model Architecture of a Distributed File System ...

➥ Flat file service:

➥ provides idempotent access operations to files

➥ e.g., read, write, create, remove, getAttributes, setAttributes

➥ no open / close, no implicit file pointer

➥ files are identified by UFIDs (Unique File IDs)

➥ (long) integer IDs, can serve as capabilities

➥ Directory service:

➥ maps file or path names to UFIDs

➥ if necessary first authenticates the client and verifies its

access rights

➥ services for creating, deleting and modifying directories

Page 382: Distributed Systems - Uni Siegen

9 Distributed File Systems ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 290

9.2 Case Study: NFS

➥ Introduced in 1984 by Sun

➥ Open, OS independent protocol

➥ Architecture:

Net−work

Client computer

Client

UNIX

NFSUNIXkernel NFS

server

Server computer

systemcalls

Virtual file system Virtual file system

systemsystemfile

UNIXfile

UNIX

Applicationprogram

Applicationprogram

Oth

erfil

e sy

s. NFSprotocol

Page 383: Distributed Systems - Uni Siegen

9.2 Case Study: NFS ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 291

Access Control and Authentication

➥ NFS server is stateless (up to and including NFS3)

➥ UFID (file handle): essentially just the file system ID and i-node

➥ not a capability

➥ Thus, access rights are checked with each request

➥ by the RPC protocol

➥ Authentication usually only via user and group ID

➥ extremely insecure!

➥ More possibilities in NFS3:

➥ Diffie-Hellman key exchange (insecure)

➥ Kerberos

➥ NFS4: secure RPC (RPCSEC GSS)

Page 384: Distributed Systems - Uni Siegen

9.2 Case Study: NFS ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 292

Mount Service

➥ An NFS file system can be mounted in the local directory tree

Server 2

/ (root)

users

nfs

ann joejim

ClientServer 1

/ (root) / (root)

export

people students x staff

vmunix usr

jonbob

remote

mount

remote

mount

➥ Collaboration of mount command in the client with the mount

service of the NFS server

➥ on request, the mount service provides file handles of the ex-

ported directories (for name resolution)

Page 385: Distributed Systems - Uni Siegen

9.2 Case Study: NFS ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 293

Translation of Pathnames

➥ Iteratively (NFS3): for each directory one request to NFS server

➥ necessary because path can cross mount points

➥ inefficiency is mitigated by client caching

Automounter

➥ Goal: set up an NFS mount only when it is accessed

➥ better fault tolerance, load balancing is possible

➥ Automounter is local NFS server

➥ thereby it sees the lookup()-requests of the client

➥ On request: set up the NFS mount and create a symbolic link tothe mount point

➥ After prolonged inactivity: release the mount

Page 386: Distributed Systems - Uni Siegen

9.2 Case Study: NFS ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 294

Server Caching

➥ Traditional file caching in UNIX:

➥ buffer in main memory for most recently used disk blocks

➥ read ahead : sequential blocks are loaded into cachebeforehand

➥ delayed write: modified blocks only written back when space isneeded; additionally every 30s by sync

➥ Server caching in NFS: two modes

➥ write through: write requests are executed in the server cacheand immediately also on disk

➥ advantage: no data loss in case of server crash

➥ delayed write: modified data will remain in the cache until acommit operation is executed (i.e. file is closed)

➥ advantage: better performance if many write operations

Page 387: Distributed Systems - Uni Siegen

9.2 Case Study: NFS ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 295

Client Caching

➥ NFS client buffers the results of (among other things) read / writeand lookup operations in a local cache

➥ leads to consistency issues, since now multiple copies

➥ Client is responsible for maintaining consistency

➥ Timeliness of the cache entry is checked with each access

➥ for that: compare whether the modification timestamp in thecache matches the modification timestamp on the server

➥ in case of negative validation: cache entry is deleted

➥ if validation is successful: cache entry is considered current fora certain time (3 - 30 s) without further checks

➥ i.e. changes only become visible after a few seconds

➥ compromise between consistency and efficiency

Page 388: Distributed Systems - Uni Siegen

9.2 Case Study: NFS ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 296

Client Caching ...

➥ Treatment of write operations:

➥ file block is marked as dirty in the cache

➥ marked blocks are sent asynchronously to the server:

➥ when closing the file

➥ at a sync operation on client machine

➥ possibly more often by block-input/output (bio)-demons

➥ Bio-demons realize asynchronous operations for read ahead and

delayed write

➥ for performance optimization

➥ NFS does not guarantee real consistency of client caches

Page 389: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 297

Distributed SystemsSummer Term 2021

10 Distributed Shared Memory

Page 390: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 298

Contents

➥ Introduction

➥ Design alternatives

Literature

➥ Colouris, Dollimore, Kindberg: Kap. 16.1-16.3

Page 391: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 299

➥ Goal: shared memory in distributed systems

➥ Basic technique considered here:

➥ page-based memory management on the nodes

➥ on demand: loading pages over the network

➥ if necessary replication of pages to increase performance

➥ Differentiation:

Applica−

Runtime

system

Hardware

tion

system

Operating

Applica−

Runtime

system

Hardware

tion

system

Operating

Applica−

Runtime

system

Hardware

tion

system

Operating

Applica−

Runtime

system

Hardware

tion

system

Operating

Applica−

Runtime

system

Hardware

tion

system

Operating

Applica−

Runtime

system

Hardware

tion

system

Operating

Hardware DSM: NUMA Shared Virtual Memory Middleware

Computer 1 Computer 2 Computer 1 Computer 1Computer 2 Computer 2

Page 392: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 300

Design alternatives

➥ Structure of the shared memory:

➥ byte-oriented (distributed shared memory pages)

➥ object-oriented (distributed shared objects)

➥ e.g., Orca

➥ immutable data (distributed shared container)

➥ operations: read, add, remove

➥ e.g., Linda Tuple Space, JavaSpaces

➥ Granularity (for page-based methods):

➥ when changing a byte: transmission of entire page

➥ with large pages: more efficient communication, less

administrative effort, more false sharing

Page 393: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 301

Design alternatives ...

➥ Consistency model: mostly sequential or release consistency

➥ Consistency protocol: usually local write protocol

➥ i.e., memory page migrated to accessing process

➥ with or without replication for read accesses

➥ client initiated replication, i.e., reader requests copy

➥ usually only one writer per page

➥ mostly invalidation protocols (with push model)

➥ update protocols only if write accesses can be buffered (e.g.

with release consistency)

Page 394: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 302

Design alternatives ...

➥ Management of copies

➥ mostly: at any time either multiple readers or one writer

➥ each page has an owner

➥ writer or one of the readers (last writer)

➥ manages a list of processes with copies of the page

➥ before write access: process requests current copy

➥ Finding the owner of a page:

➥ central manager

➥ manages owners, forwards requests

➥ fixed distribution

➥ fixed mapping: page → manager

Page 395: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 303

Design alternatives ...

➥ Finding the owner of a page ...:

➥ multicast instead of manager

➥ problem: concurrent requests

➥ solution: totally ordered multicast, vector time stamps

➥ dynamically distributed manager

➥ every process knows a likely owner

➥ this node forwards the request if necessary

➥ the likely owner is updated,

➥ when a process transfers the ownership property

➥ upon receipt of an invalidation message

➥ upon receipt of a requested page

➥ when a request is forwarded (to the requestor)

Page 396: Distributed Systems - Uni Siegen

10 Distributed Shared Memory ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 304

Design alternatives ...

➥ Problems: e.g., thrashing, especially due to false sharing

➥ simple remedy:

➥ a page can be migrated again only after a certain period of

time

➥ TreadMarks: multiple writer protocol

➥ release consistency; when released, only the changed

parts of the page are transferred

➥ changes are then “merged”

➥ in case of conflicts: result is non-deterministic

Page 397: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 305

Distributed SystemsSummer Term 2021

11 Fault Tolerance

Page 398: Distributed Systems - Uni Siegen

11 Fault Tolerance ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 306

Contents

➥ Introduction

➥ Process elasticity

➥ Reliable communication

➥ Recovery

Literature

➥ Tanenbaum, van Steen: Ch. 7

Page 399: Distributed Systems - Uni Siegen

11.1 Introduction

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 307

Concepts

➥ Failure: external incorrect behavior (system no longer keeps itspromises)

➥ Error: (unobserved) incorrect internal state

➥ Fault: physical defect (in HW or SW) causing the error

➥ fault can be transient, periodic or permanent

➥ Fault tolerance: system does not fail despite a fault

➥ Requirement for reliable systems:

➥ availability: p(system is working at time t)

➥ reliability: p(system is working in time interval ∆t)

➥ safety: no major damage if system fails

➥ maintainability: effort for “repair” after a failure

Page 400: Distributed Systems - Uni Siegen

11.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 308

Failure models

Crash failure Server halts

Omission failure Server is not responding to requests

Receive omission Server doesn’t receive incoming requests

Send omission Server doesn’t send messages

Timing failure Response time is outside the specification

Response failure Server’s response is incorrect

Value failure Only the value of the answer is wrong

State transition f. Incorrect control flow in server

Byzantine failure Random answers at arbitrary time

➥ Further distinction: can the client detect the failure or not?

Page 401: Distributed Systems - Uni Siegen

11.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 309

Failure masking through redundancy

➥ Fault tolerant system must hide faults from other processes

➥ Most important technique: redundancy

➥ information redundancy: additional “check bits” (e.g., CRC)

➥ time redundancy: repetition of faulty actions

➥ physical redundancy: important components are provided

multiple times

➥ Example: TMR, triple modular redundancy

➥ components are replicated three times

➥ majority decision for the results

➥ protects against failure of a replicated component

Page 402: Distributed Systems - Uni Siegen

11.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 310

Example for TMR

A1

A2

A3

C1

C2

C3B3

B2

B1

V2 V5 V8

A CB

Without redundancy

With TMR

Voter

Page 403: Distributed Systems - Uni Siegen

11.1 Introduction ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 310

Example for TMR

V1

V3

V4

V9

V7

V6

A1

A2

A3

C1

C2

C3B3

B2

B1

V2 V5 V8

A CB

Without redundancy

With TMR

Voter

Page 404: Distributed Systems - Uni Siegen

11.2 Process Elasticity

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 311

Objective: Protection Against Process Failure

➥ By replicating processes in groups

➥ message to the group is received by all members

➥ usually with totally ordered multicast

➥ Questions:

➥ organization of the groups?

➥ flat (symmetrical) vs. hierarchical (central coordinator)

➥ group administration, synchronous join / exit

➥ necessary number of replicas?

➥ k fault tolerant: failure of k processes can be tolerated

➥ for silent failures: ≥ k + 1 Processes

➥ for Byzantine failures: ≥ 2k + 1 processes

➥ agreement in faulty systems?

Page 405: Distributed Systems - Uni Siegen

11.2 Process Elasticity ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 312

Agreement in faulty systems

➥ Agreement is impossible with unreliable communication

➥ two army problem

➥ Agreement of faulty processes with reliable communication

➥ Byzantine agreement problem (byzantinische Generale)

➥ agreement only possible if > 2

3of the processes work correctly

1 2

4

1: (1, 2, x, 4)

2: (1, 2, y, 4)

3: (1, 2, 3, 4)

4: (1, 2, z, 4)

(i, j, k, l)

(1, 2, y ,4)

(1, 2, x ,4)

4 gets:

(1, 2, x ,4)

(e, f, g, h)

(1, 2, z ,4)

2 gets:

(1, 2, z ,4)

(a, b, c, d)

(1, 2, y ,4)

1 gets:

from 1:

from 2:

from 3:

from 4:

2

4

41

4x

y

z3

information

1

2

2

1. Send information 2. Received

1

3. Send received informationto all other processes

Page 406: Distributed Systems - Uni Siegen

11.3 Reliable Communication

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 313

Objective: Protection Against Communication Failures

➥ Point-to-point communication (☞ RN I)

➥ TCP masks omission failures, but not crash failures

➥ Client/server communication (☞ 2.1)

➥ possible failures:

➥ server not found➥ lost request

➥ server crash while processing the request

➥ lost reply

➥ client crash after sending the request

➥ Group communication (☞ 7.3)

➥ Distributed commit (☞ 7.4)

Page 407: Distributed Systems - Uni Siegen

11.4 Recovery

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 314

Objective: System Recovery After an Error

➥ Forward error recovery: go to a correct new state

➥ Backward error recovery: go to a correct earlier state

➥ i.e. reset to a consistent cut

➥ regular backup to stable storage (checkpointing)

➥ Independent checkpointing

➥ processes save their state independently of each other

➥ problem: domino effect

Initial stateP 1

P 2

1

2

3

4

5

6

Checkpoint

Error

Page 408: Distributed Systems - Uni Siegen

11.4 Recovery ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 315

➥ Coordinated checkpoints

➥ Chandy/Lamport algorithm (☞ 6.4

➥ alternatively: blocking 2 phase protocol

➥ problem: requires to reset all processes

➥ Local checkpoints with message logging

➥ goal: restore the crashed process to a state consistent with

the current state of the other processes

➥ reset to last checkpoint and restore the received messages

P

Q

R

Check−point

Crashm1

m2 m3

Restore

m2

m1

m3Duplicate

Page 409: Distributed Systems - Uni Siegen

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 316

Distributed SystemsSummer Term 2021

12 Summary, Important Topics

Page 410: Distributed Systems - Uni Siegen

12 Summary, Important Topics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 317

1. Introduction

➥ Definition of a distributed system

➥ Features / challenges of distributed systems

➥ Architecture models: client/server, n-tier

2. Middleware

➥ Tasks of the middleware

➥ Communication-oriented and application-oriented middleware

➥ Implementation of remote calls (proxy pattern)

3. Distributed Programming with Java RMI

➥ Approach to create an RMI application

➥ Programming of server and client

Page 411: Distributed Systems - Uni Siegen

12 Summary, Important Topics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 318

4. Name Services

5. Process Management

➥ Graph partitioning, list scheduling, code migration

6. Time and Global State

➥ Synchronization of physical clocks

➥ Lamport’s happended-before relation (causality relation)

➥ Lamport and vector clocks

➥ Consistent cuts, Chandy/Lamport algorithm

Page 412: Distributed Systems - Uni Siegen

12 Summary, Important Topics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 319

7. Coordination

➥ Election algorithms

➥ Mutual exclusion (centralized, Ricart/Agrawala, ring)

➥ Multicast (reliability, order)

➥ Transactions

8. Replication and Consistency

➥ Concept of consistency

➥ Sequential consistency, release consistency

➥ Consistency protocols (primary-based, quorum-based)

Page 413: Distributed Systems - Uni Siegen

12 Summary, Important Topics ...

Roland WismullerBetriebssysteme / verteilte Systeme Distributed Systems (1/13) 320

9. Distributed File Systems

10. Distributed Shared Memory

11. Fault Tolerance

➥ Failure models

➥ Physical redundancy, agreement

➥ Recovery