Chapter 18.2: Distributed Coordination

Chapter 18: Distributed Coordination

Chapter 18.1

Event Ordering

Mutual Exclusion

Atomicity

Chapter 18.2

Concurrency Control

Deadlock Handling

Chapter 18.3

Deadlock Prevention

Election Algorithms

Reaching Agreement

Chapter Objectives

To show how some concurrency-control schemes can be modified for use in a distributed environment

To present schemes for handling deadlock prevention, deadlock avoidance, and deadlock detection in a distributed system

Concurrency Control

Locking Protocols using replicated and non-replicated schemes

Time-Stamping

Concurrency Control - Background

While the concurrency issues from earlier chapters still apply, distributed control significantly complicates these 'testy' issues.

To begin with, we must recognize that transactions may be either local or global.

Clearly, local transactions are simpler to deal with.

With global transactions, however, we must modify the centralized concurrency schemes to accommodate the distribution of transactions.

To put transactions into perspective:

A Local transaction only executes at one site

A Global transaction executes at several sites

As in the past, each site has a transaction manager responsible for coordinating the execution of the transactions (local or global) that access data stored at that site.

Concurrency Control with Locking Protocols

We can use the two-phase locking protocol discussed before in a distributed environment by changing how the lock manager is implemented.

We have two primary implementation schemes:

Non-replicated schemes, and

Replicated schemes.

We will first discuss the non-replicated scheme.

Non-Replicated Scheme

This is the relatively easy case.

Here, each site maintains a local lock manager, which administers lock and unlock requests for those data items stored at that site.

A simple implementation involves two message transfers for each lock request and a single message transfer for each unlock request.

When a transaction wishes to lock some data item Q at some site S, it simply sends a lock request to the lock manager at S.

If site S can comply, the lock manager replies that the lock has been granted.

Clearly, if the lock cannot be granted, the request is delayed until the lock can be granted.

This scheme is simple both conceptually and in implementation.

As stated, only two messages per lock request plus one per unlock request are needed.

Deadlock handling, however, can be much more complex, since lock and unlock requests are no longer all made at a single site.

Much more ahead on this one!

This is the only non-replicated scheme we will discuss here.
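To make the message flow concrete, here is a minimal sketch of such a local lock manager. It is illustrative only: the names (LocalLockManager, request_lock, release_lock) are invented, and the "messages" are modeled as method calls and return values rather than network traffic.

```python
class LocalLockManager:
    """Administers lock/unlock requests for the data items stored at this site."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.holder = {}    # data item -> transaction currently holding its lock
        self.waiting = {}   # data item -> queue of delayed requesters

    def request_lock(self, txn, item):
        """Message 1: the lock request; the return value models message 2, the reply."""
        if item not in self.holder:
            self.holder[item] = txn
            return "GRANTED"                    # lock granted immediately
        self.waiting.setdefault(item, []).append(txn)
        return "DELAYED"                        # delayed until it can be granted

    def release_lock(self, txn, item):
        """The single unlock message; hand the lock to the next waiter, if any."""
        if self.holder.get(item) != txn:
            return
        queue = self.waiting.get(item, [])
        if queue:
            self.holder[item] = queue.pop(0)    # next delayed request is granted
        else:
            del self.holder[item]
```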

Replicated Schemes: Single-Coordinator Approach

In this approach, a single lock manager resides at a single chosen site. All lock and unlock requests are made at that site.

So, when some transaction needs a lock on a data item, it sends a request to this single site, where the lock manager determines whether the lock can be granted. If so, it sends a positive reply back to the site requesting the lock.

If not, the request is delayed until it can be granted, at which time a positive message is transmitted to the requesting site.

Since we have replication, the requesting site can read the data item from any site holding a replica.

With a write operation, however, all sites where replicas are stored must be involved in the write.

(more)
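A minimal sketch of this read-one/write-all asymmetry, assuming a hypothetical send(site, message) helper that delivers a message and returns the reply (the helper, message formats, and site names are all assumptions, not from the text):

```python
COORDINATOR = "site-0"   # the single chosen site hosting the lock manager (assumed name)

def read_item(txn, item, replica_sites, send):
    # All lock requests go to the coordinator, whichever site holds the data.
    send(COORDINATOR, ("lock-shared", txn, item))      # blocks until granted
    return send(replica_sites[0], ("read", item))      # any one replica suffices

def write_item(txn, item, value, replica_sites, send):
    send(COORDINATOR, ("lock-exclusive", txn, item))   # blocks until granted
    for site in replica_sites:                         # every replica must be written
        send(site, ("write", item, value))
```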

Replicated Schemes: Single-Coordinator Approach

Now, here's the deal:

On the plus side: we clearly have a simple implementation and simple deadlock handling.

On the negative side: there is a real possibility of a bottleneck, because all requests must go through this central site.

Also, a big one: a single central site is vulnerable; the concurrency controller is lost if this single site fails.

Compromise: a multiple-coordinator approach distributes the lock-manager function over several sites. This helps the bottleneck problem but significantly complicates the deadlock problem, because lock and unlock requests are no longer made at a single site!

Much more ahead in a major section of this chapter addressing deadlock.

Replicated Schemes: The Majority Protocol

Similar to the non-replicated scheme, but here we have a lock manager at each site controlling the locks for all data items or replicas stored at that site.

So, when some transaction needs to lock a data item replicated at several different sites, lock requests must be sent to more than one-half of the sites where the data item is stored.

Each lock manager must then determine if the lock can be granted.

The transaction cannot continue until a majority of locks are granted.

Clearly, if a lock cannot be granted at some site S, that site's positive response is delayed until the request can be granted.

The Majority Protocol avoids drawbacks of central control by dealing with replicated data in a decentralized manner, but:

Much more complicated to implement, and

Deadlock-handling algorithms must clearly be modified.
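Despite the added complexity, the core majority rule itself is compact. Here is a hedged sketch, assuming a hypothetical request_lock(site, item) helper that returns True once that site's lock manager has granted the lock (in practice a site that cannot grant the lock simply delays its positive response, as described above):

```python
def acquire_majority_lock(item, replica_sites, request_lock):
    """The transaction may proceed only once more than half the sites
    storing the item have granted the lock."""
    grants = sum(1 for site in replica_sites if request_lock(site, item))
    return 2 * grants > len(replica_sites)   # strict majority required
```

In a real implementation the requests would be sent in parallel rather than one by one, but the strict-majority test is the same.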

Replicated Schemes: The Biased Protocol

Similar to the majority protocol, but requests for shared locks are given priority over requests for exclusive locks.

We have a lock manager at each site that manages all the locks at that site, but shared and exclusive locks are handled differently.

Shared locks:

When a transaction needs a shared lock on a data item, it simply requests the lock from any one site holding a replica of that item.

Exclusive locks: a lock request must be sent to the lock manager at every site holding a replica. As expected, the lock cannot be granted while the item is held incompatibly; the grant is delayed until the request can be honored.

The net result is less overhead on read operations than in the majority protocol, since a single shared lock is sufficient for a read; writes, however, incur additional overhead.

This is a good approach where we anticipate many more reads than writes, since exclusive locks carry a great deal of additional overhead.

Like the majority protocol, deadlock handling is complex (more on deadlock later).
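The asymmetry is easy to see in a sketch (again with a hypothetical request_lock helper; the mode keyword is an assumption): a shared lock touches one replica site, while an exclusive lock touches them all.

```python
def acquire_shared(item, replica_sites, request_lock):
    # Any single site holding a replica of the item will do for a read.
    return request_lock(replica_sites[0], item, mode="shared")

def acquire_exclusive(item, replica_sites, request_lock):
    # A write must obtain the lock at every site holding a replica.
    return all(request_lock(site, item, mode="exclusive")
               for site in replica_sites)
```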

Replicated Locks: The Primary Copy

Another approach designates one of the replicas as the 'primary copy.' This copy resides at exactly one site. So, when a transaction needs a lock on a data item, it must request the lock from the site holding the primary copy. Naturally, if the request cannot be granted, the positive response is delayed.

Concurrency control for replicated data is thus handled in a manner similar to that for non-replicated data.

Implementation is simple. But any time there is centralized control, as there is here, we are confronted with the potential for failure at that site; in this case, such a failure makes the data item unavailable, even though it is available at other sites!

So the nice element here is that the implementation is simple: there is only one specific site in 'control.'

The downside is the potential for site failure.

Concurrency Control with Time-Stamping

This is a different approach to concurrency control.

Here each transaction is given a unique time stamp, used to sequence requests.

But timestamps must be generated fairly; here we apply the scheme to the non-replicated environment.

Two approaches to generate unique time stamps:

centralized and

distributed

Concurrency Control with Time-Stamping

Centralized approach to time stamping:

There is a centralized site chosen to distribute the timestamps.

In this approach, the site can use

a logical counter or

its own internal local clock,

(since no other clocks are involved)

Distributed approach to time stamping:

Each site generates a unique local timestamp

As in the centralized approach, each site may use a logical counter or its own internal clock.

Here, we generate a globally unique timestamp by concatenating the site's unique local timestamp with the unique site identifier.

The order of the concatenation is important: the site identifier goes in the least significant position, to ensure that the global timestamps generated at one site are not always greater than those generated at another site.

This is illustrated in the sketch below…
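A minimal sketch of the concatenation, assuming each site knows its own unique identifier (SITE_ID and the counter are placeholders). Representing the timestamp as a (local, site) tuple puts the site identifier in the least significant position, since Python compares tuples element by element from the left.

```python
import itertools

SITE_ID = 3                        # this site's unique identifier (assumed value)
_local_clock = itertools.count(1)  # logical counter standing in for a local clock

def global_timestamp():
    return (next(_local_clock), SITE_ID)   # (local timestamp, site identifier)

# A larger local timestamp always dominates, regardless of site identifier:
assert (5, 3) < (6, 1)
# Ties on the local part are broken by site identifier, keeping timestamps unique:
assert (5, 1) < (5, 3)
```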

Generation of Unique Timestamps

But if one site generates local timestamps at a faster rate than other sites, we have problems, since the local counters drift apart.

All timestamps generated by a fast site could be larger than those generated by other sites.

So we need a system to ensure local timestamps are fairly generated across the system.

To do this, we define within each site - a logical clock, LC, that generates a local timestamp.

To keep the various logical clocks in synch, we require a site to advance its logical clock whenever a transaction with timestamp x visits that site and x is greater than the site's current LC value. In this case, the site sets its logical clock to x + 1.
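A minimal sketch of this rule (the class and method names are illustrative):

```python
class LogicalClock:
    def __init__(self):
        self.value = 0

    def next_timestamp(self):
        self.value += 1            # local event: hand out the next timestamp
        return self.value

    def on_visit(self, x):
        """Called when a transaction carrying timestamp x visits this site."""
        if x > self.value:
            self.value = x + 1     # catch up: jump past the visitor's timestamp
```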

Generation of Unique Timestamps

Then too, the system clock can be used to generate timestamps. If so, timestamps must still be assigned fairly.

Since clocks may not be perfectly accurate, a technique similar to that used for logical clocks must be used to ensure that no clock gets far ahead or far behind another clock.

Deadlock Prevention

Deadlock Prevention (1 of 2)

New topic! We speak of deadlock prevention, deadlock avoidance, and deadlock detection, as we have in earlier chapters. The distributed environment presents new complications, however…

Deadlock prevention and avoidance:

Global ordering of resources: we can take a resource-ordering approach to deadlock prevention by defining a global ordering of resources. Each system resource is given a unique number, and resources are allocated in that order.

The resource-ordering deadlock-prevention algorithm thus defines a global ordering among the system resources using these unique numbers.

A process may request a resource with unique number i only if it is not holding a resource with a unique number greater than i.

Simple to implement; requires little overhead

Banker's algorithm – an approach where every request goes through a 'banker.' One of the processes in the system is designated as the process that maintains the information necessary to carry out the banker's algorithm.

Also implemented easily, but it may require too much overhead.
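As a sketch of the resource-ordering rule above (the function name is illustrative): a process tracks the unique numbers of the resources it holds and may request resource i only if it holds nothing numbered greater than i.

```python
def may_request(held_numbers, i):
    """True iff the process holds no resource numbered greater than i."""
    return all(n <= i for n in held_numbers)

# A process holding resources numbered 2 and 5 may request 7, but not 3:
assert may_request({2, 5}, 7)
assert not may_request({2, 5}, 3)
```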

Deadlock Prevention (2 of 2)

The global resource ordering scheme is easy to implement in a distributed system and requires little overhead.

The same holds for the banker's approach, but it may require too much overhead: the banker may become a bottleneck, since the number of messages to and from the banker can be large.

The banker's scheme is not considered very practical in a distributed system.

Let’s consider a time-stamped approach to deadlock prevention with resource preemption.

Time-Stamped Deadlock-Prevention Scheme

This approach is powerful, and for simplicity we will only consider the case of a single instance of each resource type…

In this approach, each process Pi is assigned a unique priority number

Priority numbers are used to decide whether a process Pi should wait for a process Pj; otherwise Pi is rolled back

This scheme prevents deadlocks: for every edge Pi → Pj in the wait-for graph, Pi has a higher priority than Pj.

Thus a cycle cannot exist, but we can have starvation, as some processes with very low priorities may always be rolled back.

This difficulty can be avoided through the use of timestamps: each process in the system is assigned a unique timestamp when it is created.

We have two complementary deadlock-prevention schemes using timestamps.

We will consider each of these:

Wait-Die Scheme

Based on non-preemption.

If Pi requests a resource currently held by Pj, Pi is allowed to wait only if it has a smaller timestamp than does Pj (this implies that Pi is older than Pj)

Otherwise, Pi is rolled back (dies)

Example: Suppose that processes P1, P2, and P3 have timestamps 5, 10, and 15 respectively

If P1 requests a resource held by P2, then P1 will wait

If P3 requests a resource held by P2, then P3 will be rolled back

Nice, but a process dies!

Then the next question becomes: “Is a rolled back process assigned a new timestamp?” (more ahead)

Wound-Wait Scheme

Based on preemption.

If Pi requests a resource currently held by Pj, Pi is allowed to wait only if it has a larger timestamp than does Pj (Pi is younger than Pj). Otherwise (Pi is older than Pj), the resource is preempted from Pj and Pj is rolled back (Pj is wounded by Pi).

Example: Suppose that processes P1, P2, and P3 have timestamps 5, 10, and 15 respectively

If P1 requests a resource held by P2, then the resource will be preempted from P2 and P2 will be rolled back

If P3 requests a resource held by P2, then P3 will wait
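The two schemes are mirror images, which a small sketch makes plain. Smaller timestamp means older; each function decides what happens when a requester asks for a resource the holder has (the function names and return strings are illustrative).

```python
def wait_die(requester_ts, holder_ts):
    # Non-preemptive: an older requester waits; a younger requester dies.
    return "wait" if requester_ts < holder_ts else "roll back requester"

def wound_wait(requester_ts, holder_ts):
    # Preemptive: a younger requester waits; an older requester wounds the holder.
    return "wait" if requester_ts > holder_ts else "roll back holder"

# With timestamps P1 = 5, P2 = 10, P3 = 15, as in the slides:
assert wait_die(5, 10) == "wait"                      # P1 waits for P2
assert wait_die(15, 10) == "roll back requester"      # P3 dies
assert wound_wait(5, 10) == "roll back holder"        # P2 is wounded by P1
assert wound_wait(15, 10) == "wait"                   # P3 waits
```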

Appraisal of Wait-Die and Wound-Wait Schemes

Assigning new timestamps to rolled-back processes can cause problems.

So, both approaches avoid starvation, which is easy to see, provided that when a process is rolled back (in either scheme) it is NOT assigned a new timestamp.

Because timestamps always increase, a rolled-back process will eventually have the smallest timestamp in the system and will not be rolled back again; thus the process will not starve.

Let’s look more closely:

Appraisals of Wait-Die and Wound-Wait

In the wait-die scheme, an older process must wait for a younger one to release the needed resource (the waiting process is the older one).

The older this process gets, the more it tends to wait. In the wound-wait scheme, by contrast, an older process never waits for a younger one (here, the waiting process is the younger one!).

In the wait-die scheme, if a process Pi dies and is rolled back because it requested a resource held by Pj (here Pi is the younger), then Pi may reissue the same sequence of requests when it is restarted.

If the resource is still held by Pj, then Pi will die again; the process may die (and be rolled back) repeatedly before acquiring the needed resource.

Compare this sequence of events to what happens in the wound-wait scheme: there, Pi is wounded and rolled back because the older process Pj requested a resource Pi held. When Pi is restarted and requests the resource now held by Pj, Pi waits, because it has the larger timestamp (Pi is the younger).

In this case, fewer rollbacks occur in the wound-wait scheme.

Clearly, in either scheme some rollbacks may be totally unnecessary (no deadlock would actually have occurred), and this represents a serious degradation of performance.

End of Chapter 18.2