10
J. Parallel Distrib. Comput. 73 (2013) 1029–1038 Contents lists available at SciVerse ScienceDirect J. Parallel Distrib. Comput. journal homepage: www.elsevier.com/locate/jpdc Simple, space-efficient, and fairness improved FCFS mutual exclusion algorithms Alex A. Aravind Department of Computer Science, University of Northern British Columbia, Prince George, British Columbia, Canada highlights This paper proposes three new FCFS mutual exclusion algorithms. This paper proposes three simple FCFS mutual exclusion algorithms. This paper proposes a space optimal FCFS mutual exclusion algorithm. This paper proposes the first FCFS mutual exclusion algorithm using safe bits to assure a stronger fairness. article info Article history: Received 26 July 2011 Received in revised form 4 February 2013 Accepted 18 March 2013 Available online 2 April 2013 Keywords: Mutual exclusion Process/thread synchronization Concurrent programming Multicore processors Nonatomic algorithms FCFS fairness Space optimal abstract Let n be the number of threads that can compete for a shared resource R. The mutual exclusion problem involves coordinating these n concurrent threads in accessing R in a mutually exclusive way. This paper addresses two basic questions related to the First-Come-First-Served (FCFS) mutual exclusion algorithms that use only read–write operations: one is regarding the lower bound on the shared space requirement and the other is about fairness. The current best FCFS algorithm using read–write operations requires 5n shared bits. Could the shared space requirement be further reduced? The existing FCFS mutual exclusion algorithms assure fairness only among the threads which cross the ‘doorway’ sequentially. In systems with multicore processors, which are becoming increasingly common nowadays, threads can progress truly in parallel. Therefore, it is quite likely that several threads can cross the doorway concurrently. In such systems, the threads which cross the doorway sequentially may constitute only a fraction of all competing threads. While this fraction of threads follow the FCFS order, the rest of the threads have to rely on a biased scheme which always favors threads with smaller identifiers. Is there a simpler way to remove this bias to assure global fairness? This paper answers the above two questions affirmatively by presenting simple FCFS mutual exclusion algorithms using only read–write operations—the first one using 3n shared bits and the latter algorithms using 4n shared bits. The resulting algorithms are simple, space-efficient, and highly fair. © 2013 Elsevier Inc. All rights reserved. 1. Introduction The mutual exclusion problem is one of the most fundamen- tal problems in concurrent and distributed computing systems. Dijkstra presented the formulation of the mutual exclusion prob- lem and the first solution for n threads using atomic read and write operations [7]. Subsequently, Knuth pointed out that Dijk- stra’s solution is susceptible to starvation and hence presented the first starvation-free mutual exclusion algorithm [11]. Follow- ing Knuth’s work, several improvements were proposed later. All these improvements require read–write operations to be atomic and also must use a multiwriter shared variable. Most of the mutual E-mail address: [email protected]. exclusion algorithms proposed later required atomicity at read and write or at a higher level, and surveys can be found in [1,18,19,23]. About a decade later, Lamport published the bakery algorithm. This algorithm is considered as the first true solution to the mu- tual exclusion problem because it does not assume mutual ex- clusion at a lower level (referred to as nonatomicity property). The bakery algorithm is simple and has several attractive prop- erties, including First-Come-First-Served (FCFS), but requires un- bounded shared space. Nonatomic algorithms are theoretically interesting as they truly solve the mutual exclusion problem with- out pushing it down to a lower layer [12,13]. Also, the design and study of nonatomic algorithms, as they are so subtle, could reveal fundamental principles behind solving problems in dis- tributed computing. Such algorithms are also practically relevant now, because several recent systems such as smart-phones, multi- mode handsets, network processors, graphic chips, and other high 0743-7315/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jpdc.2013.03.009

Simple, space-efficient, and fairness improved FCFS mutual exclusion algorithms

  • Upload
    alex-a

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

J. Parallel Distrib. Comput. 73 (2013) 1029–1038

Contents lists available at SciVerse ScienceDirect

J. Parallel Distrib. Comput.

journal homepage: www.elsevier.com/locate/jpdc

Simple, space-efficient, and fairness improved FCFS mutualexclusion algorithmsAlex A. AravindDepartment of Computer Science, University of Northern British Columbia, Prince George, British Columbia, Canada

h i g h l i g h t s

• This paper proposes three new FCFS mutual exclusion algorithms.• This paper proposes three simple FCFS mutual exclusion algorithms.• This paper proposes a space optimal FCFS mutual exclusion algorithm.• This paper proposes the first FCFS mutual exclusion algorithm using safe bits to assure a stronger fairness.

a r t i c l e i n f o

Article history:Received 26 July 2011Received in revised form4 February 2013Accepted 18 March 2013Available online 2 April 2013

Keywords:Mutual exclusionProcess/thread synchronizationConcurrent programmingMulticore processorsNonatomic algorithmsFCFS fairnessSpace optimal

a b s t r a c t

Let n be the number of threads that can compete for a shared resource R. The mutual exclusion probleminvolves coordinating these n concurrent threads in accessing R in a mutually exclusive way. This paperaddresses two basic questions related to the First-Come-First-Served (FCFS) mutual exclusion algorithmsthat use only read–write operations: one is regarding the lower bound on the shared space requirementand the other is about fairness.

The current best FCFS algorithm using read–write operations requires 5n shared bits. Could the sharedspace requirement be further reduced?

The existing FCFSmutual exclusion algorithms assure fairness only among the threadswhich cross the‘doorway’ sequentially. In systems with multicore processors, which are becoming increasingly commonnowadays, threads can progress truly in parallel. Therefore, it is quite likely that several threads cancross the doorway concurrently. In such systems, the threads which cross the doorway sequentially mayconstitute only a fraction of all competing threads.While this fraction of threads follow the FCFS order, therest of the threads have to rely on a biased scheme which always favors threads with smaller identifiers.Is there a simpler way to remove this bias to assure global fairness?

This paper answers the above two questions affirmatively by presenting simple FCFSmutual exclusionalgorithms using only read–write operations—the first one using 3n shared bits and the latter algorithmsusing 4n shared bits. The resulting algorithms are simple, space-efficient, and highly fair.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

The mutual exclusion problem is one of the most fundamen-tal problems in concurrent and distributed computing systems.Dijkstra presented the formulation of the mutual exclusion prob-lem and the first solution for n threads using atomic read andwrite operations [7]. Subsequently, Knuth pointed out that Dijk-stra’s solution is susceptible to starvation and hence presentedthe first starvation-free mutual exclusion algorithm [11]. Follow-ing Knuth’s work, several improvements were proposed later. Allthese improvements require read–write operations to be atomicand alsomust use amultiwriter shared variable.Most of themutual

E-mail address: [email protected].

0743-7315/$ – see front matter© 2013 Elsevier Inc. All rights reserved.http://dx.doi.org/10.1016/j.jpdc.2013.03.009

exclusion algorithms proposed later required atomicity at read andwrite or at a higher level, and surveys can be found in [1,18,19,23].

About a decade later, Lamport published the bakery algorithm.This algorithm is considered as the first true solution to the mu-tual exclusion problem because it does not assume mutual ex-clusion at a lower level (referred to as nonatomicity property).The bakery algorithm is simple and has several attractive prop-erties, including First-Come-First-Served (FCFS), but requires un-bounded shared space. Nonatomic algorithms are theoreticallyinteresting as they truly solve the mutual exclusion problemwith-out pushing it down to a lower layer [12,13]. Also, the designand study of nonatomic algorithms, as they are so subtle, couldreveal fundamental principles behind solving problems in dis-tributed computing. Such algorithms are also practically relevantnow, because several recent systems such as smart-phones, multi-mode handsets, network processors, graphic chips, and other high

1030 A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038

performance electronic devices usemultiportmemories that allownonatomic accesses through multiple ports. These developmentswere foreseen long ago, e.g., in [16, p. 58], with a long paragraphexplaining the application to VLSI chips with thousands of proces-sors (becoming reality now).

1.1. Motivation

Atomic access to shared memory is extremely useful, as theygenerally simplify the logic, and therefore can be used to designvarious algorithms for shared memory systems. But, when atom-icity is not provided in the system, they usually require a waitfreeimplementation at a lower level. Several such implementations ofatomic variables using nonatomic variables exist in the literature,but they all, in general, require many safe copies for each atomicvariable, and every read and write on an atomic variable requiresmany reads and writes at the lower level. That is, this approachgives a layered solution for nonatomic systems, but requires sig-nificantlymore space and time than the corresponding atomic one.Two nonatomic algorithms presented in this paper solve the mu-tual exclusion problem directly.

Use of bounded shared memory is important for practical ap-plications. If the bakery algorithm is executed on a 3 GHz machinewith 32 bits, then we might encounter overflow within a minute,and that may lead to an error. Since a nonatomic read can returnany possible value, including themaximumvalue, the overflow canoccur even faster. This issue may be practically irrelevant if usedwith 64 bits and read and write are atomic. In the future, even ifmost general purpose systems are expected to have 64 bit arith-metic and use atomic read–write operations, most embedded sys-tems may well remain be at 32 bits for the sake of size and energyconsumption, and some of themmay be even supported by multi-portmemories. Recent studies indicate that the trend in embeddedmulticore processors and systems is steadily increasing. Therefore,we believe, avoidance of overflow problem is still practically rele-vant and important.

Fairness is an important metric for many practical applications.A large number of algorithms in this context aim to achieve somelevel of FCFS fairness and an algorithm proposed in [3] assuresLeast-Recently-Used (LRU) fairness. Lamport introduced the no-tion of ‘doorway’ to define FCFS property. Based on this concept, ifthreads cross the doorway one after another then they are guaran-teed to access the shared resource in that order. With the systemsof multicore processors, which are becoming the norm nowadays,concurrent threads can execute in parallel and therefore it is quitelikely that most often they can cross the doorway concurrently. So,the doorway based ordering can assure fairness only to a fractionof threads which cross the doorway sequentially. The remainingthreads which cross the doorway concurrently use static priorityto determine their order of shared resource accesses. Specifically,priority is determined purely based on the thread ids that a threadwith lower id is favored over a thread with a higher id. This is un-desirable, as Dijkstra in his original formulation of the problem [7]apparently stated to avoid using static priority to access the sharedresource. Choosing one thread over the other based on some staticpriority may not be noticeable in one or two concurrent competi-tions and may not even be considered as an unfairness issue. But,between two threads i and j, if i is chosen over j whenever a tieoccurs, then it may be observed and considered unfair for thread j.

Until the algorithm by Lycklama and Hadzilacos [15] was pro-posed, achieving even doorway based FCFS in nonatomic algo-rithms using constant shared space was perceived as difficult. Inthat situation, seeking for anything more would have been evenharder. Now, as the issue of fairness among threads crossing thedoorway concurrently becomesmore visible due to the emergenceof multicore systems, it is highly desirable to address this issue forfuture systems, in addition to assuring doorway based FCFS fair-ness.

1.2. Contributions

This paper presents three simple and space efficient FCFS mu-tual exclusion algorithms.

• The first algorithm is atomic and uses only 3 shared bits perthread, and appears to be space optimal in its class.

• The second and third algorithms are nonatomic and they use4 shared bits per thread. In [15], the authors suggested a wayto obtain 4-bit algorithm by compromising the property ofcomposition (explained in Section 3). The composition prop-erty makes the algorithm and its proofs simpler. We obtain the4-bit algorithm without compromising the composition prop-erty, and hence simpler than their suggested version.

• The final 4-bit nonatomic algorithm, in addition to the doorwaybased FCFS, assures stronger fairness (i.e., fairness among thethreads which cross the doorway concurrently). This is the firstalgorithm using safe bits (the weakest memory model) thatsatisfies such a strong fairness property.

• The symmetry mechanism used to assure fairness amongconcurrent threads is generic and therefore can be used in otheralgorithms to assure fairness when such symmetry exists.

We also provide a comprehensive review, classification, andcomparison of existing FCFS mutual exclusion algorithms whichuse only read–write operations.

1.3. Organization

The remainder of the paper is organized as follows. The systemmodel and problem statement are provided in Section 2, and thegeneral structure of existing FCFS mutual exclusion algorithms isanalyzed in Section 3. Since our algorithms are derivedmainly fromthe FCFS algorithm by Lycklama and Hadzilacos, it is reviewed inSection 3.1. Sections 4 and 5, respectively, present 3-bit atomic and4-bit nonatomic FCFS algorithms. Then, the 4-bit FCFS algorithm isimproved to eliminate an inherent unfairness issue and hence toprovide high fairness in Section 6. A comparison analysis is givenin Section 7, and the paper is concluded in Section 8.

2. Systemmodel and problem statement

A sharedmemory is an abstraction of a collection of shared vari-ables that support two memory operations: read and write. In asimplest shared memory model, no two memory operations canoverlap in time and read always returns the value of the latestwrite. Modern systems are much more complex than this. In mul-tiport memories, operations on the same location can overlap intime and an overlapping operationmay return any arbitrary value.Based on the accuracy of the value a read can return, Lamporthas defined three types of shared variables called safe, regular, andatomic [13]. A shared memory with safe registers is the weakestmodel, and the algorithms using safe variables are notoriously sub-tle [15]. The algorithms that use atomic shared variables are calledatomic algorithms and the algorithms only use safe or regular vari-ables are called nonatomic algorithms.

The system consists of n threads (or processes), with ids 1, 2,. . . , n, and they communicate through shared memory usingread–write operations. We assume R as the shared resource thatrequires mutually exclusive access among the threads. We also as-sume that the memory model assures sequential consistency thatthe memory operations are executed as the order specified by theprogram. The mutual exclusion problem is to design an algorithmthat assures the following properties:

P1: At any time, at most one thread is allowed to access R (mutualexclusion property).

A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038 1031

P2: When one or more threads are interested in accessing R, oneof them eventually succeeds in accessing R (deadlock freedomproperty).

P3: Any thread interested in accessing R will be able to do so in finitetime (lockout freedom property).

The segment of code that accesses R is referred to as critical sec-tion (CS). Using CS terminology, the mutual exclusion property canbe redefined as follows: at any time at most one thread is allowedinside CS.

Deadlock freedom and lockout freedom are essential livenessproperties. Deadlock freedom assures that at no time all threadswill be blocked from progressing. Lockout freedom (also calledfreedom from starvation) assures that no individual thread will beblocked forever from progressing. These liveness properties maybe good enough from the system’s point of view. From an individ-ual thread’s point of view, a desirable liveness property is having abound on the number of possible bypasses.

P4: If a thread i issues a request to access R, then at most k otherthreads can access R before i accesses (k-bounded waiting).In the literature, 1-bounded waiting is referred to as linearwait.

For several practical applications, the higher level of fairness(such as FCFS and LRU) in accessing R is often the most desir-able. However, in this context, achieving absolute FCFS using onlyread–write operations is not possible.1 Therefore, the FCFS prop-erty in this context is defined based on a larger segment of codecalled doorway.

P5: If a thread i crosses the doorway before a thread j does, then iaccesses R before j accesses (FCFS property).

In the earlier works, and even some later works, what consti-tutes a doorway was not clearly defined. In one extreme, a door-way can be the first write on a shared variable. This definitionresults in absolute FCFS, which is unattainable. On the other ex-treme, the doorway can be the entire section of entry code. Thisdefinition does not make much sense. A reasonable definition ofdoorway was given by Lamport [12] and subsequently elaboratedby Peterson and Fischer [17] based on the concept ofwaitfree code.A waitfree code is a small portion of entry code that each threadwill pass through in a bounded number of steps. This is the mostwidely used definition of doorway.

The doorway-based FCFS assures fairness among the threadswhich cross the doorway sequentially. It does not state any re-quirement on the fairness among the threads which cross thedoorway concurrently. Here, we introduce a fairness property forthreads that cross the doorway concurrently.

P6: If threads i and j cross the doorway concurrently, then i and jmusthave equal probability in accessing R (Concurrent threads fairnessproperty).

Assuring a higher level fairness such as P6 could be an importantrequirement for most practical applications.

3. Literature review and general structure of FCFS algorithms

There are two approaches used in the literature to achieve FCFS.Lamport used the concept of token numbers to achieve the FCFSorder. Peterson and Fischer introduced a different concept called

1 Assume that two processes p and q start their competitions for R at global timestp and tq by writing to shared variables. Absolute FCFS requires either the values oftp and tq or their order. The global clock and the shared variable used to indicate thecompetition are distinct, and therefore access to them are independent and non-deterministic. So there is no way that the times or the order of writes to a sharedvariable can be derived by reading the global clock.

‘‘behind-of’’ list—an implicit queue to achieve the FCFS order. Themain drawback of Peterson–Fischer’s algorithmwas that a processmay have to wait indefinitely before even entering the doorway,which in a way defeats the spirit of FCFS. Later, eliminating thislimitation, Katseff designed a behind-of list using 2n2 shared bitsfor his FCFS algorithm [10], and used Eisenberg–McGuire’s mutualexclusion algorithm as a component. Since then, all other FCFSalgorithms used either one of these mechanisms—token numberor behind-of list. Hence, the FCFS mutual exclusion algorithms canbe broadly classified into token based and behind-of list based.Subsequent token based algorithms [2,4,9,22] are improvementsover Lamport’s bakery algorithm [12], and the behind-of listalgorithms [13,15] are improvements over Katseff’s algorithm [10].One of the key differences between these two classes of algorithmsis that token based algorithms solve the problem directly, whereasbehind-of list based algorithms use another mutual exclusionalgorithm as a component. Our algorithms in this paper belong tothe class of behind-of list based algorithms.

Behind-of list based algorithms have three logical components:the doorway, the FCFS resolution, and the mutual exclusion (ME)components. In [13], Lamport improved Katseff’s idea and usedhis own 1-bit mutual exclusion algorithm as the ME componentthat reduced the shared bits requirement to n2

+ n bits. (The1-bitmutual exclusion algorithmwas also independently inventedby Burns [6] and therefore is called Burns–Lamport’s algorithm.We refer to it in this paper as the BL algorithm.) Building andusing the behind-of list in these algorithms was an intricatetask. Subsequently, Lycklama and Hadzilacos came up with aninteresting and simpler way of encoding and using the behind-oflist in private variables. That reduced the shared space requirementto just 5n bits. Our algorithms are further improvements over theLycklama–Hadzilacos algorithm and we refer to it in this paper asthe LH algorithm.

An interesting aspect of the LH algorithm is its modular way ofcomposing an FCFS algorithm using two independent algorithms—one assuring mutual exclusion and the other assuring FCFS prop-erty, as illustrated in Fig. 1. Here, A1 assures mutual exclusion, A2assures doorway based FCFS (it does not assure mutual exclusionamong concurrent threads), and their simple unionA3 assures bothmutual exclusion and doorway based FCFS. Based on this observa-tion and using A1, A2, and A3 of Fig. 1, the following compositiontheorem was derived and then used to simplify the proof of cor-rectness of the algorithm.

Theorem 3.1 (Composition Theorem). If A1 satisfies mutual exclu-sion and deadlock freedom, A2 satisfies FCFS and deadlock freedom,and A1 and A2 use a disjoint set of shared variables, then A3 satisfiesmutual exclusion, deadlock freedom, lockout freedom, and FCFS prop-erties [15].

3.1. The LH algorithm

The LH algorithmuses an array T of pairs of binary bits to designFCFS component. In the doorway, each thread builds its behind-oflist in a private array S by reading the T values of all other threads.Then, it inverts a binary bit in its own T . Every thread inverts itsT bit alternatively in the consecutive CS requests, which results inthe sequence of repeating the pattern of either ‘‘00, 01, 11, 10’’ (0,1, 3, 2) or ‘‘00, 10, 11, 01’’ (0, 2, 3, 1) depending onwhich bit it startssetting with. A binary array D is used to indicate the entry and exitto the doorway, and a binary array V is used to indicate the entryand exit to the arbitration section of the code. The LH algorithm isgiven in Fig. 2.

In the LH algorithm, the lines from 1 to 5 constitute the door-way, line 7 resolves FCFS order, and the lines from 8 to 11 consti-tute the BL algorithm—the ME component. In the BL algorithm, a

1032 A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038

Fig. 1. Embedded algorithm structure [15].

Fig. 2. Lycklama and Hadzilacos algorithm (LH).

thread interested in accessing the CS first indicates its interest bysetting a Boolean bit X to 1. Then, it has to pass through two stagesto enter the CS. In the first stage, the thread checks whether anyother higher priority thread is competing. If yes, then it withdrawsits competition by resetting X to 0 andwaits until the higher prior-ity thread leaves the competition. When that happens, the threadresumes its competition by setting X back to 1. The process is re-peated until no other higher priority thread is competing. When itfinds that no other higher priority thread is competing, the threadproceeds to cross the second stage. Here, it waits for all lowerpriority threads to either withdraw from competing or leave thecompetition after completing their CS executions. Subsequently, itproceeds to enter the CS.

3.1.1. Analysis and observationIf a thread i crossed the doorway before another thread j entered

the doorway, then the FCFS property requires that j should notcross line 7 as long as i is in the range 5–10. For this paper, we referto this region as the FCFS unsafe region.

In doorway based FCFS, competing threads enter the CS in theorder of their doorway crossings. In the LH algorithm, this orderis determined using the T values of the threads. After indicatingthe entry to the doorway, every thread reads the T values of allother threads, sets its own T value, and then checks to see whetherthe value of others has changed or not. To indicate the change of Tvalue, a threadmust write a value to T other than its current value.The LH algorithm uses 4 distinct values for this purpose. The al-gorithm does not attach any meaning to these values except thatthey are distinct and repeat at length 4. The order also does notmatter. Particularly, from the description of the LH algorithm andits proof, it is not clear whether these four values are indeed nec-essary. Bringing some clarity on the values of T – what values arerequired and how they are used – might allow us to improve thealgorithm further in terms of reducing the shared space require-ment. This also would shed more clarity on the logic, and hencesimplify the resulting algorithm and its proof.

To see how many distinct values that T requires as minimalin the LH algorithm, let us describe a scenario illustrating themaximumnumber of times a thread i can change its T value duringa competition of another thread j.

Scenario 1. Consider that a thread i is in the doorway, say at line3, and another thread j has just crossed line 7, for their respectiveturns, say ti and tj, of the CS accesses. Now, when i is still at line3, j finishes its current CS execution, comes back quickly, for theturn tj + 1, and reads T [i]. Then, both i and j cross the doorway asconcurrent threads. Next, j progresses fast and bypasses i, entersthe CS, and leaves the competition. Now, j again comes back, forthe turn tj + 2, and i is still in its turn ti. This time jwill be blockedat line 7, at least until i finishes its CS execution for the turn ti andleaves the competition.

Note: Although the algorithmassures doorway based FCFS, afteri observes j at a line beyond 7, j can execute the CS two times andchange its T value two times before i gets its chance to execute theCS.

We are interested in knowing how many values are indeednecessary for T . During the executions in Scenario 1, the threadi must have read its S[j] value from j’s turn tj. Now, j is at theturn tj + 2. If the T [j] value of the turns tj and tj + 2 are same,then that will lead to a deadlock. This means at least three distinctvalues are needed to avoid deadlock. Adding more threads to theabove scenario could only constrain the progress of j and hence webelieve that 3 values are sufficient for the algorithms that allowScenario 1. So, among the 4 possible values of T , we can use onevalue for some other purpose.

Now, let us look at what a thread needs to check to assurethe FCFS property. For the FCFS order resolution, a thread needsto check for the FCFS order with another thread only if thatthread is in the FCFS unsafe region. When another thread is in theunsafe region, three cases are possible: that the thread crossed thedoorway before, concurrently, or after. So, we can use the T value0 to indicate that the thread is not in the FCFS unsafe region andother three nonzero T values for the FCFS order resolution. This, inaddition to increasing the clarity of the logic, eliminates the need ofthe V variable in the LH algorithm. This is the basis of the algorithmpresented next.

Note: Although the order of using the nonzero T values is notimportant, we use it explicitly in the proposed algorithm in theorder 1, 2, 3, 1, 2, 3, . . . to indicate the consecutive competitionsfor the CS. This interpretationmakes the assignment of the T valueexplicit, and hence reduces uncertainty later in proofs.

A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038 1033

Fig. 3. Atomic FCFS algorithm (A-FCFS).

4. Space efficient atomic FCFS algorithm

As explained before, the idea for the improvement of the algo-rithm is simple that a thread uses only a nonzero T value insidethe FCFS unsafe region and uses 0 to indicate that it is not in theunsafe region. This change makes the shared bit V used in the LHalgorithm unnecessary and therefore it can be eliminated. Now, assuggested in [15], we reuse D in place of X . This modification cre-ates redundantwaits that each threadhas towait twice forD valuesof other threads to become 0, first at line 7 and again at line 9 or10.We eliminate thewait at line 7. Putting these changes together,we get a simple 3-bit atomic FCFS algorithm (A-FCFS) and it worksas follows.

A competing thread enters the doorway by setting D to 1, readsthe T values of all the other threads into its local array S, sets its Tvalue to a nonzero value between 1 and 3, and exits the doorway.Then, it checks for the FCFS order by examining the S and T values.When it finds an unchanged nonzero S value, it waits until eitherthe thread leaves the FCFS unsafe region or comes back again foranother turn by changing its T value. In any case, the T value willbe changed either to 0 or to another nonzero value. When a threadcrosses the FCFS unsafe region, it resets its T value to 0. The T valuesrepeat in the sequence 0, 1, 0, 2, 0, 3, . . . . The formal code is givenin Fig. 3.

In the A-FCFS algorithm, the lines from 1 to 3 constitute thedoorway and the lines from 3 to 8 constitute the FCFS unsaferegion.

4.1. Correctness

We use some notations from the theory of nonatomic opera-tion (introduced by Lamport [13] and fine tuned later by severalresearchers) to facilitate the proofs.

• R and W : elementary operations read and write on sharedmemory.

• S: a set of elementary operations on the shared memory.• → and 99K: relations on elements of S, where a → b iff a pre-

cedes b (i.e., a ends before b begins), and a 99K b iff a can causallyaffect b (i.e., a starts no later than b ends).

• (S, →, 99K): a system of executions satisfying the following ax-ioms.(A1) → is an irreflexive partial order.(A2) If a → b then a 99K b.(A3) a → b iff b 99K a.(A4) If a → b 99K c or a 99K b → c then a 99K c.(A5) If a → b 99K c → d then a → d.(A6) {e : e 99K a} is finite.(A7) If a and b belong to same process then either a → b or

b → a.• i.l—thread i is executing at line l.• i.[l,m]—thread i is executing at a line between l and m, includ-

ing l and m.• i.l[R(x, v)]—thread i reads the value v from the variable x at

line l.• i.l[W (x, v)]—thread iwrites the value v to the variable x at line l.• s(e) and f (e), respectively start and finish of the operation e.

Theorem 4.1. The A-FCFS algorithm assures mutual exclusion.

Proof. Since the algorithm uses the BL algorithm as its componentto assure mutual exclusion, it is enough to prove that setting andresetting of D value earlier at lines 1 and 4 do not affect the safetyproperty of the algorithm. SettingD to 1 at line 1might cause somethread at lines 7 and 8 to wait for D to be reset to 0 (which willhappen eventually at line 4), but it does not enable any thread(which is not enabled otherwise) to cross lines 7 and 8 to enterthe CS. If a thread resets its D to 0 at line 4, then it is equivalentto that the thread is withdrawn from competing for the CS at thatmoment. �

Theorem 4.2. The A-FCFS algorithm assures the FCFS property.

Proof. If a thread i has crossed line 3 before another thread j hasstarted executing at line 1, then j will observe i’s latest nonzeroT value and therefore j will be blocked at line 5 at least until T [i]changes to 0 at line 9. Hence, j cannot bypass i to enter the CS. �

Assertion 4.3. In the A-FCFS, if a thread i is blocked at line 5 byanother thread j (i.e., S[j] = 0 ∧ S[j] = T [j] is true for iat line 5), then: (a) j.[3, 8] (i.e., j is in the FCFS unsafe region);and (b) i.2[R(T [j])] must return the value from j’s latest writej.3[W (T [j])].

Proof. The value of T [j] is 0 when the thread j is outside the FCFSunsafe region (i.e., j.[1, 2] or j.[9, 10]). This implies that S[j] =

0∧ S[j] = T [j] is false for any other thread i at line 5 and thereforeit cannot be blocked by j at line 5. This proves part (a) of theassertion. It is enough to prove that i.2[R(T [j])] could not be fromthe writes of any previous CS competitions of j. Let the sequenceof i’s competitions be . . . C4 → C3 → C2 → C1, and C1 isthe current competition. Without loss of generality, assume thevalue of T [j] during C1 as 3, during C2 as 2, during C3 as 1, duringC4 as 3, and so on. Since i is blocked by j at line 5 during C1, thevalue of S[j] and T [j] must be 3. So, imust have read its value fromeither C1, C4, C7, . . . Suppose i has obtained its S[j] value fromC4. Then, i must have reached line 4 before the end of C4 or atleast at the end of C3. Then, any later thread, including j duringC1, must observe the correct nonzero T [i] value and therefore, byTheorem 4.2, j must wait at line 5 until i completes its CS access.This is a contradiction that j has completed C3 and C2 and backagain for the competition C1, but i still is blocked by j at line 5. Thisimplies that i.2[R(T [j])] must be from the latest j.3[W (T [j])]. �

1034 A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038

Corollary 4.4. In the A-FCFS, if a thread i is blocked by anotherthread j at line 5 and i.2[R] and j.3[W ] are the latest read and write,respectively, of i and j then j.3[W ] → i.2[R].

Assertion 4.5. In the A-FCFS, let t1, . . . , tm be m threads such that,for 1 ≤ k < m, the thread tk is blocked by the thread tk+1 at line 5.Then, for 1 < j ≤ m, i < j, no thread tj can be blocked by a thread tiat line 5.

Proof. By the Corollary 4.4, for 1 < j ≤ m, tj.3[W ] → tj−1.2[R]. ByA7, for 1 ≤ j ≤ m, tj.2[R] → stj.3[W ]. Putting these together wehave, for 1 < j ≤ m, tj.2[R] → tj.3[W ] → tj−1.2[R] → tj−1.3[W ].From this we get tj.2[R] → ti.3[W ], for 1 < j ≤ m, i < j. That is,for 1 < j ≤ m, i < j, the thread tj has completed its read on T [i]before the thread iwrote on it. So, after the write ti.3[W ], T [i] willbe changed, and therefore (S[i] = T [i]) is true for j > i. Hence, for1 < j ≤ m, i < j, no thread tj can be blocked by a thread ti at line5. �

Corollary 4.6. In the A-FCFS, no cyclic wait of threads at line 5 ispossible.

Assertion 4.7. In the A-FCFS, no thread will be blocked forever atline 5.

Proof. Assume that m threads are blocked at line 5. By Theo-rem 4.2, none of these threads have to wait for a new threadcrossing the doorway later. By the Corollary 4.6, no cyclic wait ispossible. That means a subset of these m threads will be waitingonly for the threads which are beyond line 5. Since the BL algo-rithm assures liveness, these threads will complete their CS exe-cutions and leave the competition one by one, enabling a subsetof threads waiting at line 5 to progress. Repeating this argument,eventually allm threads will cross line 5. �

Assertion 4.8. In the A-FCFS, no thread will be blocked forever atline 8.

Proof. If a thread i at line 8 is waiting for a thread j at a line before8, then j must have a priority lower than that of i, and therefore itwill eventually reset its D value and be blocked either at line 5 or 7.Now, jwill not wait for i any more for j at line 8. If a thread i at line8 is waiting for a thread j beyond line 8, then jwill complete the CSexecution in a finite time and leave the competition. If j comes back,it will be blocked at line 5. So, no thread will be blocked forever atline 8. �

Assertion 4.9. In the A-FCFS algorithm, no thread will be blockedforever at line 7.

Proof. By the Assertion 4.8, no thread will be blocked forever atline 8. The highest priority thread cannot be blocked by a lowerpriority thread at line 7. When a thread comes for another term,then it will be blocked at line 5. Let a thread i be blocked at line7 and there are m threads in the system that have priority higherthan that of i. So, by the Assertions 4.7 and 4.8, all these higherpriority threads will complete their CS executions one by one in afinite time, finally enabling i to cross line 7. �

Theorem 4.10. The A-FCFS assures deadlock and lockout freedom.

Proof. Follows from the Assertions 4.7–4.9. �

The algorithmA-FCFS fails to assure deadlock freedom if T is notatomic as explained in the following scenario.

Scenario 2. Consider that a thread i is at line 2 and another threadj is at line 9, say with the T [j] value 1, and now resetting it to0. If T [j] is nonatomic, an overlapping read can return any value.Now, i obtains its S[j] as 2 from an overlapping reading on T [j] atline 2. Then, i crosses the doorway and then j comes back quicklycrossing the doorwaywith the T [j] value 2. Now, i cannot cross line5 because S[j] = T [j] = 2, and j also cannot cross line 5 becauseS[i] = T [i]. That is, the threads i and j wait for each other and it isa deadlock situation.

We do not know how to avoid the above scenario only using 3shared bits per thread when T is nonatomic. Our experience indi-cates that 4 bits per thread is required for nonatomic case.

In [15], the authors have suggested a way to get 4-bit FCFS al-gorithm by reusing the D value in place of the X value. This changecompromises the composition property and hence complicates thelogic and the proof of the algorithm. Our objective is to design a4-bit algorithm without compromising the composition propertyin order to keep it simple. Such an algorithm is presented next.

5. Nonatomic FCFS algorithm

In the next two sections, we present two FCFS algorithms with-out the requirement of atomicity on read andwrite.More precisely,these proposed algorithms work with safe bits—the weakest formof shared memory. That is, an overlapping read can return any ar-bitrary value from the domain of that shared variable.

The nonatomic algorithm we present next has three maingoals: (i) keep the logic as simple as that of A-FCFS algorithm;(ii) avoid possible bypasses described in Scenario 1; and (iii) keepit amenable for easy modification to assure high fairness. To keepthe logic simple, we use three values of T : 01 or 10 (i.e., 1 and 2)when the threads are in the FCFS unsafe region, and 0 otherwise.That is, the sequence of T values is 0, 1, 0, 2, 0, 1, . . . . The use ofonly two nonzero values avoids the possible overtake described inScenario 1 (which used 3 nonzero values). Finally, the symmetry of10 and 01will be exploited in the next algorithm to assure fairnessamong concurrent threads. Another main difference between thisalgorithm and 4-bit version suggested by Lycklama and Hadzila-cos is that this algorithm achieves 4 bits usage while keeping thecommunication variables of the BL algorithm and FCFS componentseparate so that the composition theorem can still be applied.

Compared to the A-FCFS algorithmgiven in Fig. 3, this algorithmhas three changes: (i) instead of assigning an integer to T at line 3,here we set a bit to 1; (ii) the removed wait statement from theLH algorithm to design the A-FCFS algorithm is brought back, butplaced after the CS. The FCFS components and the ME componentsuse separate communication bits. Now, the BL algorithm worksindependently, and the D value is used exclusively to make surethat the outgoing threads wait for those threads in the doorway tocome out to avoid bypasses; and (iii) the local integer variable t isreplaced by a binary bit b, and it is inverted after every turn. Theresulting algorithm referred to as NA-FCFS is given in Fig. 4.

5.1. Correctness

Theorems 4.1 and 4.2 and the Assertions 4.8 and 4.9 are alsohold true for the NA-FCFS algorithm.We need to prove the lockoutfreedom.

Assertion 5.1. In the NA-FCFS algorithm, if a thread i is at line 2 andanother thread j is between 1 and 9, then j cannot cross line 10 beforei reaches line 4.

Proof. As long as the thread i is at line 2, thread jwill find D[i] = 0at line 10 and thereforewill be blocked there at least until i reachesline 4 to reset D[i] to 0. �

A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038 1035

Fig. 4. Nonatomic FCFS algorithm (NA-FCFS).

Assertion 5.2. In the NA-FCFS algorithm, no read on a T variable atline 2 can overlap with writes on it from different competitions for theCS.

Proof. Suppose j.2[R(T [i])] overlaps with either i.3[W (T [i])] ori.9[W (T [i])]. Then, j.2[T [i]] cannot overlap with the subsequentwrites on T [i], because i cannot cross line 10 before j reaches line4 to set D[j] to 0. �

Assertion 5.3. In the NA-FCFS algorithm, if a thread i is blocked atline 5 by another thread j (i.e., S[j] = 0 ∧ S[j] = T [j] is true for i atline 5), then: (a) j.[3, 9]; and (b) i.2[R(T [j])] must return the valuefrom j’s latest write j.3[W (T [j])].

Proof. The value of T [j] is 0 when the thread j is outside theFCFS unsafe region. (i.e., j.[1, 2] or j.[10, 11]). This implies thatS[j] = 0 ∧ S[j] = T [j] is false for any other thread i at line 5 andtherefore it cannot be blocked by j at line 5. This proves part (a)of the assertion. It is enough to prove that i.2[R(T [j])] could notbe from the writes of any previous CS competitions of j. Let thesequence of i’s competitions be . . . C4 → C3 → C2 → C1, andC1 is the current competition. Without loss of generality, assumethe value of T [j] during C1 as 01, during C2 as 10, during C3 as 01,during C4 as 10, and so on. Since i is blocked by j at line 5 duringC1, the value of S[j] and T [j] must be 01. By Assertion 5.2, no readcan overlapwithwrites of twodistinct competitions. Since a threadsets and resets only one bit of T , a read on T [j] during a competitionof j can return either 0 or the correct turn value. So, i must haveread its value from either C1, C3, . . . Suppose i has obtained itsS[j] value from C3. Then, i must have reached line 4 before theend of C3 as j would have waited at line 10 for D[i] = 0. Then,

any later thread, including j during C1, must observe the correctnonzero T [i] value and therefore, by Theorem 4.2, j must wait atline 5 until i completes its CS access. This is a contradiction that jhas completed C3 and C2 and back again for the competition C1,but i is still blocked by j at line 5. This implies that i.2[R(T [j])]mustbe from the latest j.3[W (T [j])]. �

Corollary 5.4. In the NA-FCFS algorithm, if a thread i is blocked byanother thread j at line 5 and i.2[R] and j.3[W ] be latest read andwrite, respectively, of i and j then j.3[W ] 99K i.2[R](i.e., s(j.3[W ]) →

f (i.2[R])).

Proof. By the Assertion 5.3, the value obtained by the read i.2[R]must be from the write j.3[W ]. That means, j.3[W ] must havestarted before i.2[R] ended. This implies, j.3[W ] 99K i.2[R]. �

Assertion 5.5. In the NA-FCFS algorithm, let t1, . . . , tm be m threadssuch that, for 1 ≤ k < m, the thread tk is blocked by the thread tk+1at line 5. Then, for 1 < j ≤ m, i < j, no thread tj can be blocked by athread ti at line 5.

Proof. By the Corollary 5.4, for 1 < j ≤ m, s(tj.3[W ]) →

f (tj−1.2[R]). By A7, for 1 ≤ j ≤ m, f (tj.2[R]) → s(tj.3[W ]). Puttingtogether we have f (tj.2[R]) → s(tj.3[W ]) → f (tj−1.2[R]) →

s(tj−1.3[W ]), for 1 < j ≤ m. That is, f (tm.2[R]) → s(tm.3[W ])→ f (tm−1.2[R]) → s(tm−1.3[W ]) → f (tm−2.2[R]) → s(tm−2.3[W ]), . . . , f (t1.2[R]) → s(t1.3[W ]). This implies, for 1 < j ≤

m, i < j, f (tj.2[R]) → s(ti.3[W ]). From this we get tj.2[R] →

ti.3[W ], for 1 < j ≤ m, i < j. That is, for 1 < j ≤ m, i < j, thethread tj has completed its read on T [i] before the thread i wroteon it. So, after thewrite ti.3[W ], T [i]will be changed, and therefore(S[i] = T [i]) is true for j > i. Hence, for 1 < j ≤ m, i < j, no threadtj can be blocked by a thread ti at line 5. �

Corollary 5.6. In the NA-FCFS algorithm, no cyclic wait of threads atline 5 is possible.

Assertion 5.7. In the NA-FCFS algorithm, no thread will be blockedforever at line 5.

Proof. Assume that m threads are blocked at line 5. By Theo-rem 4.2, none of these threads have to wait for a new threadcrossing the doorway later. By the Corollary 5.6, no cyclic wait ispossible. That means a subset of these m threads will be waitingonly for threads which are beyond line 5. Since the BL algorithmassures liveness, these threads will complete their CS executionsand leave the competition one by one, enabling a subset of threadswaiting at line 5 to progress. Repeating this argument, eventuallyallm threads will cross line 5. �

Theorem 5.8. The NA-FCFS algorithm assures deadlock and lockoutfreedom.

Proof. Follows from the Assertions 4.8, 4.9 and 5.7. �

6. Highly-fair nonatomic FCFS algorithm

In the LH, A-FCFS, and NA-FCFS algorithms, threads follow theFCFS order if they enter the doorway one after another. Otherwise,the BL algorithm is in effect and that favors threads with smallerids. To mitigate this unfairness, we use the symmetry propertyinherent in the NA-FCFS algorithm that it uses the T values 10 and01 alternatively. Such a symmetry can be used in several waysto assure fairness. We use it in a specific way here. We map 10to 1 and 01 to −1. Then, using these two values and the ids,every thread i will have two unique positions (−i and i) in theinterval of integers [−n, n]. For a particular competition, a threadwill be mapped to a unique position in the interval. Since the

1036 A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038

Fig. 5. Highly-fair nonatomic FCFS algorithm (HF-FCFS).

interval is totally ordered, we can easily extend the BL algorithm towork for this change. In the BL algorithm, each thread is mappedstatically to a unique position in the interval [1, n]. In the proposedalgorithm, each thread will be mapped to a unique position in theinterval [−n, n] during a competition. In both cases, at any time,the positions mapped to the competing threads are totally orderedand therefore the priority of a thread during its competition isfixed. This is a necessary property to assure mutual exclusion anddeadlock freedom.

For a given thread i, we define two sets of threads at any time,higher priority threads HPT(i) and lower priority threads LPT(i)as follows: HPT(i) := {j/(X[j] = 0) ∧ (s[j] ∗ j < s[i] ∗ i)} andLPT(i) := {j/(X[j] = 0) ∧ (s[j] ∗ j > s[i] ∗ i)}, where for everyk, s[k] := T [k][1] − T [k][0].

Now, for any two threads i and j, i < j, we have four cases basedon T values: (i) both take 10 then i wins; (ii) both take 01 thenj wins; (iii) i takes 01 and j takes 10 then i wins; and (iv) i takes10 and j takes 01 then j wins. So, we have equal chances for eachthread to win the tie and therefore the algorithm assures fairnessamong concurrent threads. The resulting algorithm incorporatingthe above idea, referred to as HF-FCFS, is given in Fig. 5.

TheNA-FCFS andHF-FCFS algorithms differ only in theME com-ponent. When mutual exclusion and lockout freedom of ME com-ponent are assured, all other properties of the NA-FCFS algorithmhold true for the HF-FCFS algorithm. Mutual exclusion and lock-out freedom can be easily inferred from the fact that the pointsin the interval [n, n] are totally ordered, each competing thread isuniquely mapped to a point in [−n, n] during a competition, andno thread leaving the CS can cross the line 5 again when an earlierthread is at line 7 or 8.

6.1. Fairness

When threads enter the doorway one after the other, the sharedresource is accessed in the order they cross the doorway.When twothreads cross the doorway concurrently (such a possibility is highin multicore and multiprocessor systems), the algorithm breaksthe tie in favor of one process over the other. Although fairness is avery general concept and an exact formal definition does not exist,providing equal opportunity to win tie-breaks for every processduring their life time is an acceptable definition of fairness in thiscontext. That is, in any execution, for any two threads p and q,consider the set of passages where p and q execute through thedoorway concurrently. Then, in half of those passages p is givenpriority over q, and in the other half q is given priority over p.

In the previous FCFS algorithms, between two concurrentthreads, the thread with smaller id has a 100% chance and thethread with larger id has a 0% chance of winning tie-breaks. Thatis, the worst and the average cases in those algorithms are samefor any particular process and different for different processes. Inthe HF-FCFS algorithm, all threads have equal chance of winningthe tie-breaks. More formally, every thread has equal probabilityof winning ties. This is proved next.

Theorem 6.1. The HF-FCFS algorithm assures fair tie-breaking be-tween any two competing threads.

Proof. Consider that the threads i and j (i < j) cross the doorwayconcurrently. We need to prove that i and j have equal chance ofwinning the tie. The threads i and j use their T values to break thetie. There are four possible values for the pair (T [i], T [j]) and theyare (10, 10), (10, 01), (01, 10), and (01, 01). In that, the values(10, 10) and (01, 10) favor i, and the other two values favor j. Sinceall these values are equally probable, the threads i and j have equalchance of winning a tie. �

7. Comparison analysis

Compared to the other existing atomic FCFS algorithms, theatomic algorithm A-FCFS presented in this paper is simple and su-perior in shared space usage. Among the existing nonatomic algo-rithms, the LH algorithm is the best in terms of fairness and sharedspace usage (the LH algorithm requires 5 shared bits per thread).

When compared with the LH algorithm, the A-FCFS algorithmpresented in this paper assures improved fairness as it avoidsScenario 1. Also, compared to the LH algorithm, the A-FCFSalgorithmuses less shared bits per thread (3 shared bits per thread)and is simpler as illustrated in Sections 3.1.1 and 4. Therefore, interms of simplicity and shared space usage, the A-FCFS algorithmis the current best FCFS algorithm using read–write operations.Next, we compare the NA-FCFS and HF-FCFS algorithms with theLH algorithm for shared space usage, fairness, and simplicity.

In terms of space complexity, the algorithms NA-FCFS and HF-FCFS are better than the LH algorithm (NA-FCFS and HF-FCFS use4 shared bits per thread). The authors in [15] have proposed amodification to the LH algorithm to reduce the requirement ofshared space to 4 bits per thread. This modification compromisesthe composition property (Theorem 3.1), and that complicates thelogic of the algorithm. Compared to this modified algorithm, theNA-FCFS and HF-FCFS algorithms are marginally better (NA-FCFSand HF-FCFS use 7 shared values per thread whereas the modifiedLH 4-bit algorithm requires 8 shared values per thread). Theimportant difference is that the NA-FCFS and HF-FCFS algorithmspreserve the composition property.

Based on doorway crossing, the fairness between two compet-ing threads can be considered under two cases: threads cross-ing the doorway sequentially (CDW-SQ) and threads crossing the

A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038 1037

Table 1Algorithms comparison summary.

Reference Fairness between i and j, i < j Shared space/threadCDW-SQ CDW-CC Bits Values

Atomic algorithms

1 Katseff [10] FCFS i is favored over j 2n + 3 4n+ 52 Jayanti et al. [9] FCFS i is favored over j log(2n) + 2 2n+ 43 Taubenfeld [22] FCFS i is favored over j log(n) + 2 n + 54 Aravind [2] FCFS Equal chance log(n) + 2 2n+ 55 A-FCFS FCFS i is favored over j 3 6

Nonatomic algorithms

1 Lamport [12] FCFS i is favored over j ∞ ∞

2 Lamport [13] FCFS i is favored over j n + 1 2n+ 23 LH [15] FCFS i is favored over j 5(4) 10(8)4 NA-FCFS FCFS i is favored over j 4 75 HF-FCFS FCFS Equal chance 4 7

Szymanski [21] 1-bounded wait 3 5Peterson [16] 2-bounded wait 2 4BL [6,13] Lockout free 1 2

Shared memory accesses when no contention to the CS

Reference # of writes # of reads

LH [15] 7 4n + 1A-FCFS 6 2nNA-FCFS & HF-FCFS 6 3n

doorway concurrently (CDW-CC). All the algorithms assure theFCFS fairness among threads crossing the doorway sequentially.However, the algorithm NA-FCFS assures improved fairness com-pared to the LH algorithm, as the NA-FCFS algorithm avoids Sce-nario 1. The final algorithm, HF-FCFS, in addition, assures fairnessamong threads that cross the doorway concurrently. Overall, theLH algorithm satisfies properties P1–P5. The algorithm HF-FCFSsatisfies the additional stronger fairness property F6 (concurrentthread fairness property). The HF-FCFS algorithm is the first andonly FCFS algorithm using safe bits that satisfies this property.

On the surface, due to the structural closeness, the key differ-ences between the algorithms proposed in this paper and the LHalgorithmmight not be apparent immediately. A closer look at themost complex aspect of the algorithms carefully, which is under-standing how they avoid deadlock, will reveal the subtle differ-ences. The proofs of correctness, particularly the deadlock freedomproperty, are relatively straightforward in the proposed algorithmsdue to the simplified use of the T values. This is an evidence thatthe proposed algorithms are simpler.

Finally, we compare the execution complexity between A-FCFS,NA-FCFS, and HF-FCFS algorithms and the LH algorithm. There isno doubt about the usefulness of the execution time complexityfor any algorithm. However, it is difficult to find the appropriateand universally agreed definition of execution time complexityof synchronization algorithms designed for asynchronous sharedmemory systems [23]. The main reasons are: (i) nothing can beassumed about the speed of the threads and (ii) often the progressof a thread during contention for the shared resource depends onthe progress of the other threads. However, since an operation thataccesses the sharedmemory takesmuchmore time than accessingthe local memory, the number of reads and writes to the sharedmemory has been considered as a good measure of execution timefor synchronization algorithms in shared memory systems [14].Again, when busy waits are involved, counting reads and writesduring contention for the CS is notmeaningful. Therefore, countingthe number of reads and writes to shared memory when thereis no contention for the CS has been used in the literature as anacceptable measure for execution complexity [14,23].

When there is no contention for the CS, the LH algorithm uses 7writes and 4n+1 reads. The algorithmA-FCFS uses 6writes and 2nreads and the algorithmsNA-FCFS andHF-FCFS use 6writes and 3n

reads. All the algorithms use same number of wait loops (the waiton D at line 7 in the LH algorithm has been moved to line 10 in theNA-FCFS andHF-FCFS algorithms). Based on these two keymetrics,the algorithms A-FCFS, NA-FCFS, and HF-FCFS are expected to havebetter execution performance than the LH algorithm.

Table 1 summarizes the comparison analysis.

8. Concluding remarks

The emergence of multicore processors and their ubiquitoususe have renewed the interest in concurrent programming[5,8,20,23]. It has practical importance in many applications suchas event-based implementations of GUI, real-time OS, multiusergames, sensor networks and embedded systems, etc. Nonatomicmemories areweaker type ofmemories. Mostmultiportmemoriesare of this weaker type and in recent times majority of embeddedsystems are coming up with such multiport memories. In thispaper, we have presented simplemutual exclusion algorithms thathave several attractive properties such as nonatomicity, efficientshared space usage, and FCFS, which are appealing to be used inprogramming for embedded systems with multiport memories.Also, the HF-FCFS algorithm proposed in this paper is the firstand only FCFS algorithm using safe bits satisfying fairness amongconcurrent threads (property P6).

Among the nonatomic algorithms, the BL algorithm (uses 1shared bit per thread) is space optimal to assure mutual exclusionand deadlock freedom. Peterson’s 2-bit algorithm is space optimalto assure mutual exclusion, deadlock freedom, and lockout free-dom. Based on our experience and past studies, we believe that atleast 5 shared values per thread (i.e., to indicate 5 different states ofa thread: 1—not competing; 2—inside the doorway; 3—outside thedoorway; 4—current competition; and 5—come back for next com-petition) are necessary to assure safety, deadlock freedom, lockoutfreedom, and FCFS based onwaitfree doorway.We leave it as a con-jecture. That is, in terms of shared bits, the conjecture is that theoptimal number of shared bits required for any FCFSmutual exclu-sion algorithm using read–write operations is 3. If the conjecture isproven true, then one of our algorithms is space optimal in termsof shared bits assuring all four properties mentioned above.

Among the token based algorithms, the bakery algorithm isthe only nonatomic algorithm assuring the FCFS property. But, it

1038 A.A. Aravind / J. Parallel Distrib. Comput. 73 (2013) 1029–1038

requires unbounded shared space. A token based nonatomic FCFSusing bounded shared space is still unknown, and a token basedalgorithm assuring almost FCFS property using bounded sharedspace is presented in [4]. (In this algorithm, the FCFS order propertycould fail when a thread does a flickeringwrite on a shared variablefor an unknown period of time.)

There are several questions that remain to be answered regard-ing the algorithms presented in this paper. For example, how toachieve or is there an effective way to achieve some of the impor-tant properties such as local spin, adaptiveness, fault-tolerance, in-finite failure, etc., are unknown. Finding answers to such questionscan be some directions for future research on this topic. Also, aninteresting future study would be to see the performance of theproposed algorithms.

Acknowledgments

This work was supported by NSERC Discovery Grant (312162-2010). The author would like to thank the editor and theanonymous reviewers for their valuable comments which helpedto improve the presentation.

References

[1] J.H. Anderson, Y.-J. Kim, T. Herman, Shared-memory mutual exclusion: majorresearch trends since 1986, Distributed Computing 16 (2003) 75–110.

[2] A. Aravind, Highly-fair bakery algorithm using symmetric tokens, InformationProcessing Letters 110 (2010) 1055–1060.

[3] A. Aravind, Yet another simple solution for concurrent programming controlproblem, IEEE Transactions on Parallel and Distributed Systems 22 (6) (2011)1056–1063.

[4] A. Aravind, W.H. Hesselink, Nonatomic dual bakery algorithm with boundedtokens, Acta Informatica 48 (2) (2011) 67–96.

[5] M. Ben-Ari, Principles of Concurrent and Distributed Programming, Addison-Wesley, 2006.

[6] J.E. Burns, Complexity of communication among asynchronous parallelprocesses, Ph.D. Thesis, School of Information and Computer Science, GeorgiaInstitute of Technology, Atlanta, January 1981.

[7] E.W. Dijkstra, Solution of a problem in concurrent programming control,Communications of the ACM 8 (9) (1965) 569.

[8] M. Herlihy, V. Luchangco, Distributed computing and themulticore revolution,ACM SIGACT News 39 (1) (2008) 62–72.

[9] P. Jayanti, K. Tan, G. Friedland, A. Katz, Bounding Lamport’s bakery algorithm,in: Proceedings of the SOFSEM, in: LNCS, vol. 2234, 2001, pp. 261–270.

[10] H.P. Katseff, A new solution to the critical section problem, in: Proceedings ofthe 10th ACM Symposium on the Theory of Computing, 1978, pp. 86–88.

[11] D.E. Knuth, Additional comments on a problem in concurrent programmingcontrol, Communications of the ACM 9 (5) (1966) 321–322.

[12] L. Lamport, A new solution of Dijkstra’s concurrent programming problem,Communications of the ACM 17 (8) (1974) 453–455.

[13] L. Lamport, The mutual exclusion problem: part I—theory of interprocesscommunication, and part II—statement and solutions, Journal of the ACM 33(2) (1986) 313–348.

[14] L. Lamport, A fastmutual exclusion algorithm, ACM Transactions on ComputerSystems 5 (1) (1987) 1–11.

[15] E.A. Lycklama, V. Hadzilacos, A first-come-first-served mutual-exclusionalgorithm with small communication variables, ACM Transactions onProgramming Languages and Systems 13 (4) (1991) 558–576.

[16] G.L. Peterson, A new solution to Lamport’s concurrent programming problemusing small variables, ACM Transactions on Programming Languages andSystems 5 (1) (1983) 56–65.

[17] G.L. Peterson, M.J. Fischer, Economical solutions for the critical sectionproblem in a distributed system, in: Proceedings of the Ninth Annual ACMSymposium on Theory of Computing, 1977, pp. 91–97.

[18] M. Raynal, Algorithms for Mutual Exclusion Problem, The MIT Press, 1986.[19] M. Raynal, Concurrent Programming: Algorithms, Principles, and Foundations,

Springer-Verlag, 2013.[20] H. Sutter, J. Larus, Software and the concurrency revolution, ACMQueue (2005)

54–62.[21] B.K. Szymanski, A simple solution to Lamport’s concurrent programming

problem with linear wait, in: Proceedings of the 2nd ACM InternationalConference on Supercomputing, 1988, pp. 621–626.

[22] G. Taubenfeld, The black-white bakery algorithm and related bounded-space,adaptive, local-spinning and FIFO algorithms, in: Proc. of the DISC, in: LNCS,vol. 3274, 2004, pp. 56–70.

[23] G. Taubenfeld, Synchronization Algorithms and Concurrent Programming,Pearson Education, Prentice Hall, 2006.

Alex A. Aravind is an Associate Professor in the Depart-ment of Computer Science at the University of NorthernBritish Columbia (UNBC), Canada. Alex received his Ph.D.in Computer Science from Indian Institute of Science (IISc),Bangalore, and M.Tech. in Computer Science from IndianInstitute of Technology (IIT), Kharagpur, India. After a briefstint at the Supercomputer Education and Research Center(SERC), IISc, he was a Post-doctoral fellow at theMemorialUniversity of Newfoundland, St. John’s, Canada.

Alex’s areas of research interest include operating sys-tems, concurrent and distributed computing, mobile ad

hoc and wireless networks, and modeling and simulation of complex systems. Hehas published more than 40 research articles in leading journals and internationalconferences, and has co-authored a book on operating systems, published by Pear-son Education. Alex is amember of ACM, IEEE, and SCS, and has served as a programtechnical committee member for many international conferences, and a reviewerfor several leading journals and conferences. He has chaired a number of conferencesessions and organized several workshops and delivered invited talks.