
IEICE TRANS. INF. & SYST., VOL.E88–D, NO.4 APRIL 2005

PAPER

Dynamic Replica Control Based on Fairly Assigned Variation of Data for Loosely Coupled Distributed Database Systems

Takao YAMASHITA†a), Member

SUMMARY This paper proposes a decentralized and asynchronous replica control method based on a fair assignment of the variation in numerical data that has weak consistency for loosely coupled database systems managed or used by different organizations of human activity. Our method eliminates the asynchronous abort of already committed transactions even if replicas in all network partitions continue to process transactions when network partitioning occurs. A decentralized and asynchronous approach is needed because it is difficult to keep a number of loosely coupled systems in working order, and replica operations performed in a centralized and synchronous way can degrade the performance of transaction processing. We eliminate the transaction abort by fairly distributing the variation in numerical data to replicas according to their demands and updating the distributed variation using only asynchronously propagated update transactions without calculating the precise global state among reachable replicas. In addition, fairly assigning the variation of data to replicas equalizes the disadvantages of processing update transactions among replicas. Fairness control for assigning the data variation is performed by averaging the variation requested by the replicas. A simulation showed that our system can achieve extremely high performance for processing update transactions and fairness among replicas.
key words: data replication, weak consistency, numerical data, fairness, clock synchronization

1. Introduction

The progress of world-wide computer networks has affected various fields of human activity, especially commercial activities, such as online shopping, trading, and banking. Applications used in commercial activities often handle numerical values, including the total stock of a particular product in a warehouse, production orders, and resources available to customers, such as rooms in a hotel. Such data is usually shared among many organizations all over the world. Those organizations usually place copies of the data in shared sites for better availability and scalability. As a result, a number of loosely coupled replicas exist around the world. It is difficult to keep loosely coupled replicated systems in working order for a long time. In addition, the type of user of these replicated database systems is changing from system experts to customers in commercial activities.

Control methods for replicated database systems can be classified into two types: strict and weak consistency. Strict consistency methods based on a quorum consensus and read-one-write-all [1], [2] achieve serializability [3] but require a heavier overhead.

Manuscript received May 31, 2004.
Manuscript revised October 22, 2004.
†The author is with the NTT Information Sharing Platform Laboratories, NTT Corporation, Musashino-shi, 180–8585 Japan.
a) E-mail: [email protected]
DOI: 10.1093/ietisy/e88–d.4.711

On the other hand, weak consistency replica control methods [4]–[13] achieve high performance, availability, and scalability in return for permitting some inconsistency in the data. Several weak consistency replication methods [8], [9], [11] achieve extremely high availability and scalability for updating data anytime and anywhere using a decentralized and asynchronous approach. In such replication methods, any replica receiving an update transaction can initially process it and then asynchronously propagate it among replicas. This type of data replication is called lazy-group replication [14]. In lazy-group replication, if an update transaction conflicts with already processed transactions, a conflict resolution procedure is invoked. This procedure depends on the semantics of a transaction. Replica control methods of this type have often been designed for specific applications. Conflict resolution can be defined using such application knowledge.

When replicated database systems handle data using lazy-group replication even when network partitioning occurs, the asynchronous abort of already committed transactions can cause problems for customers and for the organizations managing or using replicas. Customers are required to issue requests to replicated database systems on the assumption that the asynchronous abort of already committed transactions can occur. For example, a customer sometimes chooses a distributor who can deliver a product earlier than others. In this case, the distributor must decide the earliest acceptable date for delivery. Assume that several replicas are separated from the others but still continue to process update requests. The delivery date quoted to a customer might be extended because of the high number of requests processed by the separated replicas and the abort of some of those requests. When customers are notified about the longer delivery time, they have to cancel their orders and place them with other distributors. Such behavior of replicated database systems confuses customers. From the viewpoint of organizations managing or using replicas (e.g., companies), the asynchronous abort of already committed transactions can be imposed frequently on some replicas. Because the organizations usually compete with each other for business, such aborts affecting the results of customer trading can cause unfairness among the replicas they manage.

To solve the above problems caused by the asynchronous abort of already committed transactions, we can use Data-Value Partitioning (DVP) [15] in a lazy-group replication method. DVP splits up the value of data for database systems.

Copyright © 2005 The Institute of Electronics, Information and Communication Engineers


Each of these systems processes transactions using its split value. However, because DVP needs to calculate the current value of data, it requires the periodical synchronous operation of replicas even though replicas can otherwise process transactions without interacting with others. This synchronous operation decreases the performance and scalability of transaction processing.

This paper proposes a decentralized and asynchronous replica control method based on a fair assignment of the variation in numerical data that has weak consistency. We call this the fairly assigned variation based replica control (FVRC) method. This method achieves decentralized and asynchronous replica control that eliminates the asynchronous abort of already committed transactions even if replicas in all network partitions continue to process transactions. We eliminate the abort of transactions by distributing the variation of numerical data to replicas and updating the distributed variation using only asynchronously propagated update transactions. Because this update of the distributed variation is performed asynchronously without synchronously calculating the precise global state among reachable replicas, our method can achieve high performance and scalability. It also enables fairness among replicas by reassigning the variation of data to replicas according to their demands so that the disadvantage of processing transactions can be balanced.

The remainder of this paper is organized as follows. Section 2 describes a decentralized and asynchronous replica control method based on a fairly assigned variation of data that has weak consistency. This section covers three issues. The first is the characteristics of data handled by the FVRC system and the operations on it. The second is how a replica dynamically processes those operations in a decentralized and asynchronous way. The third is an algorithm for fairly assigning the variation of data among replicas. In Sec. 3, the evaluated results are discussed in terms of the performance of transaction processing and the fairness among replicas achieved by our method. Section 4 discusses related work. Section 5 concludes the paper.

2. Fairly Assigned Variation Based Replica Control

2.1 Characteristics of Data

First, we describe the general characteristics of data handled by the FVRC system. Figure 1 shows the general definition of data. A data item is represented by a signed numerical value that has upper and lower bounds. The value must not be larger than the upper bound or smaller than the lower bound. We call the range between the lower and upper bounds the valid range. In addition, we call a change that increases (decreases) the value a positive (negative) variation. The positive and negative variations are represented by the absolute value and the direction of the change in the value of data. The global positive and negative variation limits in the figure mean the maximum amounts by which the value can be changed from the current value in the positive and negative directions, respectively.

Fig. 1 Numerical data with upper and lower bounds.

When multiple nodes manage data of this type, those nodes share the global variation limits in the two directions. Those limits are divided among replicas. We call the variation that a replica can perform without interacting with other replicas the local variation limit. Splitting up the value of data was originally proposed in Data-Value Partitioning (DVP) [15].

Examples of such data include stock in a warehouse, production orders, the accounting period for which a customer uses facilities, the balance of a bank account, and the number of vacant rooms in a hotel. In the case of warehouse stock, the upper bound is determined by the capacity of the warehouse and the lower bound is usually zero. For production orders, the upper bound is the total production capacity in a particular period and the lower bound is zero. For the accounting period for which a customer uses facilities, the lower bound is the present time and the upper bound is infinity.
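As a concrete illustration, such a bounded data item can be modeled as follows. This is a minimal Python sketch; the class and method names are ours and are not part of the FVRC specification.

  from dataclasses import dataclass

  @dataclass
  class BoundedValue:
      lower: float  # lower bound of the valid range
      upper: float  # upper bound of the valid range
      value: float  # current value, kept inside [lower, upper]

      def global_positive_limit(self) -> float:
          # Maximum variation that can still be applied in the positive direction.
          return self.upper - self.value

      def global_negative_limit(self) -> float:
          # Maximum variation that can still be applied in the negative direction.
          return self.value - self.lower

  # Example: a warehouse with capacity 54 currently holding 27 items,
  # matching the configuration used in the example of Sec. 2.8.
  stock = BoundedValue(lower=0, upper=54, value=27)
  assert stock.global_positive_limit() == 27
  assert stock.global_negative_limit() == 27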

2.2 Operation Types

We define four types of operations in the FVRC system: read, data initialization, positive variation, and negative variation. These operations are described in detail in Sec. 2.7. The data initialization operation simply sets the value of data. The read operation returns the range of the current value of a data object. The positive and negative variation operations modify the data in the positive and negative directions, respectively.

2.3 Assumptions

• The clocks of all nodes are precisely synchronized. The timestamp that is included in a message exchanged between replicas can be considered reliable by other nodes. Clock synchronization can be accomplished in various ways [16]–[18].
• Every node can calculate to which neighboring node it should transmit a message to send it to a particular node in a network. This can easily be accomplished using the distributed shortest path algorithms used for routing methods [19].
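The second assumption amounts to a next-hop table per replica. For illustration only, the sketch below computes such a table with a centralized breadth-first search over the logical network; the paper itself assumes a distributed shortest path algorithm, and the function and variable names here are ours.

  from collections import deque

  def next_hops(adjacency, source):
      # For replica `source`, find the neighbor to which a message should be
      # forwarded to reach every other replica along a shortest path.
      # `adjacency` maps a replica id to the list of its neighbors.
      hop = {source: None}
      queue = deque([source])
      while queue:
          node = queue.popleft()
          for neighbor in adjacency[node]:
              if neighbor not in hop:
                  # Direct neighbors are their own first hop; farther nodes
                  # inherit the first hop of the node that discovered them.
                  hop[neighbor] = neighbor if node == source else hop[node]
                  queue.append(neighbor)
      return hop

  # nexthop(o) as used in Sec. 2.10 then corresponds to next_hops(adj, i)[o].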


2.4 Overview of Replica Control

The FVRC system has the same architecture as that for lazy-group replication [14] to achieve high scalability, availability, and performance for transaction processing. This architecture consists of replica and client nodes, as shown in Fig. 2. Replicas are connected with logical links and form a logical network. Any replica can initially process update transactions. The processed update transactions are asynchronously propagated among replicas.

In our system, a replica has three functions: (1) processing the four types of operations, (2) processing variation operations that are asynchronously transmitted from other replicas, and (3) reassigning the local variation limits among replicas. To perform these functions, a replica has three types of node-dependent values, which are described in Sec. 2.6: the accumulated variation limit, the precedence variation limit, and the precedence consumption ratio. A replica eliminates the asynchronous abort of already committed transactions by processing transactions using only these values managed by itself and by reducing the range of available transactions while still keeping it within that needed for practical use. Because a replica does not interact with others for transaction processing, our method can achieve high performance and scalability.

When a replica receives a read operation from a client, the replica returns the current value of its copy and its unconsumed local variation limit, which means the amount of the local variation limit that has not yet been used to process variation operations. In Fig. 2, when client C4 issues a read operation to replica 8, replica 8 sends a reply message to C4 without interacting with other replicas.

When a variation operation arrives at a replica, the replica processes the operation in one of three ways. The first way is used when the requested variation is equal to or less than the replica's unconsumed local variation limit. In this case, a replica can process the operation without synchronously interacting with any other replicas. In Fig. 2, when client C1 issues a variation operation to replica 9, the operation is immediately processed by the replica and then asynchronously propagated to replicas 6, 7, and 5. The processed operation finally arrives at all replicas. The second and third ways are used when the requested variation is greater than the replica's unconsumed local variation limit. In these ways, a replica interacts with other replicas for transaction processing. However, we regard these ways as exceptional operations because replicas should fairly share the local variation limits prior to processing variation operations. In the second way, a replica has a set of replicas that are comparatively tightly coupled with it. We call this a coupled replica set. A replica tries to obtain some of the unconsumed local variation limits of the replicas included in its coupled replica set. If the replica can obtain enough variation limit to process the operation, it processes the operation. In Fig. 2, when replica 4 receives a variation operation from client C2, the operation is processed using the unconsumed local variation limits of replicas 4 and 8.

Fig. 2 Architecture of fairly assigned variation-based replica control.

In the third way, a variation operation is propagated to other replicas. When a replica receives it and determines that it should use its unconsumed local variation limit for the operation, the operation is processed. If two or more replicas process the same operation, the replicas other than the one that began processing it earliest perform undo processes. In Fig. 2, when replicas 3 and 5 process a variation operation from client C2 and notify replica 1 about the commitment of the operation in that order, replica 1 sends a request for the undo of the processed operation to replica 5. When a variation operation is processed in the first or second way, a client is notified that it has been successfully processed with probability 1. When the third way is used, a client is notified that the probability that the operation will be successfully processed is less than 1. After processing variation operations in any of the above three ways, a replica asynchronously disseminates the variation operation to the other replicas by means of gossiping [20].
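The dispatch among the three ways can be summarized in the following sketch. It is a schematic outline under our own naming; the helper methods (unconsumed_local_limit, borrow_from_coupled_set, and the gossip calls) stand in for mechanisms the paper defines elsewhere and are not an actual API of the FVRC system.

  def handle_variation(replica, variation):
      # First way: the request fits within the local limit, so it is processed
      # locally and then propagated asynchronously.
      if variation <= replica.unconsumed_local_limit():
          replica.apply(variation)
          replica.gossip_update(variation)
          return "committed"      # reported successful with probability 1
      # Second way: try to obtain unconsumed limits from the coupled replica set.
      if replica.borrow_from_coupled_set(variation):
          replica.apply(variation)
          replica.gossip_update(variation)
          return "committed"      # reported successful with probability 1
      # Third way: distribute the operation; another replica may process it,
      # duplicate processing is undone, so only probable success is reported.
      replica.gossip_request(variation)
      return "pending"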

In our method, the local variation limits are fairly assigned among replicas, and all replicas calculate their unconsumed local variation limits to process update transactions. These processes must be performed in a decentralized and asynchronous way because a centralized and synchronous way can degrade the performance of transaction processing, and it is difficult to keep a number of loosely coupled systems in working order due to network partitioning, node failures, administrative shutdowns, and so on.

Section 2.5 describes the range of available transactions in our method. The three types of node-dependent values for transaction processing are explained in Secs. 2.6 and 2.7. The fair assignment of the local variation limits is periodically performed based on the demands of replicas to change their local variation limits, which is described in detail in Secs. 2.9 and 2.10. When network partitioning occurs, it is performed among the replicas that are accessible. This process is fair in terms of the amount of variation demanded by replicas.

2.5 Available Transactions

To process general transactions requiring strict consistency, transaction processing by replicas must satisfy one-copy serializability [1].


However, in lazy-group replication, one-copy serializability can cause the abort of already committed transactions through undo and redo operations because any replica can initially process transactions. Therefore, to eliminate the asynchronous abort of already committed transactions, in addition to the transaction processing using the three types of node-dependent values, our method reduces the range of available transactions but keeps it within that for practical use. In our method, when a replica receives a variation operation asynchronously propagated from other replicas, there are two types of ordering of variation operations: ordered and commutative. The first type reorders variation operations in ascending order of their timestamps and processes them. If a newer variation operation has already been processed, it is undone, the older variation operations are processed, and the undone newer operation is then redone. For the first type of ordering, the total change in data value caused by variation operations must not depend on the processing order of the operations. This condition is necessary because otherwise the reordering could cause failures in processing variation operations, which means that the abort of already committed transactions could occur. The second type processes variation operations in the order of their arrival. This type is available for operations that exhibit the convergence property [14]. Transactions that exhibit this property do not cause inconsistency depending on the processing order of transactions, which means that the total change in data value caused by variation operations is constant independently of the processing order of the operations. Therefore, replicas do not abort already committed transactions in either type of ordering of variation operations.

In addition to the ordering of processed transactions, an undo operation can be performed in the third way of processing variation operations in our method. In this case, a redo operation is not performed, as described in the previous section. Therefore, because an undo operation only decreases the unconsumed local variation limit, replicas do not asynchronously abort already committed transactions in the third way of transaction processing.
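A minimal sketch of the first ("ordered") type may make the undo/redo step concrete. The log layout and the apply/undo callbacks are our own illustration, assuming operations are (timestamp, variation) pairs whose total effect is order-independent, as this section requires.

  def insert_ordered(log, apply, undo, op):
      # Keep processed operations in ascending timestamp order. A late arrival
      # forces already processed newer operations to be undone, the older
      # operation to be applied, and the newer ones to be redone.
      newer = [o for o in log if o[0] > op[0]]
      for o in reversed(newer):   # undo newer operations, most recent first
          undo(o)
      apply(op)
      for o in newer:             # redo them after the older operation
          apply(o)
      log.append(op)
      log.sort(key=lambda o: o[0])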

2.6 Node-Dependent Values for Decentralized and Asynchronous Replica Control

To process variation operations, a replica needs to be assigned a local variation limit and to calculate its unconsumed local variation limit. The calculation of the unconsumed local variation limit is divided into two phases: initialization and recalculation. The recalculation is necessary because replicas that have fully used their allocated unconsumed local variation limits cannot continue to process update transactions despite the existence of a newly available global variation limit.

A simple way to assign the local variation limits to replicas and determine the unconsumed local variation limits in the initialization and recalculation phases is to calculate the global variation limit and allocate it to replicas in the same way as in DVP. This simple way needs to calculate the precise current value when all replicas are reachable, or its range when some replicas are unreachable. The range can be calculated using the local variation limits allocated to the unreachable replicas. When the upper and lower bounds of the current value are $v_u$ and $v_l$, respectively, the positive and negative global variation limits are $v_{max} - v_u$ and $v_l - v_{min}$, respectively, where $v_{max}$ and $v_{min}$ are the upper and lower bounds of the valid range of the data. This is a centralized and synchronous approach because it needs to gather information from reachable replicas and keep the information used in the recalculation phase unchanged while calculating the global variation limit. Therefore, while the global variation limit is being calculated in the recalculation phase in this way, a number of loosely coupled replicas stop processing update transactions. As a result, this simple way is not only inadequate for a number of loosely coupled systems but can also degrade the performance of transaction processing.

To achieve decentralized and asynchronous replica control that assigns the local variation limits to replicas and determines the unconsumed local variation limit, we first introduce a new value, which we call the accumulated variation limit (AVL). The AVL is a fuzzy global state of all replicas and has the same role as the global variation limit in the simple way described just above. It is necessary as a base for fairly distributing the local variation limits among replicas. In addition, to achieve the asynchronous operation of a replica, the AVL is required to guarantee the consistency of data even if replicas continue to process update transactions using the assigned variation in any of their transient states, including the recalculation phase of the unconsumed local variation limit calculation.

Every replica manages the AVL for every data object. The AVLs can vary among replicas. We use two types of AVLs: positive and negative. The positive (negative) AVL is the difference between the upper (lower) bound and the initial value of the data in the initialization phase. In the recalculation phase, a replica increases the positive and negative AVLs as a result of the change caused by negative and positive variation operations, respectively. This is because when a variation operation changes the value of the data in a particular direction, the same variation in the opposite direction can be processed so that the consistency of data is maintained. The definition of the AVL is the same as the global variation limit in the initialization phase. However, the AVL is different from it in the recalculation phase.

In our method, because updates by variation operations are asynchronously propagated among replicas and network partitioning occasionally occurs, the AVLs that are calculated by replicas can vary. Because the negotiation of the AVLs among replicas needs synchronized operations, our method separates the determination of fairness among replicas from the calculation of the global state of the AVL. Our method allocates part of the AVL to a replica with a particular ratio as the total amount of the variation that the replica can perform after the last data initialization operation. We call this ratio the precedence variation limit (PVL).


A replica calculates its unconsumed local variation limit by subtracting the total amount of the variation that it has already performed from the product of the AVL and PVL.

From its definition, the AVL has two properties. First, the AVL is monotonically increasing as time progresses. Second, the AVLs of all replicas converge to the same value when all replicas have finished processing all input update transactions. Because a replica processes variation operations by consuming the product of the AVL and PVL, these two properties mean that already committed variation operations will never be aborted in the future when the PVL is constant. These properties are important for our method to eliminate the abort of already committed transactions and maintain the consistency of data even if replicas process transactions in any transient state of the AVL. As a result of these properties, when a replica receives an asynchronously propagated variation operation, it can immediately begin to process variation operations by using its unconsumed local variation limit in the AVL increased by the variation operation. In other words, any transient state of the AVL maintains the consistency of data. Therefore, the definitions of the AVL and PVL lead to the decentralized, asynchronous, and immediate update of the unconsumed local variation limit of replicas in the recalculation phase by eliminating the high overhead needed for a centralized and synchronous approach. This eliminates the degradation of the advantages of decentralized and asynchronous methods for processing update transactions, such as those for updating anytime and anywhere [8], [9], [11].

Figure 3 shows the relationships among the AVL, PVL, and global and local variation limits. In addition to the AVL managed by a replica, there is the unknown AVL caused by variation operations that have already been processed by other replicas but not propagated to the replica. The sum of the AVL and the unknown AVL can be divided into the global variation limit and the total amount of the variation that all replicas have already performed after the last data initialization operation. The calculation of the latter and of the unknown AVL needs synchronized operations among replicas. Hence, our method uses the AVL without calculating these. In addition, a replica can calculate its unconsumed local variation limit with only its local information, which is the variation that it has already performed after the last data initialization operation and the product of the AVL and PVL.

Fig. 3 Relationships among AVL, PVL, and global and local variation limits.

Because it uses decentralized and asynchronous replica operations, our method is suited to a number of loosely coupled systems and eliminates the degradation of transaction processing performance.

In our method, all nodes manage the following three types of node-dependent values for all data objects to achieve the three functions described in Sec. 2.4. The AVL and PVL have already been introduced in this section; we formally redefine them here.

• Accumulated Variation Limit (AVL): There are two types of AVLs: positive and negative. The positive (negative) AVL is initially the difference between the upper (lower) bound and the initial value of the data. Then, the positive (negative) AVL is increased by a negative (positive) variation operation that arrives from a client or another replica. In the rest of this paper, we let $A^{(i)}$ be the AVL of replica $i$. When we need to distinguish AVLs, we use $A^{(i)}_+$ and $A^{(i)}_-$ for the positive and negative AVLs, respectively. Let $v_+$ and $v_-$ be the changes produced by a positive and a negative variation operation, respectively. When positive and negative variation operations are processed by replica $i$, the negative and positive AVLs are modified to $A^{(i)}_- + v_+$ and $A^{(i)}_+ + v_-$, respectively.

• Precedence Variation Limit (PVL): There are two types of PVLs: positive and negative. The product of the positive (negative) AVL and the positive (negative) PVL is the total variation that a replica can perform after the last data initialization operation. The unconsumed positive (negative) local variation limit is the difference between the product of the positive (negative) AVL and the positive (negative) PVL and the variation that a replica has already performed after the last data initialization operation. Therefore, for the consistency of data, the total sum of the PVLs of all replicas is 1. The PVLs are reassigned fairly among replicas. For simple reassignment of PVLs, the PVL takes a discrete quantity in the FVRC system. Let $d_1, d_2, d_3, \ldots, d_m$ be the discrete quantities of the PVL, where $d_1 < d_2 < d_3 < \cdots < d_m$ and $d_j - d_{j-1}$ is constant for any $j$. We call $d_j$ the $j$th level of the PVL. In the rest of this paper, we let $l^{(i)}$ be the PVL of replica $i$. When we need to distinguish PVLs, we use $l^{(i)}_+$ and $l^{(i)}_-$ for the positive and negative PVLs, respectively.
• Precedence Consumption Ratio (PCR): The PCR represents the ratio of the total amount of the variation performed by a replica after the last data initialization operation to the product of the AVL and PVL. This value should be in the interval [0, 1]. There are two types of PCRs: positive and negative. The PCR is accumulated whenever a variation operation is processed. In the rest of this paper, we let $r^{(i)}$ be the PCR of replica $i$. When we need to distinguish PCRs, we use $r^{(i)}_+$ and $r^{(i)}_-$ for the positive and negative PCRs, respectively.


Let $v^{(i)}_j$ be the variation caused by the $j$th variation operation arriving at replica $i$. The PCR is then updated as $r^{(i)} := r^{(i)} + v^{(i)}_j / (A^{(i)} l^{(i)})$. A small sketch of this bookkeeping follows.
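The sketch below collects the update rules for the three node-dependent values in one direction (a replica keeps one such state per direction per data object). It is our own illustrative Python, not the paper's implementation; the rescaling in grow_avl anticipates Formula (2) of Sec. 2.7.

  class ReplicaState:
      # One direction (positive or negative) of the node-dependent values.
      def __init__(self, avl, pvl):
          self.avl = avl   # accumulated variation limit A(i)
          self.pvl = pvl   # precedence variation limit l(i), a discrete level
          self.pcr = 0.0   # precedence consumption ratio r(i), in [0, 1]

      def unconsumed_limit(self):
          # l(i) * A(i) * (1 - r(i)): variation still processable locally.
          return self.pvl * self.avl * (1.0 - self.pcr)

      def consume(self, v):
          # Process a variation v in this direction; the operation must meet
          # the PVL, i.e., the new PCR must not exceed 1 (Formula (1)).
          new_pcr = self.pcr + v / (self.avl * self.pvl)
          assert new_pcr <= 1.0, "operation does not meet the PVL"
          self.pcr = new_pcr

      def grow_avl(self, dv):
          # An operation in the opposite direction increases this AVL by dv;
          # the PCR is rescaled (Formula (2)) so that the amount of variation
          # already performed, r * A * l, stays unchanged.
          self.pcr = self.pcr * self.avl / (self.avl + dv)
          self.avl += dv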

2.7 Processing Operations

As mentioned in Sec. 2.2, there are four types of operations: read, data initialization, positive variation, and negative variation. Here, we explain these operations in detail.

Let $B_+$ and $B_-$ be the upper and lower bounds of a data object. Because the positive and negative unconsumed local variation limits of replica $i$ are $l^{(i)}_+ A^{(i)}_+ (1 - r^{(i)}_+)$ and $l^{(i)}_- A^{(i)}_- (1 - r^{(i)}_-)$, respectively, the current value of the data object is in the range between $B_+ - l^{(i)}_+ A^{(i)}_+ (1 - r^{(i)}_+)$ and $B_- + l^{(i)}_- A^{(i)}_- (1 - r^{(i)}_-)$. When a replica receives a read operation, it returns the value of its copy of the data object and that range. If a client requests a more precise range of the current value of a data object, a replica uses the sum of the unconsumed local variation limits of the replicas in its coupled replica set in addition to its own unconsumed local variation limit.
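Using the ReplicaState sketch from Sec. 2.6, a read reply might be assembled as follows; again, this is our own illustration, not the system's interface.

  def read_range(value, pos, neg, B_plus, B_minus):
      # Return the local copy plus the range in which the global current value
      # may lie. The replica's own unconsumed limits never exceed the remaining
      # global variation limits (cf. Sec. 2.8), so they tighten the static
      # bounds B+ and B-.
      hi = B_plus - pos.unconsumed_limit()
      lo = B_minus + neg.unconsumed_limit()
      return value, (lo, hi)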

The data initialization operation simply sets the value of a data object and the positive and negative AVLs.

The positive and negative variation operations increase the positive and negative PCRs, respectively. In addition, the positive and negative variation operations increase the negative and positive AVLs, respectively, and cause recalculation of the negative and positive PCRs, respectively.

Let $v^{(i)}_j$ be the variation of the $j$th variation operation arriving at replica $i$. We state that a variation operation with $v^{(i)}_j$ meets the PVL if

  $r^{(i)} + \dfrac{v^{(i)}_j}{A^{(i)} l^{(i)}}$   (1)

is equal to or less than 1.

When a variation operation on a data object meets the PVL of the data object, a replica processes it and changes the copy of the data object in the replica. Then the PCR of the data object is changed to the value calculated by Formula (1). When positive and negative variation operations are successfully processed, the negative and positive AVLs are increased by the variation caused by the operations, respectively. In addition, the negative and positive PCRs are modified to

  $\dfrac{r^{(i)} A^{(i)}}{A^{(i)} + \Delta A^{(i)}}$,   (2)

where $\Delta A^{(i)}$ is the variation caused by the variation operation.

When the PVL is insufficient, a replica uses either the second or the third way described in Sec. 2.4, according to its policy. In the second way, a replica tries to obtain part of the PVL from each of several replicas included in its coupled replica set in order to increase the level of its PVL. If it succeeds in obtaining enough PVL to process the variation operation, the operation is processed in the same way as a variation operation from a client. A replica $i$ that increases or decreases some of its PVL modifies its PCR to

  $\dfrac{l^{(i)}}{l^{(i)*}}\, r^{(i)}$,   (3)

where $l^{(i)}$ and $r^{(i)}$ are the PVL and PCR before the transfer of the PVL, and $l^{(i)*}$ is the changed PVL after the transfer.
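In terms of the ReplicaState sketch of Sec. 2.6, the PCR adjustment of Formula (3) amounts to the following (illustrative code, assuming the consumed amount r * A * l must be preserved across the level change):

  def transfer_pvl(state, new_pvl):
      # After raising or lowering the PVL level, rescale the PCR by
      # Formula (3) so that r * A * l, the variation already performed,
      # is unchanged.
      state.pcr = state.pcr * state.pvl / new_pvl
      state.pvl = new_pvl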

In the third way, when replica $i$ receives a variation operation and does not have enough unconsumed local variation limit to process it, replica $i$ distributes it to other replicas using gossiping [20]. When a replica receives a distributed variation operation and has enough unconsumed local variation limit to process it, the replica processes it in the same way as a variation operation from a client. Then the replica returns a notification message to replica $i$ to report the commitment of the requested variation operation. If replica $i$ receives multiple commitments, it sends requests for an undo operation of the processed variation operation to the replicas from which replica $i$ did not first receive the commitment. Then, replica $i$ sends a request for the asynchronous propagation of the processed variation operation to the replica from which replica $i$ first received the commitment.

When a replica receives positive (negative) variation operations asynchronously propagated from other replicas, it changes the negative (positive) AVL and PCR in the same way as for processing a positive (negative) variation operation from a client, without changing the positive (negative) PCR. Figure 4 shows how variation operations are processed using the AVL, PVL, and PCR. In the example in this figure, there are three replicas: $a$, $b$, and $c$. Replicas $a$ and $b$ have non-zero positive PVLs and zero negative PVLs. Replica $c$ has a zero positive PVL and a non-zero negative PVL. Graphs (a), (b), and (c) in the figure show the changes in the positive PCRs of $a$ and $b$ and the negative PCR of $c$, respectively.

The white and black triangles in the figure represent arrivals of positive and negative variation operations, respectively. The $i$th variation operation arriving at replica $j$ is shown as $o^{(j)}_i$ in the figure. The dotted lines in those graphs represent 1, which is the upper bound of the PCR. When a positive variation operation arrives at replica $a$ and its variation is $v^{(a)}_1$, the PCR of replica $a$ increases by $v^{(a)}_1 / (A^{(a)} l^{(a)})$. When positive and negative variation operations arrive at replicas $b$ and $c$, their PCRs are modified in the same way.

After replica $c$ processes the first negative variation operation $o^{(c)}_1$, whose variation is $v^{(c)}_1$, it is asynchronously transmitted to replicas $a$ and $b$. In the figure, negative variation operation $o^{(c)}_1$ arrives at replicas $a$ and $b$ at times $T_1$ and $T_2$, respectively. In addition, the second negative variation operation for replica $c$ does not arrive at replicas $a$ and $b$ in the graphs. When replicas $a$ and $b$ receive $o^{(c)}_1$, they increase their AVLs by $v^{(c)}_1$. This increase in the AVL modifies their PCRs because their unconsumed positive local variation limits increase by $l^{(a)} v^{(c)}_1$ and $l^{(b)} v^{(c)}_1$, respectively. Then, their PCRs decrease as shown in Fig. 4.


Fig. 5 Example of decentralized and asynchronous replica control.

Fig. 4 Processing of variation operations using AVL, PVL, and PCR.

2.8 Example of Variation Operation Processing

Figure 5 demonstrates the operation of the FVRC system in an example network. The four networks in the figure show state transitions of the example network. They are first configured so that both their positive and negative AVLs are 27 and their PVLs are 1/9. The value of the data object is set to 27 by a data initialization operation, and the valid range is from 0 to 54. In Fig. 5 (a), all the replicas are reachable from each other. Client C1 first sends a negative variation operation of 3 to replica 9 (E1). Because it has a negative unconsumed local variation limit of 3, the negative variation operation is processed and its negative unconsumed local variation limit is fully used. As a result, replica 9 increases its positive AVL to 30. Then, the negative variation operation is asynchronously propagated to replica 7 (E2). Replica 7 also increases its positive AVL to 30. In Fig. 5 (b), the network is partitioned. The variation operation initially sent by C1 is propagated to replica 6 (E3), which increases its positive AVL to 30. Client C2 then sends a positive variation operation of 3 to replica 1 (E4). It is asynchronously propagated to replicas 5, 3, and 2 (E5, E6, and E7). Replicas 1, 5, 3, and 2 increase their negative AVLs to 30. In Fig. 5 (c), the network is partitioned further. Client C3 then sends a positive variation operation of 3 to replica 8 (E8). It is asynchronously propagated to replica 4 (E9). Replicas 8 and 4 increase their negative AVLs to 30. Finally, the network partitioning is partially recovered as shown in Fig. 5 (d). The variation operations originally issued by clients C1 and C2 are propagated to replicas 8 and 4 (E10 and E11) and to replicas 7, 9, and 6 (E12, E13, and E14), respectively. Replicas 4, 6, 7, 8, and 9 increase both their positive and negative AVLs to 30.

Here, we consider the global variation limit by assuming that all the information of all the replicas is available. In Figs. 5 (a), (b), (c), and (d), the pairs of the positive and negative global variation limits are (30, 24), (27, 27), (24, 30), and (24, 30), respectively, where $(G_p, G_n)$ means that $G_p$ and $G_n$ are the positive and negative global variation limits, respectively. If replicas could process variation operations that exceed the global variation limit, the consistency of data would be broken. Table 1 shows the positive and negative unconsumed local variation limits of all the replicas in Figs. 5 (a), (b), (c), and (d).


Table 1 Change in unconsumed local variation limits of replicas in the example.

             (a)            (b)            (c)            (d)
Replica   pos.  neg.    pos.  neg.    pos.  neg.    pos.  neg.
1         3     3       0     10/3    0     10/3    0     10/3
2         3     3       3     10/3    3     10/3    3     10/3
3         3     3       3     10/3    3     10/3    3     10/3
4         3     3       3     3       3     10/3    10/3  10/3
5         3     3       3     10/3    3     10/3    3     10/3
6         3     3       10/3  3       10/3  3       10/3  10/3
7         10/3  3       10/3  3       10/3  3       10/3  10/3
8         3     3       3     3       0     10/3    1/3   10/3
9         10/3  0       10/3  0       10/3  0       10/3  1/3
Total     83/3  24      25    76/3    22    26      68/3  27

In every state, the total unconsumed local variation limits never exceed the global variation limits described above. Therefore, in any dynamic transient state, our method maintains the consistency of data.

2.9 Fair PVL Reassignment

As described in Sec. 1, a decentralized and asynchronous way is needed for controlling a number of loosely coupled systems. Because a number of loosely coupled replicas are occasionally partitioned and the set of replicas in operation changes dynamically, replicas participating in a calculation for replica control may become unreachable before its completion. Therefore, a replica control method must maintain the consistency of data even if the calculation cannot be completed and stops in a transient state. This requirement applies to the PVL reassignment as well as to the calculation of the unconsumed local variation limit described in Sec. 2.6. In addition, because it is generally difficult to detect the termination of a distributed algorithm, the distributed PVL reassignment method also needs to satisfy this requirement. To achieve it, we must minimize the scope of the procedures in the PVL reassignment that can cause inconsistency of data if they stop in a transient state. The scope means the length of a message sequence and the number of replicas participating in a procedure of the PVL reassignment. In this section, we define the PVL reassignment method so that its distributed version, described in Sec. 2.10, satisfies the above requirement.

For replicas to process variation operations, they need sufficient PVLs. The FVRC system fairly reassigns the PVLs of replicas periodically. When a replica tries to change its PVL from the $k_b$th to the $k_a$th level, it specifies the difference between $k_a$ and $k_b$. We call this difference the PVL alteration. For the FVRC system, we define the fairness of assigning PVLs as follows. The fair state is the case such that the rate at which replicas successfully obtain a PVL alteration is the same when replicas demand the same series of PVL alterations.

To achieve this fairness, the FVRC system reassigns PVLs as follows. Let $a^{(i)}$ be the PVL alteration requested by replica $i$. The FVRC method modifies PVL $l^{(i)}$ to the level

  $k_b + a^{(i)} - M$,   (4)

  $M = \left\lfloor \textstyle\sum_j a^{(j)} / n \right\rfloor$ or $\left\lceil \textstyle\sum_j a^{(j)} / n \right\rceil$,   (5)

where $n$, $\lfloor x \rfloor$, and $\lceil x \rceil$ are the number of replicas, the largest integer not greater than $x$, and the smallest integer not smaller than $x$, respectively. In addition, the total sum of $M$ over all replicas is equal to $\sum_j a^{(j)}$. This reassignment causes all the replicas to have the same insufficient or excessive PVL. After the reassignment of the PVL, the PCR is changed to

  $\dfrac{l^{(i)}}{\bar{l}^{(i)}}\, r^{(i)}$,   (6)

where $\bar{l}^{(i)}$ is the reassigned PVL of replica $i$.

The reassignment of PVLs must satisfy two requirements. The first is that a reassigned PVL must remain in the range between $d_1$ and $d_m$, where $d_1$ and $d_m$ are the minimum and maximum levels of the PVL, respectively. The second is that the PCR of a replica remains in the range between 0 and 1.

Because $k_b + a^{(i)} \leq d_m$, Formula (4) is equal to or less than $d_m$. In addition, from Formula (6), the PCR is never less than zero. Therefore, a replica must specify a PVL alteration request so that the reassigned PVL is equal to or more than the PVL that makes Formula (6) equal to 1. To accomplish this, we introduce an upper bound of $a^{(i)}$, denoted $b_{max}$.

Hence, a replica must issue an alteration request satisfying

  $k_b + a^{(i)} - M \geq k_b + a^{(i)} - b_{max} \geq x$,   (7)

where $x$ is the minimum PVL level such that the PCR after PVL reassignment does not exceed 1. In addition, because $d_m - b_{max} \geq x$ from inequality (7), a replica must not use the local variation limit from $d_m - b_{max}$ to $d_m$. We consider that in real cases the global variation limit is distributed among replicas and no replica holds most of the AVL. Therefore, this restriction might not decrease the applicability of our system.
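The reassignment of Formulas (4) and (5) can be written out as follows. This is a centralized sketch for clarity (the paper performs it with the distributed algorithm of Sec. 2.10), and the function name is ours. The split between floor and ceiling values of M is chosen so that the M values sum to the total alteration, which keeps the sum of the PVL levels, and hence the sum of the PVLs, unchanged.

  import math

  def reassign_pvls(levels, alterations):
      # `levels` holds the current PVL levels k_b; `alterations` the a(i).
      n = len(levels)
      total = sum(alterations)
      m_floor = total // n              # floor of the average alteration
      m_ceil = math.ceil(total / n)     # ceiling of the average alteration
      remainder = total - m_floor * n   # replicas that receive the ceiling
      new_levels = []
      for i, (k_b, a) in enumerate(zip(levels, alterations)):
          M = m_ceil if i < remainder else m_floor   # Formula (5)
          new_levels.append(k_b + a - M)             # Formula (4)
      return new_levels

  # The total level count is preserved, so the sum of all PVLs stays 1:
  assert sum(reassign_pvls([3, 3, 3], [2, 0, 1])) == 9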

2.10 Distributed Fair PVL Reassignment

In our system, every replica processes update transactions based on a PVL reassigned by the distributed algorithm. PVLs are periodically reassigned among replicas. PVLs used in period $i$ are calculated during a time slot within period $i-1$ by the distributed algorithm, as shown in Fig. 6. The gap between the end of the time slot in period $i-1$ and the beginning of the next period $i$ is sufficiently greater than the maximum time offset that can be achieved by the time synchronization method used in the system.

The FVRC system achieves fairness among all replicas by averaging alteration requests, as represented in Formula (5). Replicas usually process different numbers of variation operations. If an FVRC system has replicas that simultaneously process very different numbers of operations, then the replicas processing a small number of variation operations have a lower probability of obtaining sufficient PVL than the replicas processing more operations.


Fig. 6 Execution timing of distributed fair PVL reassignment.

Hence, we introduce a virtual replica process. A replica contains one or more virtual replica processes. In proportion to the number of virtual replica processes that a replica has, it can process more variation operations. The following describes the operation of the virtual replica process.

Let $g^{(i)}$ be a variable for calculating $M$ of Formula (5) on replica $i$. It is initially $a^{(i)}$. We call $g^{(i)}$ an arbitrated request. In addition, we call the difference between an arbitrated request of a neighboring replica and a replica's own an arbitrated request difference. We use three types of messages for the distributed algorithm: PVL notification, PVL exchange request, and PVL exchange acknowledgment. The PVL notification message contains the arbitrated request of the replica that issued the message. This message is not forwarded by any replica. The PVL exchange request message contains the identifier of the source replica that originally issued the message, its arbitrated request, and that of the destination replica of the message. The PVL exchange acknowledgment message contains the identifier of the source replica of the PVL exchange request message in conjunction with the acknowledgment message and its arbitrated request. Figure 7 shows the distributed fairness control algorithm. The following describes it.

1. Replica $i$ sends PVL notification messages including $g^{(i)}$ to all neighboring replicas.

2. When replica $i$ has received PVL notification messages from all neighboring replicas, it determines the maximum arbitrated request difference.

3. If the maximum arbitrated request difference of replica $i$ occurs on the connection between $i$ and $j$, and $g^{(j)} - g^{(i)}$ is greater than 1, replica $i$ sends a PVL exchange request message to replica $j$. The request message includes $g^{(i)}$, $g^{(j)}$, and $i$. If the maximum arbitrated request difference is equal to 1, PVL exchange request messages are delivered to all replicas whose arbitrated requests are $g^{(i)} + 1$.

4. When replica $i$ receives a PVL exchange request message from replica $j$, replica $i$ checks whether the next hop toward the source replica of the message is $j$, so that replica $i$ never accepts the same message from more than one neighboring replica. When this is true, the destination arbitrated request carried in the message is equal to replica $i$'s current $g^{(i)}$, and $g^{(i)} - g^{(o)}$ is greater than 1, replica $i$ decrements $g^{(i)}$ by 1 and sends replica $j$ a PVL exchange acknowledgment message for the request message, where $g^{(o)}$ is the arbitrated request of the source replica of the request message. When $g^{(i)} - g^{(o)}$ is equal to 1 and the maximum arbitrated request difference of replica $i$ is 0, replica $i$ forwards the request message to all neighboring replicas whose arbitrated requests are $g^{(i)}$. When $g^{(i)} - g^{(o)}$ is equal to 1 and the maximum arbitrated request difference of replica $i$ is 1, replica $i$ forwards the request message to a neighboring replica whose arbitrated request is $g^{(i)} + 1$.

5. When replica $i$ receives the acknowledgment message and it is the source replica of the request message in conjunction with the acknowledgment message, it increments $g^{(i)}$ by 1. Otherwise, it forwards the acknowledgment message toward the source replica when the next hop for it has an arbitrated request that is greater than that of the source replica. During the forwarding process of the acknowledgment message, a replica forwarding it temporarily increments its arbitrated request by 1.

N(i): the set of neighboring replicas.
I(i): the set of replicas from which a notification message has arrived.

init: ... (1)
  ∀j ∈ N(i): send("notify", g(i)) from i to j

receive("exchange", g(o), g', o) from j at i: ... (2)
  # g(o): arbitrated request of the source replica o;
  # g': sender's view of this replica's arbitrated request.
  if j = nexthop(o) then
    if g' = g(i) ∧ g(i) − g(o) > 1 then
      g(i) := g(i) − 1
      send("acknowledge", g(o), o) from i to j
      ∀m ∈ N(i): send("notify", g(i)) from i to m
    else if g' = g(i) ∧ g(i) − g(o) = 1 then
      u := max_{m ∈ N(i)} (g(m) − g(i))
      if u = 0 then
        ∀k : g(k) − g(i) = u : send("exchange", g(o), g(k), o) from i to k
      else if u = 1 then
        ∃k : g(k) − g(i) = u : send("exchange", g(o), g(k), o) from i to k
      endif
    endif
  endif

receive("notify", g) from j at i: ... (3)
  g(j) := g
  I(i) := I(i) ∪ {j}
  if I(i) = N(i) then
    u := max_{j ∈ N(i)} (g(j) − g(i))
    if u > 1 then
      ∃k : g(k) − g(i) = u : send("exchange", g(i), g(k), i) from i to k
    else if u = 1 then
      ∀k : g(k) − g(i) = u : send("exchange", g(i), g(k), i) from i to k
    endif
  endif

receive("acknowledge", g(o), o) from j at i: ... (4)
  if o = i then
    g(i) := g(i) + 1
    ∀m ∈ N(i): send("notify", g(i)) from i to m
  else
    m := nexthop(o)
    if g(m) − g(o) ≥ 1 then
      send("acknowledge", g(o), o) from i to m
    endif
  endif

Fig. 7 Distributed algorithm for fair PVL reassignment.


Theorem 1: This algorithm converges, and the maximum arbitrated request difference between any two replicas in a network is at most one.

Proof: The procedure in our algorithm by which two replicas exchange their arbitrated requests through a PVL exchange request message is called a PVL exchange procedure. In this procedure, replicas $i$ and $j$ increment $g^{(i)}$ and decrement $g^{(j)}$ by 1, respectively. First, let us consider the PVL exchange procedure performed between neighboring replicas. When a replica with the maximum arbitrated request value in a network has neighboring replicas with arbitrated requests that are the maximum arbitrated request value minus 2 or less, the replica with the maximum value receives a PVL exchange request message. Because the PVL exchange procedure is performed between replicas whose arbitrated requests differ by 2 or more, the PVL exchange procedure never causes an increase in the number of replicas with the maximum arbitrated request value. Therefore, the number of replicas with the maximum arbitrated request value decreases. For the same reason, the number of replicas with the minimum arbitrated request value decreases through iterations of the PVL exchange procedure. After iterations of the PVL exchange procedure, the difference between the arbitrated request value of a replica and those of its neighboring replicas is within ±1.

Next, let us consider the PVL exchange procedure per-formed between non-neighboring replicas. A PVL exchangerequest message from replica i is forwarded along pathswhere any replica j has g( j) = g(i) + 1. Then, it reachesthe first replica k whose g(k) is equal to or more than g(i) + 2and a PVL exchange procedure is executed. Iterations ofthis procedure decrease the number of replicas that have theminimum or maximum arbitrated request value until the dif-ference between the maximum and minimum arbitrated re-quest value in the network is 1 at most. Then, our algo-rithm converges. In addition, a PVL exchange procedurepreserves the total sum of arbitrated requests for all repli-cas. This means that the arbitrated requests of any replicasare finally

$\lfloor \sum_i a(i)/n \rfloor$ or $\lceil \sum_i a(i)/n \rceil$.  (8) □
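Under the simplifying assumption that route discovery always succeeds, the convergence claim can be checked numerically: exchanging one level at a time between two replicas whose arbitrated requests differ by 2 or more preserves the sum, so the values settle at the floor or ceiling of the average. A minimal sketch, with names of our choosing:

def fair_reassign(a):
    """Repeat the PVL exchange step until Theorem 1's bound holds."""
    a = list(a)
    while max(a) - min(a) >= 2:
        hi, lo = a.index(max(a)), a.index(min(a))
        a[hi] -= 1   # the replica with more PVL gives one level ...
        a[lo] += 1   # ... and the replica with less receives it
    return a

print(fair_reassign([7, 1, 4, 0]))  # -> [3, 3, 3, 3]; sum 12, n = 4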

The distributed PVL reassignment is composed of PVL exchange procedures. In the PVL exchange procedure, a replica always provides or receives one level of the PVL with a neighboring replica. Only the change in the level of the PVL can cause data inconsistency. Therefore, our method minimizes the scope of replica procedures that can cause data inconsistency if they stop in a transient state. In addition, the PVLs for any replicas are never lower than x in any transient state, as seen in inequality (7), because the PVL exchange procedure never increases the maximum and never decreases the minimum arbitrated request value in a network. This means that all transient states become consistent by atomically executing the minimized scope that can cause data inconsistency. Two replicas can negotiate the exchange of the PVL using a three-way handshake [21].

Fig. 8 Example network demonstrating the operation of distributed fair PVL reassignment.

Fig. 9 Operation of distributed fair PVL reassignment in the example network.


2.11 Example of Distributed Fair PVL Reassignment

Figure 8 shows an example network used to demonstrate the operation of the distributed fair PVL reassignment, and Fig. 9 shows that operation in the example network. The operation is the output of our implemented program and shows the change in arbitrated requests for every replica.

For replica 2, the maximum arbitrated request difference occurs in connection with replica 4 at time T1 because the arbitrated request differences with replicas 1, 3, and 4 are −2, 2, and 4, respectively, so replica 2 performs a PVL exchange procedure with replica 4. In the same way, the maximum arbitrated request difference for replicas 1, 5, 3, 1, 2, 3, 2, and 1 occurs in connection with replicas 5, 6, 6, 5, 4, 6, 3, and 5 at times T2, T3, T4, T5, T6, T7, T8, and T9, respectively. For replica 4, the maximum arbitrated request difference occurs in connection with replicas 1 and 2 at time T10 because the arbitrated request differences with replicas 1, 2, and 3 are −1, 1, and 1, respectively, so replica 4 sends PVL exchange request messages to replicas 2 and 3. Then, the message sent to replica 3 is forwarded to replica 6, and finally replica 4 performs a PVL exchange procedure with replica 6. For replica 1, the maximum arbitrated request difference occurs in connection with replica 5 at time T11 because the arbitrated request differences with replicas 2, 4, and 5 are 2, 2, and 3, respectively, so replica 1 performs a PVL exchange procedure with replica 5. For replica 1, the maximum arbitrated request difference occurs in connection with replicas 2, 4, and 5 at time T12 because the arbitrated request differences with replicas 2, 4, and 5 are 1, 1, and 1, respectively, so replica 1 sends PVL exchange request messages to replicas 2, 4, and 5. Then, the message sent to replica 5 is forwarded to replica 6, and finally replica 1 performs a PVL exchange procedure with replica 6.

3. Evaluation

The function of our method can be divided into two sub-functions. One is transaction processing using the node-dependent values, which are the AVL, PVL, and PCR. The other is the reassignment of the PVL. To clarify the effectiveness of these sub-functions, we evaluated them separately. In Secs. 3.1 and 3.2, we discuss the evaluation results in terms of the performance of transaction processing using the node-dependent values and the effectiveness of PVL reassignment, respectively.

3.1 Transaction Processing Using Node-Dependent Values

To evaluate the performance of the transaction processing of our method by simulation, we used a square grid topology of 10 × 10 replica nodes. Each of the replica nodes and links failed or was administratively shut down with a probability of 0.1. The states of replica nodes and links changed at a regular interval Tc. A client did not send a request for a variation operation to unavailable replicas. When a variation operation is propagated among replicas, there are two types of strategies: immediate and deferred [22]. In the immediate strategy, an update transaction is propagated without being aggregated by replicas. In the deferred strategy, update transactions are aggregated before being sent to other replicas. We denote the times for the propagation of variation operations among replicas in the immediate and deferred strategies by Tp and Td, respectively. In this evaluation, we used Tp as the unit of time and assumed that Td was 10Tp. Each replica received positive and negative variation operations whose total change for time period Tp had a continuous uniform distribution on the interval [0, 2].
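A minimal sketch of this simulation setup, with parameter names (N, P_FAIL) that are ours rather than the paper's:

import random

N, P_FAIL = 10, 0.1

def new_state():
    # Availability of replica nodes and of right/down grid links,
    # redrawn independently at every state change interval Tc.
    nodes = {(x, y): random.random() >= P_FAIL
             for x in range(N) for y in range(N)}
    links = {((x, y), (x + dx, y + dy)): random.random() >= P_FAIL
             for x in range(N) for y in range(N)
             for dx, dy in ((1, 0), (0, 1))
             if x + dx < N and y + dy < N}
    return nodes, links

def variation_per_tp():
    # Total change of the variation operations one replica receives
    # during Tp: continuous uniform on [0, 2].
    return random.uniform(0.0, 2.0)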

When currently proposed lazy-group replication methods are used, they cannot eliminate the asynchronous abort of already committed transactions. To clarify the effectiveness of our method, we compared it with a replication method that we call lazy-group DVP. In lazy-group DVP, the value of data is split up and assigned to replicas. Each of them processes transactions using the assigned value. The assigned value of data is recalculated among reachable replicas at a regular interval Tr. The performance of transaction processing by lazy-group DVP depends on Tr and on the time period Ts that it takes for replicas to recalculate the assigned value of data. For this recalculation, replicas need to take three steps. First, they need to negotiate about the beginning of the recalculation. Second, they need to calculate the range of the current value of data by exchanging variation operations that have already been processed. Third, they need to split up the value of data and then negotiate about the split value. Because it takes Tp for replicas to negotiate variation operations among themselves as described above, we assume that it takes 3Tp to execute the recalculation of the assigned value for replicas in lazy-group DVP.
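The paper gives no code for this recalculation; the sketch below reflects our reading of it under one simplifying assumption of ours, namely that the recalculated value is split evenly among the replicas that can currently reach each other.

TP = 1.0      # unit time: propagation delay in the immediate strategy
TS = 3 * TP   # assumed cost of the three recalculation steps

def recalculate_dvp(reachable_replicas, current_value):
    """Re-split the current value of data over the reachable replicas.

    Performed every Tr; the replicas are assumed to be occupied with
    negotiation for TS at the start of each recalculation.
    """
    n = len(reachable_replicas)
    return {r: current_value / n for r in reachable_replicas}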

Fig. 10 Change in success probability of variation operations depending on recalculation period of lazy-group DVP in the immediate strategy.

Fig. 11 Change in success probability of variation operations depending on recalculation period of lazy-group DVP in the deferred strategy.


Figure 10 shows the change in the probability of variation operations being processed successfully in the immediate strategy when Tr was changed from Tp to 10Tp. In this evaluation, the positive and negative global variation limits were initially 300, which is equal to the total change in variation operations processed by all the replicas for 3Tp on average. The very small change in the success probability of our method was caused by the difference in the variation operations probabilistically generated in each simulation run. Our method achieved a higher success probability than lazy-group DVP for every Tr. In this figure, the success probability of lazy-group DVP is greatest when Tr is 6Tp.

Figure 11 shows the change in the success probability of processed variation operations in the deferred strategy when Tr was changed from 10Tp to 100Tp. The range of Tr in the deferred strategy differs from that in the immediate one because the deferred strategy is generally used to decrease the overhead of processing refresh transactions, and the recalculation of the assigned value for replicas requires higher overhead than the processing of refresh transactions.


Fig. 12 Change in success probability of variation operations depending on state change period of replicas and links.

In this evaluation, the positive and negative global variation limits were initially 1500, which is equal to the total change in variation operations processed by all the replicas for 15Tp on average. Our method achieved a higher success probability than lazy-group DVP for every Tr. In this figure, the success probability of lazy-group DVP is greatest when Tr is 10Tp. The maximum success probability of lazy-group DVP in the deferred strategy is greater than that in the immediate one because the initial value of data in the deferred strategy is much greater than that in the immediate one, which enables replicas to process updates without interacting with others for a longer time.

Figure 12 shows the success probabilities of our method and lazy-group DVP when Tc was changed from 1 to 100 and Tr was fixed at 6Tp and 10Tp in the immediate and deferred strategies, respectively. In the immediate strategy, the success probability of our method was better than that of lazy-group DVP over the whole range of Tc. In the deferred strategy, the success probability of our method was better than that of lazy-group DVP when Tc was less than about 60Tp. In both strategies, the longer Tc was, the smaller the success probability of our method, while that of lazy-group DVP was almost independent of Tc. In our method, the increase in the AVL caused by a variation operation was activated after it had arrived at all replicas. When Tc was long, variation operations tended to stay in a partitioned network for a long time. Therefore, because part of the AVL had not been activated for a long time, the success probability of our method decreased when Tc was long. In this evaluation, all replicas received positive and negative variation operations generated based on the same probability distribution function. However, if the variation operations received by replicas vary and Tc is long, the success probability of lazy-group DVP also decreases. This is because when most positive variation operations are performed in one network partition and most negative ones are performed in another, replicas cannot continue to process variation operations due to the lack of the global variation limit. Hence, the network of replica nodes should be designed so that particular replicas are not separated from the others for a long time. When a network of replicas satisfies this requirement, the success probability of our method is much better than that of lazy-group DVP.

3.2 PVL Reassignment

When we apply the FVRC system for practical use, an organization managing a replica must perform the following two steps. (1) It estimates the numbers (e.g., of products) handled in the FVRC system that are necessary for its business in the next period using past business results and business-specific knowledge, and (2) it issues a PVL alteration request for the difference between the estimated and current numbers. To achieve step (1), we need the characteristics of request arrivals, which strongly depend on the type of business. Therefore, we excluded step (1) from the scope of this evaluation. In this evaluation, we assumed for simplicity that requests arrive according to a Poisson process and that organizations managing replicas can estimate the precise number needed for their business in the next period using the estimation of step (1).

We evaluated the FVRC system in terms of performance for successfully processing variation operations and fairness in obtaining the PVL among replicas. We assumed that the number of variation operations arriving at replicas follows a Poisson process. Thus, the probability P(n, T) that n operations arrive in time period T is

$P(n, T) = \dfrac{(\lambda T)^n \, e^{-\lambda T}}{n!}$,  (9)

where λ is the mean value of the number of arrivals per unit time.
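Equation (9), and the dispersion argument used later in this section, can be checked numerically: for a Poisson count, the ratio of the standard deviation to the mean is 1/√(λT). A short sketch (function names are ours):

import math

def poisson_pmf(n, lam_t):
    """P(n, T) from Eq. (9), with lam_t standing for lambda * T."""
    return math.exp(-lam_t) * lam_t ** n / math.factorial(n)

for lam_t in (10, 100, 1000):
    print(lam_t,
          poisson_pmf(int(lam_t), lam_t),  # pmf at the mean count
          1 / math.sqrt(lam_t))            # std/mean ratio of arrivals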

First, we evaluated the performance of the whole replicated system by simulation. In this simulation, there were m replicas in the system. They received negative variation operations with the same variation and processed them using only their own unconsumed local variation limits. The arrival of negative variation operations followed the Poisson process, and the mean number of arrivals in period T was λT. In the initial state of the simulation, the negative global variation limit was equal to the variation needed to process mλT negative variation operations. In the first period, the negative global variation limit was distributed in proportion to the number of variation operations estimated to arrive. In the following period, exactly the estimated number of operations arrived at the replicas. Each replica performed the estimation of step (1). At the beginning of the second period, a positive variation operation arrived and was distributed to all replicas. The variation of the operation was the variation needed to process mλT negative variation operations, which is the same as in the first period.

From the second period on, a replica issued a PVL alteration request equal to the difference between the estimated numbers of variation operations for the next and current periods. Those alteration requests were arbitrated among replicas and then assigned as the PVLs of the replicas, which might differ from those needed for the estimated numbers.


Fig. 13 Probability that variation operations are successfully processed in the FVRC system.

If the assigned PVL was insufficient, the probability that variation operations were successfully processed was less than 1. Otherwise, the probability was 1. The simulation iterated 100 periods in this way. In addition, we performed this simulation 100 times.

Figure 13 shows the mean probability that variation operations were successfully processed in this simulation. When λT = 10, 100, and 1000, the probabilities are almost constant at approximately 9.09 × 10−1, 9.90 × 10−1, and 9.98 × 10−1, respectively. The probabilities are almost independent of the number of replicas, although the more replicas there were, the slightly higher the probability was. Generally speaking, the ratio of the standard deviation to the mean value for the number of arrivals in a Poisson process decreases as λT increases. Hence, the probability is lowest when λT = 10.

This result is exactly the same as the probability in the ideal situation in which the negative global variation limit is assigned to one replica and all negative variation operations arrive at it. This result shows that the fair PVL reassignment did not decrease the probability that variation operations were successfully processed. The system could fully process the maximum number of requests when using our method.

Next, we evaluated the fairness among individual replicas. We defined a fairness measure, called the fairness rate, as the standard deviation of the probabilities that replicas can obtain a requested PVL alteration divided by the mean value of the probabilities. Figure 14 shows the fairness rate versus the number of replicas. The fairness rate represents the difference in disadvantages among replicas caused by the PVL reassignment. The probability used in the definition of the fairness rate is defined as follows. Assume that replica i requests a PVL alteration of a(i) and that the fairly assigned PVL alteration is ā(i). If ā(i) is equal to or greater than a(i), the probability is 1. If a(i) > ā(i), the probability is ā(i)/a(i).
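A minimal sketch of this fairness rate, writing ā(i) as abar; the function and variable names are ours, and statistics.pstdev is the population standard deviation:

import statistics

def fairness_rate(requested, assigned):
    """std/mean of the per-replica probabilities defined above."""
    p = [1.0 if abar >= a else abar / a
         for a, abar in zip(requested, assigned)]
    return statistics.pstdev(p) / statistics.mean(p)

# Example: one of four replicas gets only half of what it requested.
print(fairness_rate([4, 4, 4, 4], [4, 4, 4, 2]))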

We calculated the fairness rate through the same simulation as the performance evaluation described above.

Fig. 14 Fairness rate versus the number of replicas. The fairness rate is defined as the standard deviation of the probabilities that replicas can obtain a requested PVL alteration divided by the mean value of the probabilities.

Therefore, variation operations arrived at replicas according to the Poisson process. The fairness rates for λT = 10, 100, and 1000 are shown in Fig. 14. The original fairness rates when the PVL alteration requests were issued by replicas for λT = 10, 100, and 1000 were 3.16 × 10−1, 1.00 × 10−1, and 3.16 × 10−2, respectively, and they converged to the values shown in Fig. 14. Even when the number of replicas was only 10, the fairness rate was less than 2.58 × 10−2 for any λT. When the number of replicas was more than 100, the fairness rate was approximately 9 × 10−3 for any λT. Generally speaking, it is difficult to achieve high fairness when the number of replicas and λT are small. However, our method achieved an extremely high fairness rate even when the number of replicas and λT were small.

4. Related Work

Epsilon serializability [4]–[7] is a generalization of serializability [1]. It provides more concurrency by allowing a limited amount of inconsistency in transaction processing. To continue to process transactions even when a link or node failure occurs, systems using epsilon serializability need more reliability than the FVRC system. The FVRC system might restrict transaction semantics, but it is suitable for many applications used on the Internet because it enables weak coupling among replicas and tolerates unreliable links and nodes.

The Escrow transactional method [3], [23] is designed to offer non-blocking record updating by long-lived transactions. It provides high concurrency for transaction processing and allows distributed transactions in the presence of delayed messages and occasional line disconnection. An Escrow field allows fuzziness in the value of data, which improves the concurrency of transaction processing. However, this method does not simultaneously process update transactions in multiple partitioned networks. The idea of improving concurrency by allowing fuzziness is used in epsilon serializability as well as in the Escrow transactional method.

Data-Value Partitioning (DVP) [15] splits up the values of data and stores them as tokens on the servers processing transactions. Cetintemel et al. proposed token redistribution strategies for DVP and evaluated them using real wide-area message tracing [24]. They split the number into smaller portions, which were mainly to be consumed in only one direction, though they allowed the consumed number to be returned to a server. DVP can eliminate the abort of already committed transactions. On the other hand, our method achieves decentralized and asynchronous replica control to eliminate the abort of already committed transactions, and it achieves high performance of transaction processing by eliminating synchronous interaction among replicas. In addition, the FVRC method provides symmetric operations in the positive and negative directions on numerical data, and its premise is that numerical data is shared and scrambled for by different organizations and companies, which usually compete with each other in their business. Our system enables fairness when those organizations and companies try to obtain a portion of the numerical data.

In our system, a replica provides the value of data using its copy and its unconsumed local variation limits. As a result, the value of data read by a client includes some fuzziness but has lower and upper bounds. Yu and Vahdat proposed a technique for efficient error bounding [25]. A replica in our system provides the lower and upper bounds of data by assuming that all local variation limits assigned to the other replicas are consumed. To reduce the error included in data handled by our system, we would need tighter coupling among replicas; however, this is less suitable for use on the Internet.
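As an illustration of this bounding rule (our sketch with hypothetical variable names, not code from the paper): the value a client reads lies between the copy minus all negative local variation limits held elsewhere and the copy plus all positive local variation limits held elsewhere, assuming those limits are fully consumed.

def read_bounds(copy_value, own_pos_lvl, own_neg_lvl,
                total_pos_lvl, total_neg_lvl):
    """Lower and upper bounds of the data value seen at one replica."""
    lower = copy_value - (total_neg_lvl - own_neg_lvl)
    upper = copy_value + (total_pos_lvl - own_pos_lvl)
    return lower, upper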

5. Conclusion

We proposed fairly assigned variation-based replica control for a number of loosely coupled systems. This method achieves two important features in a decentralized and asynchronous way. First, it eliminates the asynchronous abort of already committed transactions even when network partitioning occurs. Second, it balances the demands of replicas to process update transactions so that successfully processed transactions have an equal effect on all organizations managing replicas. A decentralized and asynchronous approach is needed because it is difficult to keep a number of loosely coupled systems in working order, and replica operations performed in a centralized and synchronous way can degrade the performance of transaction processing.

We evaluated our method in terms of its transaction processing performance and fairness among replicas. The results showed that our system can achieve extremely high performance and fairness among replicas.

References

[1] P.A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987.

[2] A. Helal, A. Heddaya, and B. Bhargava, Replication Techniques in Distributed Systems, Kluwer Academic Publishers, 1996.

[3] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.

[4] C. Pu and A. Leff, "Autonomous transaction execution with epsilon serializability," Proc. RIDE Workshop on Transaction and Query Processing, pp.2–11, 1992.

[5] C. Pu, W. Hseush, G.E. Kaiser, K. Wu, and P.S. Yu, "Distributed divergence control for epsilon serializability," Proc. IEEE International Conference on Distributed Computing Systems, pp.449–456, 1993.

[6] K. Ramamritham and C. Pu, "A formal characterization of epsilon serializability," IEEE Trans. Knowl. Data Eng., vol.7, no.6, pp.997–1007, 1995.

[7] K. Wu, P.S. Yu, and C. Pu, "Divergence control algorithm for epsilon serializability," IEEE Trans. Knowl. Data Eng., vol.9, no.2, pp.262–274, 1997.

[8] R.A. Golding, "Weak consistency group communication for wide-area systems," Proc. Second Workshop on the Management of Replicated Data, pp.13–16, 1992.

[9] A. Birrell, R. Levin, R.M. Needham, and M.D. Schroeder, "Grapevine: An exercise in distributed computing," Commun. ACM, vol.25, no.4, pp.260–274, 1982.

[10] T. Yamashita and S. Ono, "View divergence control of replicated data using update delay estimation," Proc. 18th IEEE Symposium on Reliable Distributed Systems, pp.102–111, 1999.

[11] L.P. Cox and B.D. Noble, "Fast reconciliations in fluid replication," Proc. International Conference on Distributed Computing Systems, pp.449–458, 2001.

[12] M.J. Fischer and A. Michael, "Sacrificing serializability to attain high availability of data in an unreliable network," Proc. 1st ACM Symposium on Principles of Database Systems, pp.70–75, 1982.

[13] D.B. Terry, M.M. Theimer, K. Petersen, and A.J. Demers, "Managing update conflicts in a weakly connected replicated storage system," Proc. Symposium on Operating Systems Principles, pp.172–183, 1995.

[14] J. Gray, P. Helland, P. O'Neil, and D. Shasha, "The dangers of replication and a solution," Proc. ACM SIGMOD International Conference on Management of Data, pp.173–182, 1996.

[15] N. Soparkar and A. Silberschatz, "Data-value partitioning and virtual messages," Proc. Symposium on Principles of Database Systems, pp.357–367, 1990.

[16] D.L. Mills, "Precision synchronization of computer network clocks," Comput. Commun. Rev., vol.24, no.2, pp.28–43, 1994.

[17] J. Levine, "An algorithm to synchronize the time of a computer to universal time," IEEE/ACM Trans. Netw., vol.3, no.1, pp.42–50, 1995.

[18] T. Yamashita and S. Ono, "A statistical method for time synchronization of computer clocks with precisely frequency-synchronized oscillators," Proc. IEEE 18th International Conference on Distributed Computing Systems, pp.32–39, 1998.

[19] D. Bertsekas and R.G. Gallager, Data Networks, Prentice-Hall, 1987.

[20] D. Du and D.F. Hsu, Combinatorial Network Theory, Kluwer Academic Publishers, 1996.

[21] A.S. Tanenbaum, Computer Networks, Prentice-Hall, 1996.

[22] K.P. Birman, Building Secure and Reliable Network Applications, Manning Publications, 1996.

[23] P.E. O'Neil, "The Escrow transactional method," ACM Trans. Database Syst., vol.11, no.4, pp.405–430, 1986.

[24] U. Cetintemel, B. Ozden, M.J. Franklin, and A. Silberschatz, "Design and evaluation of redistribution strategies for wide-area commodity distribution," Proc. IEEE International Conference on Distributed Computing Systems, pp.154–161, 2001.

[25] H. Yu and A. Vahdat, "Efficient numerical error bounding for replicated network services," Proc. International Conference on Very Large Databases, pp.123–133, 2000.


Takao Yamashita received the B.S. and M.S. degrees in electronics engineering from Kyoto University in 1990 and 1992, respectively. In 1992, he joined Nippon Telegraph and Telephone Corporation. His current research interests encompass distributed algorithms and loosely coupled distributed systems. He is a member of the IEEE, IPSJ, and APS.