
Consistency and Replication

CSCI 4780/6780

Chapter Outline

• Why replication?
  – Relations to reliability and scalability
• How to maintain consistency of replicated data?
  – Consistency models
  – Consistency schemes
  – How to distribute updates and when to distribute them
• Examples
  – Parallel programming
  – WWW-based systems

Reasons for Replication

• Two primary reasons
  – Improving the reliability of the system
  – Improving the scalability and performance of the system
• Reliability
  – Resilience to failures
  – Protection against data corruption: Byzantine failures and quorum-based systems
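To make "quorum-based systems" concrete, here is a minimal sketch assuming Gifford-style voting: each of N replicas keeps a version number, a write goes to a write quorum of N_W replicas, and a read consults a read quorum of N_R replicas and returns the highest-versioned value. The names Replica, valid_quorums, quorum_write and quorum_read are hypothetical, concurrent writers are ignored, and this illustrates crash-tolerant quorums only, not Byzantine fault tolerance.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Replica:
    version: int = 0      # version number of the copy held by this replica
    value: object = None  # the data item itself


def valid_quorums(n: int, n_read: int, n_write: int) -> bool:
    """Every read quorum must overlap every write quorum (n_read + n_write > n),
    and any two write quorums must overlap (2 * n_write > n)."""
    return n_read + n_write > n and 2 * n_write > n


def quorum_write(replicas: List[Replica], write_set: Iterable[int], value: object) -> None:
    """Install value on every replica in the write quorum with a fresh version.
    Assuming no concurrent writers, overlapping write quorums guarantee this
    quorum already holds the latest version, so max + 1 is globally newest."""
    members = list(write_set)
    new_version = max(replicas[i].version for i in members) + 1
    for i in members:
        replicas[i].version = new_version
        replicas[i].value = value


def quorum_read(replicas: List[Replica], read_set: Iterable[int]) -> object:
    """Return the highest-versioned value in the read quorum; because read and
    write quorums overlap, at least one member holds the latest write."""
    newest = max((replicas[i] for i in read_set), key=lambda r: r.version)
    return newest.value


if __name__ == "__main__":
    n, n_read, n_write = 5, 3, 3
    assert valid_quorums(n, n_read, n_write)
    replicas = [Replica() for _ in range(n)]
    quorum_write(replicas, {0, 1, 2}, "a")
    quorum_write(replicas, {2, 3, 4}, "b")
    print(quorum_read(replicas, {0, 1, 3}))  # prints b
```

With N = 5 and N_R = N_W = 3, the read quorum {0, 1, 3} shares replica 3 with the second write quorum, so the read sees version 2 even though replicas 0 and 1 still hold the older value.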

Scalability and Performance

• Scaling in numbers
  – Replication helps a distributed system scale with the number of processes
  – If the number of processes accessing the data increases, replicating the data spreads the load
  – Example: Parallel programs
• Geographical scaling
  – Placing a replica close to the processes that use the data improves performance
  – Example: Edge cache networks, browser caches, etc.
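Geographical scaling only pays off if each client actually uses a nearby replica. A minimal sketch, assuming the client has already measured round-trip times to each replica; the function name and replica names are hypothetical.

```python
from typing import Dict


def nearest_replica(rtt_ms: Dict[str, float]) -> str:
    """Pick the replica with the smallest measured round-trip time (ms)."""
    if not rtt_ms:
        raise ValueError("no replicas to choose from")
    return min(rtt_ms, key=lambda name: rtt_ms[name])


if __name__ == "__main__":
    # Hypothetical RTT measurements from one client to three replicas.
    measured = {"replica-us": 12.0, "replica-eu": 85.0, "replica-asia": 190.0}
    print(nearest_replica(measured))  # prints replica-us
```

Real edge networks usually make this choice on the server side (for example via DNS), but the principle is the same: route each access to the copy with the lowest expected latency.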

Problems of Replication

• Creating and maintaining replicas is not free
• Multiple copies lead to consistency problems
  – What happens when one of the replicas is modified?
  – The modification has to be carried out at all replicas
  – How and when this is done determines the cost of replication
• WWW-based systems
  – Browser and client-side caches
  – May lead to stale pages
  – TTL model, Update/Invalidate model
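The TTL and Update/Invalidate models for browser-style caches can be sketched in a few lines. This is a minimal illustration, not a real browser or HTTP API: the class TTLCache, its fetch callback and its invalidate/update methods are hypothetical, and the clock is injectable so expiry can be simulated.

```python
import time


class TTLCache:
    """Client-side cache. An entry is served until its TTL expires (TTL model),
    unless the server invalidates it or pushes a new copy (Invalidate/Update model)."""

    def __init__(self, fetch, ttl: float, clock=time.monotonic):
        self._fetch = fetch    # fetch(url) -> page, called on a miss or an expired entry
        self._ttl = ttl        # seconds an entry is considered fresh
        self._clock = clock
        self._entries = {}     # url -> (page, expiry_time)

    def get(self, url):
        entry = self._entries.get(url)
        now = self._clock()
        if entry is not None and now < entry[1]:
            return entry[0]    # fresh hit, which may still be stale w.r.t. the server
        page = self._fetch(url)
        self._entries[url] = (page, now + self._ttl)
        return page

    def invalidate(self, url):
        """Invalidate model: the server says url changed; drop the cached copy."""
        self._entries.pop(url, None)

    def update(self, url, page):
        """Update model: the server pushes the new page itself."""
        self._entries[url] = (page, self._clock() + self._ttl)


if __name__ == "__main__":
    server = {"/index": "v1"}
    now = [0.0]
    cache = TTLCache(server.__getitem__, ttl=10.0, clock=lambda: now[0])
    print(cache.get("/index"))  # v1
    server["/index"] = "v2"
    print(cache.get("/index"))  # still v1: a stale page within the TTL
    now[0] = 11.0
    print(cache.get("/index"))  # v2 after expiry
```

The TTL model bounds staleness by the TTL without server involvement; invalidation and update remove staleness but require the server to track and contact caches.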

Replication as Scalability Technique

• Replication can help to solve geographical scalability problems
  – Placing replicas closer to clients
• Keeping replicas consistent may impose severe overheads
  – Example: a replica is accessed N times and updated M times per unit time, with N << M; most updates are propagated to the replica but never read
• Problems with multiple copies and tight consistency
  – Tight consistency requires global synchronization, which is expensive to implement
• Relaxing consistency requirements is a possible solution
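The N accesses versus M updates example can be quantified. Each access returns one version, so per unit time at most N of the M versions installed at a replica can ever be observed, and at least M − N propagations are wasted. A minimal sketch; the function name and the rates are hypothetical.

```python
def unread_update_fraction(n_accesses: float, m_updates: float) -> float:
    """Lower bound on the fraction of a replica's updates that no access ever reads.

    Each access returns one version, so per unit time at most n_accesses of the
    m_updates versions can be observed; the others are overwritten unseen.
    """
    if m_updates <= 0:
        return 0.0
    return max(0.0, (m_updates - n_accesses) / m_updates)


if __name__ == "__main__":
    # Hypothetical rates: 10 accesses and 1,000 updates per second (N << M).
    print(unread_update_fraction(10, 1000))  # prints 0.99
```

At these rates at least 99% of the update traffic sent to the replica buys nothing, which is why relaxing consistency (for example, propagating updates lazily) becomes attractive.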

Data-Centric Consistency Models

The general organization of a logical data store, physically distributed and replicated across multiple processes.

Consistency Models

• A contract between processes and the data store
• If the processes play by certain rules, the store promises to work correctly
  – The data store guarantees certain properties of the data items stored
  – Example: A read on a data item always returns the value written by the most recent write
• Several data consistency models
  – Strict consistency, sequential consistency, causal consistency, FIFO consistency

Strict Consistency

• Most stringent consistency model

“Any read on a data item x returns a value corresponding to the result of the most recent write on x”

• Natural and obvious

• Uniprocessor systems guarantee strict consistency

• Implicitly assumes the existence of absolute global time

Strict Consistency - Illustration

Behavior of two processes operating on the same data item: (a) a strictly consistent store; (b) a store that is not strictly consistent.
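The definition of strict consistency can be turned into a checker, provided every operation carries an absolute global timestamp (exactly the idealization strict consistency assumes). A minimal sketch: the Op record and is_strictly_consistent are hypothetical names, timestamps are assumed unique, and None stands for an item that has never been written. The two histories in the usage are illustrative, not a transcription of the figure.

```python
from typing import List, NamedTuple, Optional


class Op(NamedTuple):
    time: float            # absolute global time of the operation (assumed unique)
    process: str
    kind: str              # "W" for write, "R" for read
    item: str
    value: Optional[str]   # value written, or value the read returned (None = never written)


def is_strictly_consistent(history: List[Op]) -> bool:
    """Every read must return the value of the most recent write to the same
    item, ordered by absolute global time."""
    latest = {}  # item -> value of the most recent write seen so far
    for op in sorted(history, key=lambda o: o.time):
        if op.kind == "W":
            latest[op.item] = op.value
        elif latest.get(op.item) != op.value:
            return False  # the read did not return the most recent write
    return True


if __name__ == "__main__":
    strict = [Op(1, "P1", "W", "x", "a"), Op(2, "P2", "R", "x", "a")]
    not_strict = [Op(1, "P1", "W", "x", "a"),
                  Op(2, "P2", "R", "x", None),   # misses the earlier write
                  Op(3, "P2", "R", "x", "a")]
    print(is_strictly_consistent(strict))      # True
    print(is_strictly_consistent(not_strict))  # False
```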

Problems with Strict Consistency

• Strict consistency poses serious problems for systems with multiple machines
• Example
  – Two machines A and B are located on different continents, and data item x is stored on B
  – A process on A reads x at T1; slightly later, a process on B writes x at T2
  – Strict consistency requires the read to return the old value, because it was issued first
  – If T2 – T1 is shorter than the time A's read request needs to reach B, the write completes before the request arrives, and the read returns the new value instead
• The problem arises because strict consistency relies on absolute global time
  – It is impossible to assign unique timestamps that correspond to actual global time
  – Locks do not solve the problem
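Some illustrative numbers for the A/B example, assuming a distance of 6,000 km and a signal speed of about 2 × 10^8 m/s (roughly the speed of light in optical fiber); both figures are assumptions, not from the slide. The one-way delay is then at least 6 × 10^6 / 2 × 10^8 = 0.03 s, so any write at B less than 30 ms after A's read happens before B even learns of the read. The function names below are hypothetical, and processing and queuing delays are ignored.

```python
SIGNAL_SPEED_M_PER_S = 2.0e8  # assumed: roughly the speed of light in optical fiber


def one_way_delay_s(distance_m: float, speed_m_per_s: float = SIGNAL_SPEED_M_PER_S) -> float:
    """Lower bound on the time a message needs to cover distance_m."""
    return distance_m / speed_m_per_s


def read_reaches_b_first(t1: float, t2: float, distance_m: float) -> bool:
    """True if A's read request, sent at t1, can arrive at B before B's write at t2.
    Otherwise B performs the write before it even learns about the read."""
    return t1 + one_way_delay_s(distance_m) < t2


if __name__ == "__main__":
    distance = 6.0e6  # hypothetical 6,000 km between A and B
    print(one_way_delay_s(distance))                   # prints 0.03
    print(read_reaches_b_first(0.0, 0.001, distance))  # False: write 1 ms after the read
    print(read_reaches_b_first(0.0, 0.050, distance))  # True: write 50 ms after the read
```

No amount of engineering removes this bound; it follows from the finite propagation speed of signals, which is why practical systems fall back on weaker consistency models.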