Consistency and Replication
CSCI 4780/6780
Chapter Outline
• Why replication?
– Relations to reliability and scalability
• How to maintain consistency of replicated data?
– Consistency models
– Consistency schemes
– How to distribute updates and when to distribute them
• Examples
– Parallel programming
– WWW-based systems
Reasons for Replication
• Two primary reasons
– Improving reliability of system
– Improving scalability and performance of system
• Reliability
– Resilience to failures
– Protection against data corruption: Byzantine failures and quorum-based systems
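As a rough illustration of how replication can mask a corrupted copy, the sketch below reads a value from several replicas and accepts it only if a strict majority agree. The function name `majority_read` and the plain list of replica answers are assumptions made for this example; real quorum-based protocols also track version numbers and cope with unreachable replicas.

```python
from collections import Counter

def majority_read(replica_values):
    """Return the value held by a strict majority of replicas, or None.

    replica_values: list of values, one per replica that answered.
    """
    if not replica_values:
        return None
    value, votes = Counter(replica_values).most_common(1)[0]
    # A strict majority is required so that a minority of faulty
    # replicas cannot outvote the correct ones.
    return value if votes > len(replica_values) // 2 else None

print(majority_read([42, 42, 7]))   # the single corrupted replica is outvoted
print(majority_read([1, 2, 3]))     # no majority, so no value is accepted
```

With three replicas this tolerates one corrupted copy; with no majority the caller has to retry or report a failure.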
Scalability and Performance
• Scaling in numbers
– Replication can help to scale the distributed system by numbers
– If the number of processes accessing the data increases, it helps to replicate the data
– Example: Parallel programs
• Geographical scaling
– Placing a replica close to the process using the data improves performance
– Example: Edge cache networks, browser caches, etc.
Problems of Replication
• Creating and maintaining replicas is not free
• Multiple copies lead to consistency problems
– What happens when one of the replicas gets modified?
– Modifications have to be carried out at all replicas
– How and when this is done determines the cost of replication
• WWW-based systems
– Browser and client-side caches
– May lead to stale pages
– TTL model, Update/Invalidate model
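The TTL and invalidate models can be contrasted with a small client-side cache. This is a minimal sketch, not a description of any real browser cache: the class name `TTLCache`, its methods, and the injectable `clock` parameter are all assumptions introduced for illustration.

```python
import time

class TTLCache:
    """Client-side cache in which every entry expires after ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so the example is testable
        self.entries = {}           # key -> (value, time stored)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        item = self.entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if self.clock() - stored_at >= self.ttl:
            del self.entries[key]   # expired: caller must refetch from origin
            return None
        return value

    def invalidate(self, key):
        """Invalidate model: the origin tells the cache to drop a stale copy."""
        self.entries.pop(key, None)
```

Under the TTL model a page may be served stale until its entry expires. Under the invalidate model the stale copy disappears as soon as the origin sends the notification, at the price of the origin having to know which caches hold a copy.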
Replication as Scalability Technique
• Replication can help to solve geographical scalability problems
– Placing replicas closer to clients
• Keeping replicas consistent may impose severe overheads
– Example: N accesses and M updates per unit time with N << M
• Problems with multiple copies and tight consistency
– Implementing global synchronization
• Relaxing consistency requirements is a possible solution
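The N << M case can be made concrete with a simple count. The sketch below assumes that each read observes at most one version of the data, so with N reads and M updates at least M - N updates are applied to the replica without ever being seen. The function name `unread_updates` and the rates are made up for illustration.

```python
def unread_updates(accesses, updates):
    """Lower bound on updates applied to a replica that no local read observes.

    Each read can observe at most one version, so with N reads and M updates
    at least M - N versions are overwritten before anyone looks at them.
    """
    return max(0, updates - accesses)

N, M = 10, 1000   # assumed rates per unit time
wasted = unread_updates(N, M)
print(wasted, wasted / M)   # 990 0.99
```

Under these assumed rates, 99% of the work spent propagating updates to this replica is wasted, which is the overhead the slide refers to.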
Data-Centric Consistency Models
The general organization of a logical data store, physically distributed and replicated across multiple processes.
Consistency Models
• Contract between processes and data store
• If the processes play by certain rules, the store promises to work correctly
– Data store guarantees certain properties on the data items stored
– Example: A read on a data item always returns the value of the most recent write
• Several data consistency models
– Strict consistency, sequential consistency, causal consistency, FIFO consistency
Strict Consistency
• Most stringent consistency model
“Any read on a data item x returns a value corresponding to the result of the most recent write on x”
• Natural and obvious
• Uniprocessor systems guarantee strict consistency
• Implicitly assumes the existence of absolute global time
Strict Consistency - Illustration
Behavior of two processes, operating on the same data item.
• A strictly consistent store.
• A store that is not strictly consistent.
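If we pretend that absolute global timestamps exist, the definition of strict consistency can be checked mechanically against a history of operations. The sketch below is one way to do that; the function `is_strictly_consistent`, the tuple layout, and the two sample histories are assumptions made for illustration and are not taken from the figure.

```python
def is_strictly_consistent(history, initial=None):
    """Check that each read returns the value of the most recent write.

    history: list of (time, op, item, value) tuples, where op is "W" or "R"
    and time is an idealized absolute global timestamp.
    """
    latest = {}                                   # item -> last written value
    for _, op, item, value in sorted(history, key=lambda e: e[0]):
        if op == "W":
            latest[item] = value
        elif latest.get(item, initial) != value:  # a read saw a stale value
            return False
    return True

# One process writes a to x at time 1; another reads x afterwards.
ok_history = [(1, "W", "x", "a"), (2, "R", "x", "a")]
stale_history = [(1, "W", "x", "a"), (2, "R", "x", None), (3, "R", "x", "a")]

print(is_strictly_consistent(ok_history))     # True
print(is_strictly_consistent(stale_history))  # False
```

In the second history the read at time 2 still returns the initial value even though the write happened at time 1, so the store is not strictly consistent.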
Problems with Strict Consistency
• Strict consistency poses serious problems for systems with multiple machines
• Example
– Two machines A & B are located on different continents, and data item x is stored on B
– A performs a read at T1 and immediately afterwards B performs a write at T2
– If T2 - T1 is very small, the write is likely to complete before A's read request arrives at B, so the read returns the new value even though it was issued first. Otherwise, the read returns the old value
• Problem arises because strict consistency relies on absolute global time
– Impossible to assign unique time stamps corresponding to actual global time
– Locks do not solve the problem
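A back-of-the-envelope calculation shows why the example cannot be fixed by a faster network. For A's read to reach B before the write at T2, the request would have to cover the distance between the machines within T2 - T1. The distance and time gap below are assumed values chosen for illustration, and `required_speed_ratio` is a hypothetical helper.

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second

def required_speed_ratio(distance_m, gap_s):
    """Speed (as a multiple of c) a request needs to cover distance_m within gap_s."""
    return (distance_m / gap_s) / SPEED_OF_LIGHT

# Assumed values: machines 10,000 km apart, write issued 1 ms after the read.
ratio = required_speed_ratio(10_000_000, 1e-3)
print(round(ratio, 1))   # 33.4
```

A ratio above 1 means the request would have to travel faster than light, so under these assumptions no implementation can guarantee that the read sees the old value.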