6.4 Data And File Replication

Presenter : Jing He

Instructor: Dr. Yanqing Zhang

Outline

• Basic Knowledge

• Most Recent Projects

• Future Works

• References

Why replicate

• Performance

• Reliability

• Resource sharing

• Network resource saving

Challenge

• Transparency
  – Parallelism transparency
  – Failure transparency
  – Replication transparency

• Concurrency control

• Failure Recovery

Goal

• One-copy serializability:
  – The execution of transactions on replicated objects is equivalent to the execution of the same transactions on non-replicated objects [1][R. Chow et al. 1997].

Architecture

• FSA: file service agent, the client interface

• RM: replica manager, provides the replication functions

• A client chooses one or more FSAs to access a data object.

• An FSA acts as the front end to the replica managers (RMs) to provide replication transparency.

• An FSA contacts one or more RMs for the actual updating and reading of data objects.

Architecture

[Figure: clients (Client) access file service agents (FSA), which contact a set of replica managers (RM).]

Read operations

• Read-one-primary: the FSA reads only from the primary RM to enforce consistency

• Read-one: the FSA may read from any RM to gain concurrency

• Read-quorum: the FSA must read from a quorum of RMs to determine which copy of the data is current

Write Operations

• Write-one-primary: write only to the primary RM; the primary RM updates all other RMs

• Write-all: update all RMs

• Write-all-available: write to all functioning RMs. A faulty RM must be synchronized before it is brought back online.

Write Operations Cont.

• Write-quorum: update a predefined quorum of RMs

• Write-gossip: update any RM; the update is lazily propagated to the other RMs

Read-one-primary, write-one-primary

• Other RMs are backups of primary RM

• No concurrency

• Easy to serialize

• Simple to implement

• Achieves one-copy serializability

• Primary RM is performance bottleneck

Read-one, Write-all

• Provides concurrency

• Concurrency control protocol needed to ensure consistency (serialization)

• Achieves one-copy serializability

• Difficult to implement (any failed RM blocks all updates)

Read-one, Write-all-available

• Variation of read-one, write-all

• May not guarantee one-copy serializability

• Can lead to many conflicts among transactions
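
The difference between write-all and write-all-available can be illustrated with a minimal Python sketch. The class `ReplicaManager` and the helpers `write_all`, `write_all_available`, and `resync` are hypothetical names introduced here for illustration; they model RMs as in-memory dictionaries and ignore transactions and concurrency control entirely.

```python
class ReplicaManager:
    """Hypothetical in-memory replica manager (RM)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value


def write_all(rms, key, value):
    """Write-all: refuse the update unless every RM is reachable."""
    if not all(rm.up for rm in rms):
        return False
    for rm in rms:
        rm.write(key, value)
    return True


def write_all_available(rms, key, value):
    """Write-all-available: update every functioning RM and skip the rest."""
    written = [rm for rm in rms if rm.up]
    for rm in written:
        rm.write(key, value)
    return [rm.name for rm in written]


def resync(stale, source):
    """Copy state from an up-to-date RM before bringing `stale` back online."""
    stale.store = dict(source.store)
    stale.up = True


rms = [ReplicaManager(f"rm{i}") for i in range(3)]
write_all(rms, "x", 1)

rms[2].up = False                                   # one RM fails
blocked = not write_all(rms, "x", 2)                # write-all cannot proceed
updated = write_all_available(rms, "x", 2)          # the two live RMs are updated
stale_value = rms[2].store["x"]                     # the failed RM still holds 1

resync(rms[2], rms[0])                              # synchronize before going online
```

In this sketch a read-one request served by `rm2` before `resync` would return the old value, which is one way the scheme can fall short of one-copy serializability.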

Read-quorum,Write-quorum

• Version number attached to replicated object

• On a read, the copy with the highest version number is the latest.

• A write operation advances the version number by 1

• Write-write conflicts are detected when 2 * write quorum > number of object copies

• Read-write conflicts are detected when write quorum + read quorum > number of object copies
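
The two quorum conditions and the version-number rule can be sketched as follows. This is a minimal single-process illustration, not a definitive implementation: `Replica`, `quorums_valid`, `quorum_write`, and `quorum_read` are hypothetical names, quorum members are picked at random, and locking and failures are ignored.

```python
import random


class Replica:
    """Hypothetical copy of one replicated object."""

    def __init__(self):
        self.version = 0
        self.value = None


def quorums_valid(n, r, w):
    """Check both overlap conditions for n copies."""
    return 2 * w > n and r + w > n


def quorum_write(replicas, w, value):
    """Write to w copies, advancing the highest version seen by 1."""
    chosen = random.sample(replicas, w)
    new_version = max(rep.version for rep in chosen) + 1
    for rep in chosen:
        rep.version = new_version
        rep.value = value
    return new_version


def quorum_read(replicas, r):
    """Read r copies and return the one with the highest version."""
    chosen = random.sample(replicas, r)
    latest = max(chosen, key=lambda rep: rep.version)
    return latest.version, latest.value


N, R, W = 5, 2, 4                      # assumed example sizes
assert quorums_valid(N, R, W)

replicas = [Replica() for _ in range(N)]
for v in ("a", "b", "c"):
    quorum_write(replicas, W, v)

result = quorum_read(replicas, R)      # always overlaps the last write quorum
```

Because any two write quorums overlap, the highest version in a write quorum is the globally latest one, and because every read quorum overlaps every write quorum, the read always sees the most recent write.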

Gossip Update

• When updates are less frequent than reads, updates can be propagated lazily to replicas.

• Both read and update operations are directed by FSA to any RM

• FSA shields replication details from clients.

• Increased performance

• Typically read-one, write-gossip

• Uses timestamps

Basic Gossip Update

• Read: if TSfsa <= TSrm, the RM has recent data and returns it; otherwise the FSA waits for gossip or tries another RM

• Update: if TSfsa > TSrm, the RM performs the update, updates TSrm, and sends gossip. Otherwise the handling depends on the application: perform the update or reject it

• Gossip: the RM is updated if the gossip carries new updates.
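
The three rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions: `GossipRM` and its methods are hypothetical names, timestamps are plain integers, gossip is triggered by an explicit call to model lazy propagation, and a stale update is simply rejected (the slide leaves that choice to the application).

```python
class GossipRM:
    """Hypothetical replica manager using scalar timestamps."""

    def __init__(self, name):
        self.name = name
        self.ts = 0            # TSrm
        self.value = None
        self.peers = []

    def read(self, ts_fsa):
        """Return (value, TSrm) if TSfsa <= TSrm, else None (wait or try another RM)."""
        if ts_fsa <= self.ts:
            return self.value, self.ts
        return None

    def update(self, ts_fsa, value):
        """Apply the update if TSfsa > TSrm; this sketch rejects it otherwise."""
        if ts_fsa > self.ts:
            self.value, self.ts = value, ts_fsa
            return True
        return False

    def gossip(self):
        """Lazily push the current state to the other RMs."""
        for peer in self.peers:
            peer.receive_gossip(self.ts, self.value)

    def receive_gossip(self, ts, value):
        """Adopt gossiped state only if it is newer than the local state."""
        if ts > self.ts:
            self.value, self.ts = value, ts


a, b = GossipRM("a"), GossipRM("b")
a.peers, b.peers = [b], [a]

accepted = a.update(1, "v1")      # FSA writes at RM a
early = b.read(1)                 # RM b has not heard the gossip yet
a.gossip()
late = b.read(1)                  # now RM b is recent enough
rejected = not b.update(1, "old") # stale update is refused in this sketch
```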

Causal Order Gossip Protocol

• Used for read-modify operations

• Assumes a fixed RM configuration

• Uses vector timestamps

• Uses a buffer to preserve the order
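
One common way to combine vector timestamps with a buffer is the standard causal-delivery rule: an update from RM j is applied only when it is the next update from j and all updates it depends on have already been applied; anything else waits in the buffer. The sketch below assumes that rule and a fixed set of n RMs. `CausalRM` and its methods are hypothetical names, and the protocol described in [1] may differ in detail.

```python
class CausalRM:
    """Hypothetical RM that applies gossiped updates in causal order."""

    def __init__(self, index, n):
        self.index = index
        self.clock = [0] * n   # number of updates applied, per originating RM
        self.buffer = []       # updates waiting for their causal predecessors
        self.log = []          # payloads in the order they were applied

    def originate(self, payload):
        """Accept a new update locally and return the gossip message for it."""
        self.clock[self.index] += 1
        self.log.append(payload)
        return (self.index, list(self.clock), payload)

    def _deliverable(self, sender, ts):
        next_from_sender = ts[sender] == self.clock[sender] + 1
        deps_applied = all(
            ts[k] <= self.clock[k] for k in range(len(ts)) if k != sender
        )
        return next_from_sender and deps_applied

    def receive(self, message):
        """Buffer the message, then apply everything that has become deliverable."""
        self.buffer.append(message)
        progress = True
        while progress:
            progress = False
            for msg in list(self.buffer):
                sender, ts, payload = msg
                if self._deliverable(sender, ts):
                    self.clock[sender] = ts[sender]
                    self.log.append(payload)
                    self.buffer.remove(msg)
                    progress = True


rm0, rm1, rm2 = (CausalRM(i, 3) for i in range(3))

m1 = rm0.originate("create")   # timestamp [1, 0, 0]
rm1.receive(m1)
m2 = rm1.originate("modify")   # timestamp [1, 1, 0], depends on m1

rm2.receive(m2)                # arrives first: held back in the buffer
held = len(rm2.buffer)
rm2.receive(m1)                # predecessor arrives: both are applied in order
```

Here `rm2` receives the modification before the creation it depends on, and the buffer is what keeps the two from being applied out of order.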

Disadvantages of File replication

• The contents of the file need to be known before the replication operation takes place.

• Existing systems cannot work in limited-bandwidth networks.

• DFS replication will not work well when there are a large number of changes to replicate.

Current Project

• Data Grid File Replication [2][C. Yang, 2008]
  – Create copies in convenient locations
  – Replicas are adjusted to appropriate locations using Bayesian Networks (BN)

• File replication in P2P systems
  – Plover: making replicas among physically close nodes; load balance between replica nodes [3][H. Shen, 2009]
  – EAD: efficient and adaptive decentralized file replication algorithm [4,5][H. Shen, 2009]

Future Work

• Improve the efficiency and effectiveness of file replication schemes

• Integrate file replication and consistency maintenance

References

[1] R. Chow and T. Johnson, Distributed Operating Systems & Algorithms, 1997

[2] C. Yang, C. Huang, and T. Hsiao, A Data Grid File Replication Maintenance Strategy Using Bayesian Networks, Eighth International Conference on Intelligent Systems Design and Applications, 2008

[3] H. Shen and Y. Zhu, A Proactive Low-Overhead File Replication Scheme for Structured P2P Content Delivery Networks, Journal of Parallel and Distributed Computing, 2009

[4] H. Shen, IRM: Integrated File Replication and Consistency Maintenance in P2P Systems, IEEE Transactions on Parallel and Distributed Systems, 2009

[5] H. Shen, An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems, IEEE Transactions on Parallel and Distributed Systems, 2009
