On Replication
Yin Chen, July 2006
Overview
• What is it? Why do we need it? What types are there?
• Investigation of existing technologies
– IBM SQL replication
– Sybase replication
– Oracle replication
– MySQL replication
– Globus DRS
– EGEE RMS
– SRB
• Our project
– Goals
– Solutions
– Features
What is replication?
• Copying data and synchronizing updates
• Is not caching
– Caching is a client-side phenomenon
– Caching only improves response time
• Is not a backup (a backup is not automatically overwritten when the original data is modified)
• Is not a replicated system, which also has to
– Decide when and where to copy
– Optimize (how many replicas are needed …)
– Grow or shrink the replication tree
Why do we need it?
• Data consolidation (central audit and analysis)
• Data distribution (for branch offices)
• Performance
– Access efficiency (moving data near applications)
– Load balancing (distributing access load)
– Security (data protection)
– Availability (off-line access)
– Reliability (disaster recovery, avoiding a single point of failure)
• Data Grid (to improve availability, response time, fault tolerance)
• Digital Library (copying digital documents, indexes …)
Replication types
• Synchronous replication
– What it is: two storages are updated at the same time; the update is rolled back if one fails
– Benefits: high availability / automatic fail-over / minimal data loss
– Usage: disaster recovery
– Drawbacks: lower network efficiency / limited scalability / higher cost / less flexibility
• Asynchronous replication
– What it is: changes are captured on the primary storage and propagated immediately or at timed intervals
– Benefits: low cost / scalability / flexibility
– Usage: load balancing / off-line access / access efficiency
– Drawbacks: possible data loss / network bandwidth consumption
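The difference between the two types can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `Store`, `sync_write` and `AsyncReplicator` are hypothetical names, a dict stands in for a storage system, and a flag simulates a failed replica.

```python
from collections import deque


class Store:
    """A toy storage node: a dict plus a switch to simulate failure."""

    def __init__(self):
        self.data = {}
        self.fail = False

    def write(self, key, value):
        if self.fail:
            raise IOError("store unavailable")
        self.data[key] = value


def sync_write(primary, replica, key, value):
    """Synchronous: both stores are updated, or neither is."""
    missing = object()
    old = primary.data.get(key, missing)
    primary.write(key, value)
    try:
        replica.write(key, value)
    except IOError:
        # Roll back the primary so the two stores stay identical.
        if old is missing:
            del primary.data[key]
        else:
            primary.data[key] = old
        raise


class AsyncReplicator:
    """Asynchronous: capture changes on the primary, propagate later."""

    def __init__(self, primary, replica):
        self.primary, self.replica = primary, replica
        self.pending = deque()

    def write(self, key, value):
        self.primary.write(key, value)
        self.pending.append((key, value))  # captured, not yet propagated

    def propagate(self):
        # Replay captured changes in order; a change stays queued
        # until the replica has accepted it.
        while self.pending:
            key, value = self.pending[0]
            self.replica.write(key, value)
            self.pending.popleft()
```

In the asynchronous case, whatever is still in `pending` when the primary is lost is exactly the data-loss risk listed among the drawbacks.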
Existing technologies
IBM Replication
WebSphere Information Integrator V8.2
Supports databases from multiple vendors
Admin: creates the replication criteria, stored in control tables
Capture: uses logs or triggers to capture changes into a temporary table
Apply: on a schedule, applies the accumulated transactions to the target DB
Alert Monitor: monitors replication and notifies users
Supports after-image and before-image copies (so changes can be rolled back)
Allows copying subsets, simple views, and complex joins and unions
Asynchronous replication; the schedule can be specified
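The Capture and Apply steps can be illustrated with a small trigger-based sketch. It uses SQLite from the Python standard library rather than DB2, and the table, trigger and function names (`account`, `account_changes`, `apply_changes`) are made up for the example; it shows the general pattern, not IBM's implementation.

```python
import sqlite3

# Source database: triggers capture each change into a change table,
# recording the before-image and the after-image of the row.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE account_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    id INTEGER, before_balance INTEGER, after_balance INTEGER);
CREATE TRIGGER capture_insert AFTER INSERT ON account BEGIN
    INSERT INTO account_changes (id, before_balance, after_balance)
    VALUES (NEW.id, NULL, NEW.balance);
END;
CREATE TRIGGER capture_update AFTER UPDATE ON account BEGIN
    INSERT INTO account_changes (id, before_balance, after_balance)
    VALUES (NEW.id, OLD.balance, NEW.balance);
END;
""")

# Target database: receives the changes when the Apply step runs.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")


def apply_changes(source, target):
    """Apply step: replay the accumulated changes on the target, in order."""
    rows = source.execute(
        "SELECT seq, id, after_balance FROM account_changes ORDER BY seq"
    ).fetchall()
    for _seq, row_id, after in rows:
        target.execute(
            "INSERT OR REPLACE INTO account (id, balance) VALUES (?, ?)",
            (row_id, after),
        )
    target.commit()
    # Changes that have been applied are removed from the change table.
    source.execute("DELETE FROM account_changes")
    source.commit()
    return len(rows)


# Two changes on the source; nothing reaches the target until Apply runs.
source.execute("INSERT INTO account VALUES (1, 100)")
source.execute("UPDATE account SET balance = 80 WHERE id = 1")
source.commit()
```

Calling `apply_changes(source, target)` on a timer would correspond to the scheduled, asynchronous propagation described above.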
Sybase Replication
A pioneer, available since 1993
“Publish-and-subscribe” approach
Replication Agent: runs on each publisher and detects changes based on logs
Replication Server: applies changes to the target DBs (using pre-configured intelligent routes)
Replication Server Manager: GUI-based; manages and monitors the P2P environment
Stable Queues: temporary storage for data, ensuring that no data is lost
Its main strength is high performance
Oracle Replications
Multimaster Replication
P2P structure
Changes are pushed to every other site (synchronously or asynchronously)
Conflicts may happen (update conflict / uniqueness conflict / delete conflict)
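As a rough illustration of the three conflict types, the sketch below classifies a replicated change when it arrives at a receiving site. It is a deliberate simplification, not Oracle's detection logic: the function `detect_conflict`, the dict-based change format and the single-value rows are all assumptions made for the example.

```python
def detect_conflict(site_rows, change):
    """Return the conflict type for a replicated change, or None.

    site_rows: dict mapping primary key -> row value at the receiving site.
    change: dict with 'op' ('insert', 'update' or 'delete'), 'key',
            and for updates the 'old' value the sender saw.
    """
    op, key = change["op"], change["key"]
    if op == "insert":
        # The key already exists here: applying would break uniqueness.
        return "uniqueness conflict" if key in site_rows else None
    if key not in site_rows:
        # The row to update or delete no longer exists at this site.
        return "delete conflict"
    if op == "update" and site_rows[key] != change["old"]:
        # The row was changed here after the sender read it.
        return "update conflict"
    return None


rows = {1: "alice", 2: "bob"}
stale_update = {"op": "update", "key": 2, "old": "bobby", "new": "rob"}
conflict = detect_conflict(rows, stale_update)  # the sender saw an outdated value
```

A real system also needs a resolution rule (for example, latest timestamp wins) once a conflict is detected; that part is left out here.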
Materialized View Replication
One master site manages several non-master sites (each keeps a full or partial copy)
Materialized views can be updatable
Refresh modes: fast refresh / complete refresh / force refresh
Hybrid Replication
MySQL Replications
[Figure: replication topologies]
1. simple master/slave
2. one slave, two masters
3. dual masters
4. dual masters with slaves
5. master ring
6. master ring with slaves
Basic replication services, using a lightweight master-slave model
The master writes updates to its logs; the slave reads the queries from the master’s logs and executes them
The slave compares the results on both sites; replication stops if a query succeeds on only one site
This simple structure can be combined arbitrarily to build complex architectures
On a slow network it is difficult for a slave to keep up with the master; version 4.0 improved this by adding relay logs
The master has to be locked or restarted to take the initial snapshot copy
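The log-shipping model described above can be sketched as follows. This is a toy model, not MySQL code: `Master`, `Slave` and `locked_keys` are hypothetical names, a dict stands in for a table, and `locked_keys` simulates a statement that succeeds on the master but fails on the slave.

```python
class Master:
    def __init__(self):
        self.table = {}
        self.log = []  # ordered record of executed updates

    def execute(self, key, value):
        self.table[key] = value
        self.log.append((key, value))


class Slave:
    def __init__(self):
        self.table = {}
        self.relay_log = []       # local copy of master events, not yet applied
        self.fetched = 0          # number of master events already copied
        self.locked_keys = set()  # keys whose updates fail on this slave
        self.running = True

    def fetch(self, master):
        """Copy new events from the master's log into the relay log."""
        new_events = master.log[self.fetched:]
        self.relay_log.extend(new_events)
        self.fetched += len(new_events)

    def apply(self):
        """Execute relayed events against the local table, in order."""
        while self.running and self.relay_log:
            key, value = self.relay_log[0]
            if key in self.locked_keys:
                # Succeeded on the master but fails here: stop replicating.
                self.running = False
                return
            self.table[key] = value
            self.relay_log.pop(0)


master, slave = Master(), Slave()
master.execute("a", 1)
master.execute("b", 2)
slave.fetch(master)  # cheap copy; can run even if applying is slow
slave.apply()
```

Separating `fetch` from `apply` is the point of the relay log: events can be copied off the master quickly even when executing them on the slave lags behind.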
Globus DRS
A client creates a request file (requested file name and target location) and sends it to the DRS (Data Replication Service)
The Replicator checks the user’s credentials and queries the RLI (Replica Location Index) to find the LRCs (Local Replica Catalogs) that contain mappings for the requested file
It then queries each remote LRC to get the physical file names and selects the best one
Next, it starts RFT (Reliable File Transfer) to transfer the files
Finally, it registers the new replica in its own LRC, which updates the RLI to make the replica visible
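The steps above can be sketched with plain Python data structures standing in for the Globus services. Nothing here is the Globus API: `replicate`, the dict layouts, and the `transfer` and `cost` callbacks are assumptions for illustration, and the credential check is left out.

```python
def replicate(logical_name, target_site, rli, lrcs, transfer, cost):
    """Sketch of the replication steps using dicts in place of services.

    rli:  logical name -> set of site names whose LRC knows the file
    lrcs: site name -> {logical name -> list of physical file names}
    transfer(src, dst): stand-in for an RFT transfer (hypothetical)
    cost(physical_name): lower is better (hypothetical selection rule)
    """
    # 1. Ask the index which catalogs hold mappings for the file.
    sites = rli.get(logical_name, set())
    # 2. Ask each of those catalogs for the physical file names.
    candidates = [
        name for site in sorted(sites)
        for name in lrcs[site].get(logical_name, [])
    ]
    if not candidates:
        raise LookupError(f"no replica found for {logical_name}")
    # 3. Pick the best source and transfer it to the target.
    source = min(candidates, key=cost)
    destination = f"{target_site}/{logical_name}"
    transfer(source, destination)
    # 4. Register the new replica locally, then make it visible in the index.
    lrcs.setdefault(target_site, {}).setdefault(logical_name, []).append(destination)
    rli.setdefault(logical_name, set()).add(target_site)
    return source, destination


rli = {"data.txt": {"siteA", "siteB"}}
lrcs = {
    "siteA": {"data.txt": ["siteA/store/data.txt"]},
    "siteB": {"data.txt": ["siteB/d/data.txt"]},
}
moved = []
src, dst = replicate(
    "data.txt", "siteC", rli, lrcs,
    transfer=lambda s, d: moved.append((s, d)),
    cost=len,  # toy rule: prefer the shortest physical name
)
```

In practice the selection rule would use something like network distance or load; path length is used here only to keep the example deterministic.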
EGEE RMS
Designed for replicating large, read-only files among heterogeneous resources
Implements file catalogues:
Replica Location Service: maps a replica’s Grid Unique ID to its physical location
Local Replica Catalogues: provide information about the replicas of a single VO
Replica Metadata Catalogue: maps a file’s logical name to its Grid Unique ID
LCG File Catalogue: used to address performance issues
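The two-level mapping (logical name to Grid Unique ID, then Grid Unique ID to physical locations) can be pictured as two lookups. The catalogue contents, identifiers and URLs below are invented for illustration, and `locate` is a hypothetical helper, not part of the EGEE software.

```python
# Hypothetical catalogue contents, for illustration only.
replica_metadata_catalogue = {
    "lfn:/grid/demo/run42.dat": "guid-0001",  # logical name -> Grid Unique ID
}
replica_location_service = {
    "guid-0001": [                             # Grid Unique ID -> physical copies
        "srm://site-a.example.org/data/run42.dat",
        "srm://site-b.example.org/data/run42.dat",
    ],
}


def locate(logical_name):
    """Resolve a logical file name to the physical locations of its replicas."""
    guid = replica_metadata_catalogue[logical_name]  # KeyError if unknown
    return replica_location_service.get(guid, [])


locations = locate("lfn:/grid/demo/run42.dat")
```

Keeping the Grid Unique ID in the middle means a file can be renamed, or gain and lose replicas, without the other mapping having to change.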
SRB
[Figure: SRB architecture. Components shown: Application; Dispatcher (monitors the input port and dispatches requests to a handler); High Level Request Handler; MCAT; Remote SRB; Low Level Request Handler; file system drivers (UniTree, HPSS, UNIX); DBMS drivers (DB2, Oracle, ObjectStore, Illustra)]
Enables searching for files by attributes
MCAT: a database system that stores metadata
One or more master daemon processes, each with SRB agents running on it
The dispatcher monitors incoming requests and passes them to the HLRH (High Level Request Handler, which can retrieve metadata from a local or remote MCAT) or the LLRH (Low Level Request Handler, which can retrieve data from storage)
Supports synchronous and asynchronous replication, and MCAT replication
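The dispatching described above can be sketched as a small router. This is not SRB code: the request format, the function names and the dicts standing in for MCAT and for storage are all assumptions made for the example.

```python
def high_level_handler(request, mcat):
    """Metadata query: which files have all the requested attributes?"""
    wanted = request["attributes"]
    return sorted(
        name for name, attrs in mcat.items()
        if all(attrs.get(k) == v for k, v in wanted.items())
    )


def low_level_handler(request, storage):
    """Data access: fetch the contents of one file from storage."""
    return storage[request["name"]]


def dispatch(request, mcat, storage):
    """Route an incoming request to the matching handler."""
    if request["kind"] == "search":
        return high_level_handler(request, mcat)
    if request["kind"] == "read":
        return low_level_handler(request, storage)
    raise ValueError(f"unknown request kind: {request['kind']}")


# Toy metadata catalogue and storage.
mcat = {
    "scan1.img": {"owner": "yin", "type": "image"},
    "notes.txt": {"owner": "yin", "type": "text"},
}
storage = {"scan1.img": b"\x00\x01", "notes.txt": b"hello"}

found = dispatch({"kind": "search", "attributes": {"type": "image"}}, mcat, storage)
data = dispatch({"kind": "read", "name": found[0]}, mcat, storage)
```

A client would typically search by attributes first and then read the files it found, which is the sequence shown in the last two lines.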
Our Goals
• Combine DB2 SQL Replication with OGSA-DAI technologies
• Grid-enable DB2 Replication to provide a grid service interface for managing replication
• Support more scalable, secure, high-performance data access
• Extend OGSA-DAI to provide more powerful capabilities
• Explore metadata technologies
System architecture
[Figure: system architecture. Components shown: Metadata Catalogue; Relational Database Replication Mechanism; Replication Control Service; GridFTP Transfer; Data Resource; Data Replica]
Workflows
[Figure: workflow. Components shown: Request; Replication Control Service; Metadata Search Engine; Metadata Register; Initiator; Selector; Starter; Metadata Catalogue; Relational Database Replication Mechanism; GridFTP Transfer; Data Resource; Replication Target]
Features
• Keeping the features of relational database replication
• Adding Grid’s features
• Using Grid service discovery mechanism
• Supporting more replication scenarios
Summary
• Introduction to replication
• Introduction to existing technologies
– Relational database replication is advanced in flexibility, offering solutions for frequent updates, update-everywhere, data conflicts …
– Grid file replication is good at scalable, secure, and efficient file transfer
• We studied both models and combined the two structures to gain the benefits of both