12.1 Distributed Database Systems
Why Distributed Databases?
- The decentralization of business functions
- Advances in computer networking
- The database is stored on several distributed systems (sites)
- Each site is able to process local transactions and participate in global transactions
- Network topology
  - Long-haul vs. local-area networks (WAN vs. LAN)
Advantages of Data Distribution
- Data sharing with local autonomy
- Reliability and availability
  - The failure of a site does not necessarily imply the shutdown of the system
  - Crucial for real-time systems
- Downsizing trend
- Speedup of query processing
  - Parallel query execution
Developing a Distributed Database
- Bottom-up integration
  - 'Islands of information'
  - Heterogeneous hardware and software
  - Heterogeneous distributed databases
- Top-down distribution
  - Homogeneous environment
  - From a single-site database to a distributed database
Autonomy
The degree to which sites may operate independently
- Distribution of control:
  - Tight coupling
  - Semi-autonomy
  - Total isolation
Distribution Transparency
The extent to which the distribution of data to sites is shielded from users and applications
Data distribution strategies:
- Basic distribution tables
- Extracted tables
- Snapshot tables
- Replicated tables
- Fragmented tables
  - Horizontal fragmentation (primary or derived)
  - Vertical fragmentation
  - Mixed fragmentation
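The two basic fragmentation strategies can be sketched as follows; this is a minimal illustration (not from the slides), with hypothetical Staff rows and column names loosely modeled on the Staff table used later in the deck.

```python
# Hypothetical Staff relation; sno is the key.
staff = [
    {"sno": "S1", "name": "Ann",   "salary": 30000, "bno": "B3"},
    {"sno": "S2", "name": "Bob",   "salary": 24000, "bno": "B5"},
    {"sno": "S3", "name": "Carol", "salary": 12000, "bno": "B5"},
]

def horizontal(rows, bno):
    """Horizontal fragment: the full rows for one branch, stored at that site."""
    return [r for r in rows if r["bno"] == bno]

def vertical(rows, cols):
    """Vertical fragment: a subset of columns, always carrying the key
    (sno) so the original table can be rebuilt by joining the fragments."""
    return [{c: r[c] for c in ["sno"] + cols} for r in rows]

frag_b5 = horizontal(staff, "B5")        # rows kept at the B5 site
frag_pay = vertical(staff, ["salary"])   # columns kept at a payroll site
```

Mixed fragmentation simply composes the two, e.g. applying `vertical` to a horizontal fragment.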
. . . Distribution Transparency
- Location (network) transparency
  - Hiding the details of the distribution of data in the network
- Fragmentation transparency
  - When a data item is requested, the specific fragment need not be named
- Replication transparency
  - A user should not know how a data item is replicated
Naming and local autonomy
- Two solutions in distributed databases:
  - A central name server
  - Each site prefixes its own site identifier to any name it generates, e.g. site5.deposit.f3.r2
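The second solution needs no coordination at all; a trivial sketch (with an assumed helper name) shows why the generated names are globally unique:

```python
# Each site prefixes its own identifier to every name it generates, so
# names are globally unique without a central name server.
def global_name(site_id, local_name):
    return f"site{site_id}.{local_name}"

# Site 5 naming record r2 of fragment f3 of relation deposit:
global_name(5, "deposit.f3.r2")   # -> "site5.deposit.f3.r2"
```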
Directory Management
Metadata (catalog) can be stored:
- Centrally
  - Master site
- Distributively
  - Maximizes autonomy
- Fully replicated
- With metadata caching
  - Distributed approach with caching of metadata from remote sites
. . . Directory Management
How to ensure that the cached metadata is up to date?
- Owner responsibility
- Versions
- Dependency tracking
  - Optimized execution plans
Distributed Concurrency Control
Nonreplicated scheme
- Each site maintains a local lock manager to administer lock and unlock requests for local data
- Deadlock handling is more complex

Single-coordinator approach
- The system maintains a single lock manager that resides in a single chosen site
- Can be used with replicated data
- Advantages:
  - Simple implementation
  - Simple deadlock handling
- Disadvantages:
  - Bottleneck
  - Vulnerability
. . . Distributed Concurrency Control
Majority protocol
- A lock manager at each site
- When a transaction wishes to lock a data item Q, which is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
- Complex to implement
- Difficult to handle deadlocks
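The majority rule can be sketched as follows; `request_lock` is an assumed stand-in for a per-site lock manager, not the deck's code.

```python
# Majority protocol sketch: a lock on Q is acquired only if more than
# half of the n replica sites grant the request.
def majority_lock(replica_sites, request_lock):
    """request_lock(site) -> True if that site's lock manager grants."""
    granted = [s for s in replica_sites if request_lock(s)]
    # Majority reached: the lock is held. Otherwise a real implementation
    # would release the partial grants before reporting failure.
    return granted if len(granted) > len(replica_sites) // 2 else None

# Q is replicated at 5 sites; sites 1, 2 and 4 grant -> a majority of 3.
held = majority_lock([1, 2, 3, 4, 5], lambda s: s in {1, 2, 4})
```

Note that two transactions each holding locks at a minority of sites can still wait on each other, which is why deadlock handling remains difficult.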
. . . Distributed Concurrency Control
Biased protocol
- A variation of the majority protocol
- When a transaction needs a shared lock on data item Q, it requests the lock from the lock manager at one site containing a replica of Q
- When a transaction needs an exclusive lock, it requests the lock from the lock managers at all sites containing a replica of Q

Primary copy
- A replica is selected as the primary copy
- Locks are requested from the primary site
Deadlock Handling
The local waits-for graph (WFG) is not enough
- Centralized site
- Distributed deadlock avoidance
  - A unique priority is assigned to each transaction (timestamp + site number)
  - Wait-die: if priority(Ti) > priority(Tj), then Ti waits, else Ti dies
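The wait-die decision is a one-line rule; the sketch below assumes priorities are unique numbers built from (timestamp, site number), with older transactions given higher priority.

```python
# Wait-die sketch: Ti requests a lock held by Tj.
def wait_die(ti_priority, tj_priority):
    """Return whether Ti waits for Tj or dies (is rolled back)."""
    if ti_priority > tj_priority:
        return "wait"   # Ti has the higher priority, so it may wait
    return "die"        # otherwise Ti aborts and is restarted later

wait_die(5, 3)   # -> "wait"
wait_die(2, 3)   # -> "die"
```

Because a lower-priority transaction never waits for a higher-priority one, no cycle of waits can form, so deadlock is avoided rather than detected.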
. . . Deadlock Handling
Decentralized deadlock detection
- Path-pushing technique
  - Periodically, each site analyzes its local WFG and lists all paths. For each path Ti -> ... -> Tj, it sends the path information to every site where Tj might be blocked because of a lock wait
- Example:
  - WFG1: T1 -> T2 -> T3, and T3 has made an RPC to site 2
  - WFG2: T3 -> T4, and T4 has made an RPC to site 3
  - WFG3: T4 -> T2
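Once the pushed paths accumulate at one site, an ordinary cycle check over the merged edges exposes the global deadlock. A sketch (not the deck's algorithm), using the union of the three WFGs above:

```python
# Detect a cycle in a merged waits-for graph with depth-first search.
def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)

    def revisits(node, seen):
        if node in seen:            # reached a node already on this path
            return True
        return any(revisits(n, seen | {node})
                   for n in graph.get(node, []))

    return any(revisits(start, set()) for start in graph)

# Union of WFG1, WFG2 and WFG3 from the example:
edges = [("T1", "T2"), ("T2", "T3"), ("T3", "T4"), ("T4", "T2")]
has_cycle(edges)   # -> True: the global cycle T2 -> T3 -> T4 -> T2
```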
Distributed Transaction Management
More complex, since several sites may participate in executing a transaction
- Transaction manager:
  - Maintaining a log for recovery purposes
  - Concurrency control
- Transaction coordinator:
  - Starting the execution of the transaction
  - Breaking the execution into a number of subtransactions
  - Coordinating the termination of the transaction
... Distributed Transaction Management
Additional failures:
- Site failure
- Link failure
- Loss of a message
- Network partition

Atomicity of transactions
- All participating sites either commit or abort

Protocols to ensure atomicity:
- Two-phase commit
- Three-phase commit
Two-phase Commit Protocol
Let T be a transaction initiated at site Si with coordinator Ci. When all the sites at which T has executed inform Ci that T has completed, Ci starts the two-phase commit:

Phase 1
- Ci adds the record <prepare T> to the log and sends a prepare T message to all sites at which T executed
- If a site answers no, it adds <abort T> to its log and then responds by sending an abort T message to Ci
- If a site answers yes, it adds <ready T> to its log and then responds by sending a ready T message to Ci

Phase 2
- If Ci receives a ready T message from all the participating sites, T can be committed: Ci adds <commit T> to the log and sends a commit T message to all participating sites to be added to their logs
- If Ci receives an abort T message, or when a prespecified interval of time has elapsed, Ci adds <abort T> to the log and then sends an abort T message
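The coordinator's side of the two phases can be sketched as follows; `send` and `log` are hypothetical stand-ins for messaging and the coordinator's log, and only the control flow follows the slides.

```python
# Coordinator-side two-phase commit sketch.
def two_phase_commit(participants, send, log):
    """send(site, msg) returns the site's vote for a prepare T message
    ('ready' or 'abort'), or None if the site timed out."""
    # Phase 1: log <prepare T>, then collect a vote from every participant.
    log("<prepare T>")
    votes = [send(site, "prepare T") for site in participants]

    # Phase 2: commit only if every site answered ready; a timeout (None)
    # or an abort vote forces a global abort.
    if all(v == "ready" for v in votes):
        log("<commit T>")
        decision = "commit T"
    else:
        log("<abort T>")
        decision = "abort T"
    for site in participants:
        send(site, decision)
    return decision
```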
Handling of Failures
Failure of a participating site Sk
- When the site recovers, it examines its log:
  - If the log contains a <commit T> record, Sk executes redo(T)
  - If the log contains an <abort T> record, Sk executes undo(T)
  - If the log contains a <ready T> record, the site must consult Ci to determine the fate of T. If Ci is up, it notifies Sk as to whether T committed or aborted. If Ci is down, Sk must find the fate of T from the other sites by sending a query-status T message to all the sites in the system
  - If the log has no control records concerning T, Sk executes undo(T)
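The recovery decision above is a simple case analysis over the log; a sketch with an assumed helper name:

```python
# A restarted participant scans its log records for T and decides.
def recover(log_records):
    if "<commit T>" in log_records:
        return "redo(T)"
    if "<abort T>" in log_records:
        return "undo(T)"
    if "<ready T>" in log_records:
        # Ask Ci; if Ci is down, send query-status T to the other sites.
        return "consult coordinator"
    return "undo(T)"   # no control records: T never reached ready

recover(["<ready T>", "<commit T>"])   # -> "redo(T)"
recover([])                            # -> "undo(T)"
```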
... Handling of Failures
Failure of the coordinator
- If an active site contains a <commit T> record in its log, then T must be committed
- If an active site contains an <abort T> record in its log, then T must be aborted
- If some active site does not contain a <ready T> record in its log, then T must be aborted
- If none of the above cases holds, the active sites must wait for Ci to recover (blocking problem)

Failure of a link
- Same as before
Three-phase Commit Protocol
The major disadvantage of the two-phase commit protocol is that a coordinator failure may result in blocking. The three-phase commit protocol is designed to avoid the possibility of blocking.
Distributed Query Optimization
Other factors of distributed query processing:
- Cost of data transmission
- Potential gain from parallel processing
- Relative speed of processing at each site

Join strategies:
- Ship one operand
- Ship both operands
- Semijoin
- Distributed nested loop
Fragment access optimization
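The semijoin strategy above can be sketched on two hypothetical relations r and s held at different sites: instead of shipping all of r, site 2 sends only s's join column, site 1 reduces r to the matching tuples, and only the reduced r travels back for the final join.

```python
r = [("B3", "Ann"), ("B5", "Bob"), ("B7", "Carol")]   # at site 1
s = [("B5", "London"), ("B7", "Glasgow")]             # at site 2

join_keys = {bno for bno, _ in s}                # shipped: site 2 -> site 1
r_reduced = [t for t in r if t[0] in join_keys]  # semijoin at site 1
result = [(bno, name, city)                      # final join back at site 2
          for bno, name in r_reduced
          for b, city in s if b == bno]
```

The saving is in transmission cost: only the join column and the matching tuples cross the network, which pays off when few tuples of r join with s.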
Distributed Integrity Constraints
- Fragment integrity
- Distributed integrity constraints
Truly Distributed Capabilities
Date's twelve rules:
- Local autonomy
- No reliance on a central site for any particular service
- Continuous operation
- Location independence
- Fragmentation independence
- Replication independence
- Distributed query processing
- Distributed transaction management
- Hardware independence
- Operating system independence
- Network independence
- DBMS independence
Distributed Capabilities of Oracle 8
- Transparent multisite distributed queries
- Multisite updates with transparent two-phase commit
- Full location transparency and site autonomy
- Global database names
- Automatic table replication
12.2 Data Replication
Data Replication
A simpler form of a distributed database system
Why?
- Every DBMS vendor has a replication solution
- Gaining popularity
Data Replication Concepts
- Synchronous vs. asynchronous replication
- Object replication
- Data ownership:
  - Master/slave ownership
    - Publish-and-subscribe metaphor
  - Workflow ownership
    - At any moment there is only one site that can update
  - Update-anywhere (symmetric replication) ownership
    - Can lead to conflicts
    - Conflict resolution
. . . Data Replication Concepts
Table snapshots

    CREATE SNAPSHOT local_staff
    REFRESH FAST             -- or COMPLETE, FORCE
    START WITH sysdate NEXT sysdate + 7
    AS SELECT * FROM Staff@Staff_Master_Site WHERE bno = 'B5';

- Can use the transaction log
- Refresh groups

Database triggers

    CREATE TRIGGER staff_after_ins_row
    AFTER INSERT ON Staff
    FOR EACH ROW
    BEGIN
      INSERT INTO Staff_Duplicate@Staff_Duplicate_Link
      VALUES (...);
    END;
Several drawbacks
. . . Data Replication Concepts
Conflict detection
- Each site sends the before- and after-images to a replication server
- Violation of referential integrity

Conflict resolution:
- Earliest and latest timestamp
- Site priority
- Additive and average updates
- Minimum and maximum values
- User-defined
- Hold for manual resolution
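Two of the resolution rules above compose naturally: latest timestamp decides most conflicts, and site priority breaks exact ties. A sketch with hypothetical update records:

```python
# Each conflicting update is a tuple (timestamp, site, value).
def resolve(update_a, update_b, site_priority):
    if update_a[0] != update_b[0]:
        return max(update_a, update_b)             # latest timestamp wins
    return max(update_a, update_b,
               key=lambda u: site_priority[u[1]])  # tie: higher-priority site

resolve((105, "S1", 30000), (110, "S2", 31000), {"S1": 2, "S2": 1})
# -> the S2 update, since timestamp 110 is later
```

The other rules (additive/average, minimum/maximum) differ only in how the winning value is computed from the two conflicting updates.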
ORACLE8 Replication
Read-only replication
- Snapshot log

Replication administrators
- Setting up and implementing replication

Data synchronization
- Automatic
- Manual
- Two-phase commit (2PC) through a deferred transaction queue
- User-defined conflict resolution
Replication Manager