View
229
Download
0
Category
Tags:
Preview:
Citation preview
3
Database and Concurrency
Concurrent execution of user programs is essential for good DBMS performance. Why? Because disk accesses are frequent, and
relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
A user’s program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned about what data is read/written from/to the database. Is this OK?
Database state is defined in terms of data values. With concurrent access to shared data, what might be a problem with the database state? It is dynamic, not static.
4
Database Correctness The correctness of a DB is defined by consistency
Internal consistency (semantic integrity) Mutual consistency They cannot be enforced in each action. Why? Example 1: Transfer $1M from account A to account
B. Example 2: A = 2*B; Increase A by 50% -> same for B
A transaction DBMS’s abstract view of a user program: a sequence
of reads and writes A complete and consistent computation Atomicity, consistency, isolation, durability (ACID) Scheduler synchronizes concurrent operations
5
Database System Model Functional decomposition
Transaction manager (TM) Scheduler Data manager (DM)
• Recovery manager (RM)• Cache manager (CM)
Transaction manager A process running on behalf of the user transaction Performs any required preprocessing of database
(e.g., assigning a transaction id, selecting the participating sites) and operations it receives from transactions
6
Database System Model Data manager
Operates directly on the DB and responsible for transaction termination
Consists of RM and CM Recovery manager
Atomicity Resilient to failures: transaction, system, and
media Operations: start, commit, abort, read, write
Cache manager Manages data movement between main (volatile)
memory and disk (stable) storage Actions: fetch and flush
7
Concurrency in a DBMS Users submit transactions, and can think of each
transaction as executing by itself. Concurrency is achieved by the DBMS, which interleaves
actions (reads/writes of DB objects) of various transactions. Each transaction must leave the database in a consistent
state if the DB is consistent when the transaction begins.• DBMS will enforce some ICs, depending on the ICs
declared in CREATE TABLE statements.• Beyond this, the DBMS does not really understand the
semantics of the data. (e.g., it does not understand how the interest on a bank account is computed).
Issues: Effect of concurrent transactions, and failures.
8
Atomicity of Transactions
A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions.
A very important property guaranteed by the DBMS for all transactions is that they are atomic.
Atomicity: a transaction is assumed to be executing all its actions in one step, or not executing any actions at all. Not easy to achieve. Why?
9
Atomicity: Why Difficult?
A transaction executed alone leaves DB consistent.
During execution of a transaction, the DB state may be temporarily in an inconsistent state.
Other transactions may observe inconsistent DB state
The system may crash in the middle of the transaction execution.
10
Consistency, Isolation, Durability
A transaction executed in isolation must preserve DB consistency.
Even if multiple transactions executed concurrently, each should be unaware of other transactions are being executed concurrently.
When a transaction complete successfully, the changes it made must persist, even with failures afterwards.
11
Example
Consider two transactions (Xacts):
T1: BEGIN A=A+100, B=B-100 ENDT2: BEGIN A=1.06*A, B=1.06*B END
Intuitively, the first transaction is transferring $100 from B’s account to A’s account. The second is crediting both accounts with a 6% interest payment.
There is no guarantee that T1 will execute before T2 or vice-versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.
12
Example (Contd.) Consider a possible interleaving:
T1: A=A+100, B=B-100 T2: A=1.06*A, B=1.06*B
Is this OK? What about the following?
T1: A=A+100, B=B-100 T2: A=1.06*A, B=1.06*B
The DBMS’s view of the second schedule:
T1: R(A), W(A), R(B), W(B)T2: R(A), W(A), R(B), W(B)
13
Anomalies with Interleaved Execution
Reading Uncommitted Data (WR Conflicts, “dirty reads”):
Unrepeatable Reads (RW Conflicts):
T1: R(A), W(A), R(B), W(B), AbortT2: R(A), W(A), C
T1: R(A), R(A), W(B), CT2: R(A), W(A), C
14
Anomalies (Continued)
Overwriting Uncommitted Data (WW Conflicts):
T1: W(A), W(B), CT2: W(A), W(B), C
15
Scheduling Transactions Schedule: An execution history Serial schedule: Schedule that does not interleave
the actions of different transactions. Is it efficient? Equivalent schedules: For any database state, the
effect of executing the first schedule is identical to the effect of executing the second schedule.
Serializable schedule: A schedule that is equivalent to some serial execution of the same set of transactions.
Is the following statement correct?If each transaction preserves consistency, every serializable schedule preserves consistency.
16
Scheduling and Ordering Ta=a1(x)a2(y); Tb=b1(x)b2(y)
Is a1a2b1b2 equivalent to a1b1a2b2? What about a1b1b2a2?The ordering a1b1a2b2 preserves the consistency, while the other does not.
Ordering actions serves the purpose of implementing atomic operations so as to preserve the consistency of the database.
DBMS may execute a set of transactions in any order as long as the effect is the same as that of some serial order. Is this OK?
What if the user or application wants a specific order between two transactions T1 and T2? How to enforce such ordering?
17
Conflicting Operations
Two operations conflict if They are issued by different transactions They operate on the same data object At least one of them is a write operation
R1(A)W2(A); W1(A)W2(A); W1(A)R2(A) R1(A)R2(A) Conflict equivalent
Two executions are conflict equivalent if in both executions, all conflicting operations have the same order.
18
Serializability Correctness criterion
Serializability is the correctness definition of DB All serializable schedules are equally correct Scheduling algorithms enforce certain ordering In distributed DBMS, variable delays may disturb
any particular ordering which is supposed to occur Equivalent schedules (broader than conflict
equivalence) Two schedules are equivalent if
• Every read operation reads from the same write in both schedules
• Both schedules have the same final write Serialization graph (dependency graph) shows
dependency relationship among transactions
19
Conflicts and Equivalence When do two operations conflict?
They are issued by different transactions They operate on the same data object At least one of them is a write operation
Conflict equivalent Two executions are conflict equivalent if in both
executions, all conflicting operations have the same order.
Equivalent schedules (broader than conflict equivalence) Two schedules are equivalent if
• Every read operation reads from the same write in both schedules
• Both schedules have the same final write
20
Serializability Correctness criterion
Serializability is the correctness definition of DB All serializable schedules are equally correct Scheduling algorithms enforce certain ordering In distributed DBMS, variable delays may disturb
any particular ordering which is supposed to occur
Serialization graph (dependency graph) shows dependency relationship among transactions
21
Serializability Theorem
Dependency relation Ti -> Tj implies that for some data object X,
ri(X) -> wj(X), or wi(X) -> ri(X), or wi(X) -> wj(X)
Serialization (dependency) graph A directed graph whose nodes are transactions
and there is an edge from Ti to Tj if and only if Ti ->Tj
Serialization Theorem For a schedule H, if SG(H) is acyclic, then H is
serializable. Examples
22
Equivalent ExecutionT1=r1(x)r1(z)w1(x)T2=r2(y)r2(z)w2(y)T3=w3(x)r3(y) w3(z)H1=w3(x)r1(x)r3(y)r2(y)w3(z)r1(z)r2(z)w2(y)w1(x) Dependence relationship?
T3->T1 T3->T2
H2=w3(x)r3(y) w3(z)r2(y)r2(z)w2(y)r1(x)r1(z)w1(x)
Is H2 a serial execution? Is H1 equivalent to H2? Is H1 serializable execution?
23
Properties of Schedules Recoverability
To ensure that aborting a transaction does not change the semantics of committed transactionsw1(x)r2(x)w2(y)C2
Is it recoverable? What if T1 aborts? Recoverable execution depends on commit order A transaction cannot commit until all values it
read are guaranteed not to be aborted. How to do it?
Delaying commit: T2 cannot commit until T1 commits
Cascaded abort is sometimes necessary. Why?w1(x)r2(x)w2(x)A1
24
Properties of Schedules
Recoverability Cascaded abort is sometimes necessary
w1(x)r2(x)w2(x)A1 Avoiding cascaded aborts
Achieved if every transaction reads only the values written by committed transactions
Must delay each r(x) until all transactions that issued w(x) is either committed or abortedw1(x) ….. C1 r2(x) w2(y) …
25
Properties of Schedules Restoring before images
Implementing transaction abort by simply restoring before images of all writes is very convenientw0(x)w1(x)w2(x) A1 A2
Value of x must be restored to the initial value, not the value written by T1
Solution: delay w(x) until all transactions that have written x are either committed or aborted
Strictness Executions that satisfy both requirements Delay both r(x) and w(x) until all transactions that
have written w(x) are either committed or abortedr1(x)w1(x) w1(y) w2(z) w2(x) C1 --- is it strict?
26
Properties of Schedules Recoverability (RC)
RC if Ti reads from Tj and Ci is in H, then Ci follows Cj Avoiding cascaded aborts (ACA)
ACA if Ti reads x from Tj then ri(x) follows Cj Strictness (ST)
ST if whenever Oi(x) follows wj(x), then Oi(x) follows either Aj or Cj
T1=w1(x)w1(y)w1(z) C1; T2=r2(u)w2(x)r2(y)w2(y) C2H1=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y)C2 w1(z) C1 –- SR? RC?C2 before C1 --- SR but not RCH2=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y) w1(z) C1 C2 –- RC? ACA?RC but not ACA --- if A1 rather than C1, it must be A2, not C2H3=w1(x)w1(y)r2(u)w2(x)w1(z) C1 r2(y)w2(y) C2 --- ACA? ST?ACA but not ST --- w2(x) before C1
27
Exercise on Properties of Schedules
What are the properties of the following schedule?
H1=w1(x)r3(x) w1(y) C1 w3(y) w3(x) C3H2=w1(x)r2(y)w1(y)w2(z)r1(u) C2 C1
H3=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y) C2 w1(z) C1 H4=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y) w1(z) C1 C2 H1 is SR and RC, but not ACA. Is it ST? How can you reschedule H1 to make it ST?H2 is SR and RC. Is it ACA?H3 is SR but not RC. Is it ACA? H4 is SR and RC, but not ACA
28
Relationships among Properties
Recoverability (RC) RC if Ti reads from Tj and Ci is in H, then Ci
follows Cj Avoiding cascaded aborts (ACA)
ACA if Ti reads x from Tj then ri(x) follows Cj Strictness (ST)
ST if whenever Oi(x) follows wj(x), then Oi(x) follows either Aj or Cj
What is the relationship among ST, ACA, and RC?ST < ACA < RC
What about with SR and Serial execution?
29
View Serializability A schedule is view serializable if it is view equivalent
to some serial schedule. Schedules S1 and S2 are view equivalent if:
If Ti reads value of A written by Tj in S1, then Ti also reads value of A written by Tj in S2
If Ti writes final value of A in S1, then Ti also writes final value of A in S2
H1=w1(x)w2(x)w3(x)w2(y)r1(y) H2=w2(x)w2(y)w1(x)r1(y)w3(x) Is H1 view equivalent to H2? Is H1 view serializable? Is H1 conflict serializable? How can you prove it? Serialization graph of H1 Relationship: Every C-SR is V-SR, but not the other
way.
30
Scheduling and Concurrency Control
Objective Atomic execution of transactions on shared
data by controlling the execution order of concurrent access
Approaches Two-phase locking Timestamp ordering Optimistic scheduling Hybrid schemes
31
Scheduling and Concurrency Control
TM <-> Scheduler <-> DM Options for a scheduler
When receiving a request from TM1) Immediately schedule it (send it to DM)2) Delay it (put it into a queue)3) Reject it (causing aborting the transaction)
Aggressive vs conservative approaches Optimistic vs pessimistic Aggressive favors immediate action (option 1); if
not possible to finish, abort T (option 3) Conservative favors option 2 Performance trade-offs between the two Most well-known conservative approach: two-phase
locking
32
Lock-Based Concurrency Control
Two-phase Locking (2PL) Protocol: Each data object has a lock associated with it. Transaction must obtain a S (shared) lock on object
before reading, and an X (exclusive) lock on object before writing.
If a conflicting lock is already set by other transaction, the requested operation will be delayed, forcing the transaction to wait
• If Ti holds an X lock on an object, no other Tj can get a lock (S or X) on that object.
Two-phase: growing phase and shrinking phase Once Ti releases a lock on any data object, it
cannot get another lock on any data object. Why? Examples
33
Example of Simple LockingT1=A+100->A; B+100->BT2=Ax2->A; Bx2->B
Simple locking:T1: lock A; A+100->A; unlock A; lock B; B+100->B; unlock
B T2: lock A; lock B; Ax2->A; Bx2->B; unlock A; unlock B Is this OK? What can be a problem here?Another example: T1=r1(x)w1(y)C1 T2=w2(x)w2(y)C2H=r1(x)w2(x)w2(y)C2w1(y)C1 Is it SR? How can you tell? Is it RC? Does it matter? How can you fix it?
34
Example of Two-Phase Locking
T1=A+100->A; B+100->BT2=Ax2->A; Bx2->B
2PL: T1: lock A; A+100->A; lock B; unlock A; B+100->B; unlock
B T2: Lock A; lock B; Ax2->A; Bx2->B; unlock A; unlock B Is this OK? H= T1: lock A; T1: A+100->A; T1: lock B; T1: unlock A;
T2: lock A; T2: … What will happen? T2 will be blocked and wait to get lock B T2 will continue when T1 finish B+100->B and unlock B
35
Correctness of Schedulers Need to prove
All schedules representing the executions that can be produced by the scheduler are serializable (SR)
How to do it? Enumerate all the possible schedules and check
any one of them is not SR. Good, but is it feasible?
Two –step approach Characterize the properties of its schedules Prove that any schedule with such properties are
SR How to characterize the properties?
36
Properties of 2PL Notation:
Oi(x): Operation O by transaction Ti on xOLi(x): lock for Oi(x); OUi(x): unlock after Oi(x)
1. If Oi(x) is in the schedule, then OLi(x) and OUi(x) are also in the schedule, and OLi(x) < Oi(x) < OUi(x)
2. If Oi(x) and Oj(x) are conflicting operations in the schedule, then either OUi(x) < OLj(x) or OUj(x) < OLi(x)
3. If Oi(x) and Oi(y) is in the schedule, then Oli(x) < OUi(y)
Where is this last property (property 3) coming from?
37
Correctness of 2PL Theorem: 2PL is correct (i.e., SR) Proof1. If Ti -> Tj is in the schedule, then for some x, OUi(x) <
OLj(x) 2. If Ti -> Tj ->Tk in the schedule, then Ti releases some
lock before Tj sets the lock, and the same for Tj and Tk. By induction, same for T1 and Tn if T1->T2-> … ->Tn
3. If the schedule has a cycle in the serialization graph, i.e., T1->T2-> … ->Tn->T1, then T1 releases some lock before T1 sets a lock.
4. This is a violation of two-phaseness (property 3), and it cannot be a 2PL schedule.
5. Hence a cycle cannot exist.
38
Variations of 2PL Strict Two-phase Locking Protocol:
Each Xact must obtain a S (shared) lock on object before reading, and an X (exclusive) lock on object before writing.
All locks held by a transaction are released together when the transaction completes
Non-strict 2PL: Release locks anytime, but cannot acquire locks after releasing any lock.
Almost all 2PL implementations are strict 2PL. Why? Practical reason: when scheduler can release lock?
Strict 2PL allows only serializable schedules. Additionally, it simplifies transaction aborts Non-strict 2PL also allows only serializable schedules,
but involves more complex abort processing
39
Aborting a Transaction
If a transaction Ti is aborted, all its actions have to be undone. Not only that, if Tj reads an object last written by Ti, Tj must be aborted as well!
We can avoid such cascading aborts by releasing a transaction’s locks only at commit time. If Ti writes an object, Tj can read this only after Ti commits.
In order to undo the actions of an aborted transaction, the DBMS maintains a log in which every write is recorded. This mechanism is also used to recover from system crashes: all active Xacts at the time of the crash are aborted when the system comes back up.
40
The Log
The following actions are recorded in the log: Ti writes an object: the old value and the new value.
• Log record must go to disk before the changed page! Ti commits/aborts: a log record indicating this action.
Log records are chained together by Xact id, so it’s easy to undo a specific Xact.
Log is often duplicated and archived on stable storage.
All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.
41
Recovering From a Crash There are 3 phases in the Aries recovery algorithm:
Analysis: Scan the log forward (from the most recent checkpoint) to identify all Xacts that were active, and all dirty pages in the buffer pool at the time of the crash.
Redo: Redoes all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk.
Undo: The writes of all Xacts that were active at the crash are undone (by restoring the before value of the update, which is in the log record for the update), working backwards in the log. (Some care must be taken to handle the case of a crash occurring during the recovery process!)
42
Summary
Concurrency control and recovery are among the most important functions provided by a DBMS.
Users need not worry about concurrency. System automatically inserts lock/unlock requests and
schedules actions of different Xacts in such a way as to ensure that the resulting execution is equivalent to executing the Xacts one after the other in some order.
Write-ahead logging (WAL) is used to undo the actions of aborted transactions and to restore the system to a consistent state after a crash. Consistent state: Only the effects of commited Xacts
seen.
Recommended