CSE4701
Chapter 14-1
Slides on Normalization
CSE4701
Chapter 14-2
Towards Normalization of Relations
We Take Each Relation Individually and “Improve” It in Terms of the Desired Characteristics
Normalization Decomposes Relations into Smaller Relations with No Information Loss
  Support for Reconstruction
  No Spurious Joins
Query Execution Time May Increase
  Denormalization May Be Necessary Later on
Objectives: Minimizing Redundancy and Minimizing Insertion, Deletion, and Update Anomalies
CSE4701
Chapter 14-3
What is the Normalization Process?
Provides DB Designers with the Ability to “Improve” their Relations
  Deal with Redundancies and Anomalies
Normalization Procedure Provides DB Designers with
  A Formal Framework for Analyzing Relation Schemas Based on their Keys and on the Functional Dependencies among their Attributes
  A Series of Normal Form Tests that can be Carried out on Individual Relation Schemas so the Relational DB can be Normalized to the Desired Degree
CSE4701
Chapter 14-4
What are Normal Forms?
A Normal Form is a Condition using Keys and FDs to Certify Whether a Relation Schema Meets Criteria
  Primary Keys (1NF, 2NF, 3NF)
  All Candidate Keys (2NF, 3NF, BCNF)
  Multivalued Dependencies (4NF) - Chapter 15
  Join Dependencies (5NF) - Chapter 15
[Figure: nested normal forms, with 5NF inside 4NF inside 3NF inside 2NF inside 1NF]
CSE4701
Chapter 14-5
How is Normalization Attained?
Typically, Normalization is Attained through a Process of Decomposition that Breaks Apart Relations to Remove Redundancies and Anomalies
In the Process, we must Maintain Two Properties:
  Lossless Join or Nonadditive Join Property: Guarantees that the Spurious Tuple Generation Problem does not Occur on Decomposed Relations
  Dependency Preservation Property: Ensures that each FD is Represented in some Individual Relation(s) after Decomposition
Premise: Relational Schema with Primary Keys and Functional Dependencies Specified
CSE4701
Chapter 14-6
Recall Key Constraints
Superkey (SK): Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples
Candidate Key (CK): A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity)
  A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple
  There may be Multiple Candidate Keys
CSE4701
Chapter 14-7
Recall Key Constraints
Primary Key (PK): Choose One From the Candidate Keys
  The Primary Key Attributes are Underlined
Foreign Key (FK): An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of another Relation R2 (Defined on the Same Domain)
  Allows Linkages Between Relations that are Tracked and Establish Dependencies
  Useful to Capture ER Relationships
CSE4701
Chapter 14-8
Superkeys vs. Candidate Keys
Superkey of R: A Superkey SK is a Set of Attributes of R Such that No Two Tuples in Any Valid Relation Instance R(r) will Have the Same Value for SK
Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as R(r): For Any Distinct Tuples T1 and T2 in R(r), T1[SK] ≠ T2[SK]
For Cars, Valid Superkeys Must Contain: SerialNo OR {State, Reg#} OR Both
For EMPLOYEE, {SSN} is a Key, and {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are all Superkeys
CSE4701
Chapter 14-9
Superkeys vs. Candidate Keys
Candidate Key of R: A "Minimal" Superkey: a Candidate Key K is a Superkey s.t. Removal of any Attribute From K Results in a Set of Attributes that is Not a Superkey
Given R(U), where U is the Set of Attributes of R: a Superkey K is a Candidate Key iff for any A in K, there exists a Valid Relation Instance R(r) with Two Distinct Tuples T1 and T2 such that T1[K-A] = T2[K-A]
In the Previous Example, (State, Reg#, Make, Model) is a SK. Is it a CK? Why or Why Not?
CSE4701
Chapter 14-10
Example and Remaining Definitions
Example: CAR(State, Reg#, SerialNo, Make, Model, Year)
  Primary Key is {State, Reg#}
  It has Two Candidate Keys (also Superkeys): Key1 = {State, Reg#}, Key2 = {SerialNo}
  {SerialNo} can also be Chosen as Primary Key
Definition: Prime Attribute - Attribute A of R that is a Member of some Candidate Key K of R
Definition: Non-Prime Attribute - An Attribute that is not Prime (i.e., Not a Member of Any Candidate Key)
  In WORKS_ON, SSN and Pnumber are PRIME
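The key definitions above can be checked mechanically with an attribute-closure computation. The sketch below is illustrative only: the helper names (`closure`, `is_superkey`, `candidate_keys`) are hypothetical, and the FDs for CAR are an assumption (each candidate key named on the slide is taken to determine every attribute).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A set of attributes is a superkey if its closure covers the relation."""
    return closure(attrs, fds) >= set(all_attrs)

def candidate_keys(all_attrs, fds):
    """Minimal superkeys, found by brute force in order of increasing size."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # Skip supersets of keys already found: they are not minimal.
            if any(k <= candidate for k in keys):
                continue
            if is_superkey(candidate, all_attrs, fds):
                keys.append(candidate)
    return keys

CAR = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
# Assumed FDs: both keys from the slide determine the whole relation.
CAR_FDS = [({"State", "Reg#"}, set(CAR)), ({"SerialNo"}, set(CAR))]

keys = candidate_keys(CAR, CAR_FDS)
prime = set().union(*keys)
print(keys, prime)
```

Under these assumptions, (State, Reg#, Make, Model) is a superkey but not a candidate key, because Make and Model can be removed without losing uniqueness.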
CSE4701
Chapter 14-11
First Normal Form (1NF)
All Attributes Must Be Atomic Values: Only Simple and Indivisible Values in the Domain of Attributes
  Each Attribute in a 1NF Relation is a Single Value
Disallows Composite Attributes, Multivalued Attributes, and Nested Relations (Non-Atomic)
A 1NF Relation cannot have as an Attribute Value:
  A Set of Values (Set-Value)
  A Tuple of Values (Nested Relation)
1NF is a Standard Assumption of Relational DBs
CSE4701
Chapter 14-12
One Example of 1NF
Consider the Following Department Relation
What is the Inherent Problem?
  DLOCATIONS is Multi-valued
CSE4701
Chapter 14-13
What are Possible Solutions?
Decompose: Move the Attribute DLOCATIONS that Violates 1NF into a Separate Relation DEPT_LOCATIONS(DNUMBER, DLOCATION)
Expand the key to have a Separate Tuple in the DEPARTMENT relation for each location (below)
Introduce DLOC1, DLOC2, DLOC3, if there are Three Maximum Locations
Problems with Each? Best Solution?
CSE4701
Chapter 14-14
Another 1NF Example - Nested Relations
EMP_PROJ - Table and Tuples
Transition to:
CSE4701
Chapter 14-15
Second Normal Form (2NF)
Second Normal Form Focuses on the Concepts of Primary Keys and Full Functional Dependencies
Intuitively: A Relation Schema R is in Second Normal Form
(2NF) if Every Non-Prime Attribute A in R is Fully Functionally Dependent on the Primary Key
R can be Decomposed into 2NF Relations via the Process of 2NF Normalization
Successful Process Typically Involves Decomposing R into Two or More Relations
Iteratively Applying to Each Relation in Schema
CSE4701
Chapter 14-16
Full Functional Dependency
Full FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y holds, and there exists no X' such that X' ⊂ X and X' → Y holds over R, then Y is fully dependent on X, denoted as X →f Y (an f over the arrow)
Full FD - Intuitively: A FD X → Y where Removal of any Attribute from X means the FD no Longer Holds
  {SSN, PNUMBER} → HOURS is Full since Neither SSN → HOURS nor PNUMBER → HOURS Holds
What about the Following: {S#, CN} → Grade (marked f, i.e., Full)
CSE4701
Chapter 14-17
Partial Functional Dependency
Partial FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y holds but Y is not fully dependent on X, then Y is partially functionally dependent on X, denoted as X →p Y (a p over the arrow)
Partial FD - Intuitively: Removal of an Attribute from the L.H.S. still Results in a Valid FD
  {SSN, PNUMBER} → ENAME is Partial since Removing PNUMBER still Results in the Valid FD SSN → ENAME
Are the Following Full or Partial?
  {S#, CN} → CN, {S#, CN} → S# (marked p, i.e., Partial)
  {S#, CN, DNAME} → Grade (marked as not Full, i.e., Partial)
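The full/partial distinction can be tested with attribute closures: an FD X → Y is full when it holds and dropping any single attribute of X breaks it. The following is a minimal sketch; `is_full_fd` is a hypothetical helper, and the FD set for EMP_PROJ is assumed from the examples on these slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("the FD does not hold under the given FDs")
    # Closure is monotone, so testing the removal of one attribute at a time
    # is enough to rule out every proper subset of lhs.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

# Assumed FDs for EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION).
EMP_PROJ_FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

hours_full = is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, EMP_PROJ_FDS)
ename_full = is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, EMP_PROJ_FDS)
trivial_full = is_full_fd({"SSN", "PNUMBER"}, {"SSN"}, EMP_PROJ_FDS)
print(hours_full, ename_full, trivial_full)
```

The trivial FD {SSN, PNUMBER} → SSN comes out partial for the same reason that {S#, CN} → S# is marked partial on the slide: SSN alone already determines SSN.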
CSE4701
Chapter 14-18
Second Normal Form (2NF)
Formal 2NF Definition: R ∈ 2NF iff (i) R ∈ 1NF; (ii) all Non-Key Attributes in R are Fully Functionally Dependent on Every Key
Alternative Definition: R ∈ 2NF iff Every Attribute is Either Part of a Candidate Key, or Fully Dependent on Every Key
Reason: Partial Functional Dependencies may cause Update Problems
CSE4701
Chapter 14-19
Another Way to View the Problem
If the Primary Key Contains a Single Attribute, then there is No Need to Test for Problems
This is 1NF but not 2NF since
  Ename, a Non-Prime Attribute in FD2, Violates 2NF since it Depends on Part of the Key (SSN)
  Pname and Ploc, two Non-Prime Attributes in FD3, Violate 2NF since they Depend on Part of the Key (Pnumber)
CSE4701
Chapter 14-20
One Example of 2NF
Consider the Example Below
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
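STUDENT_DEPT ∈ 1NF
But STUDENT_DEPT ∉ 2NF: “{S#, CN} → DName, DHead” is a Partial FD (since S# → DName and DName → DHead), which Causes Anomalies
[Figure: dependency diagram over S#, CN, Grade, DName, DHead with fd1: {S#, CN} → Grade, fd2: S# → DName, fd3: DName → DHead]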
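A 2NF test can be sketched as a search for non-prime attributes that are determined by a proper subset of a key. The code below is an illustration under stated assumptions: `partial_dependencies` is a hypothetical helper, the key {S#, CN} and fd1 to fd3 are taken from this example, and the FD lists given for S_C and S_D are the obvious restrictions of those FDs.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds, prime):
    """Map each proper subset of key to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(subset) - set(prime)
            if dependents:
                found[frozenset(subset)] = dependents
    return found

fd1 = ({"S#", "CN"}, {"Grade"})
fd2 = ({"S#"}, {"DName"})
fd3 = ({"DName"}, {"DHead"})

# STUDENT_DEPT(S#, DName, DHead, CN, Grade) with key {S#, CN}.
student_dept = partial_dependencies({"S#", "CN"}, [fd1, fd2, fd3], {"S#", "CN"})
# After decomposition: S_C keeps fd1, S_D keeps fd2 and fd3 with key {S#}.
s_c = partial_dependencies({"S#", "CN"}, [fd1], {"S#", "CN"})
s_d = partial_dependencies({"S#"}, [fd2, fd3], {"S#"})
print(student_dept, s_c, s_d)
```

An empty result means no partial dependency was found, which is what the decomposition into S_C and S_D is meant to achieve.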
CSE4701
Chapter 14-21
Recall the Anomalies…
Insertion Anomalies: No Department Can Be Recorded if it has No Student Enrolled in a Course
Deletion Anomalies: Deleting the Last Student in a Department will also Delete the Department
Update Anomalies: Changing the Head of a Department Requires Modifying All Students in that Department Due to Redundancies
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
CSE4701
Chapter 14-22
One Example of 2NF (Continued)
Decomposition into 2NF by Separating Course Information from Department Information (Linked by S#)
S_D(S#, DName, DHead)
  fd2: S# → DName
  fd3: DName → DHead
S_C(S#, CN, Grade)
  fd1: {S#, CN} → Grade
CSE4701
Chapter 14-23
Another Example of 2NF
EMP_PROJ is 1NF with Key {SSN, PNUMBER} but…
  SSN → ENAME - Means ENAME, a Non-Prime Attribute, Depends Partially on {SSN, PNUMBER}, i.e., Depends on Only SSN and not Both
  PNUMBER → {PNAME, PLOCATION} - Means PNAME and PLOCATION, two Non-Prime Attributes, Depend Partially on {SSN, PNUMBER}, i.e., Depend on Only PNUMBER and not Both
CSE4701
Chapter 14-24
Another Example of 2NF
What Does the Decomposition Below Accomplish?
  ENAME Fully Dependent on SSN
  PNAME, PLOC Fully Dependent on PNUMBER
Result: 2NF for EP1, EP2, and EP3
CSE4701
Chapter 14-25
Yet Another Example of 2NF
Consider 1NF LOTS to Track Building Lots for Towns
What is the 2NF Problem?
  FD3: COUNTY_NAME → TAX_RATE Means TAX_RATE Depends Partially on Candidate Key {COUNTY_NAME, LOT#}
  All Other Non-Prime Attributes are Fine
CSE4701
Chapter 14-26
Yet Another Example of 2NF
What Does Decomposition Below Accomplish? TAX_RATE Fully Dependent on COUNTY_NAME
Result: 2NF for LOTS1 and LOTS2
CSE4701
Chapter 14-27
Third Normal Form (3NF)
Third Normal Form Focuses on the Concepts of Primary Keys and Transitive Functional Dependencies
Intuitively: A Relation Schema R is in Third Normal Form (3NF) if it is in 2NF and no Non-Prime Attribute A in R is Transitively Dependent on the Primary Key
R can be Decomposed into 3NF Relations via the Process of 3NF Normalization
In X → Y and Y → Z, with X as the Primary Key, there is a Problem only if Y is not a Candidate Key
  In EMP(SSN, Emp#, Salary), SSN → Emp# → Salary isn’t a Problem Since Emp# is a Candidate Key
CSE4701
Chapter 14-28
Transitive Partial FDs
Transitive FD - Formally: Given R(U) and X, Y, Z ⊆ U. If X → Y, Y ⊈ X, Y ↛ X, and Y → Z, then Z is called transitively functionally dependent on X
Transitive FD - Intuitively: a FD X → Z that can be Derived from two FDs X → Y and Y → Z
  SSN → ENAME is Non-Transitive Since there is no Set of Attributes X where SSN → X and X → ENAME
  For a FD X → Z that can be Derived from two FDs X → Y and Y → Z, if Y is a Candidate Key – No Problem
CSE4701
Chapter 14-29
Third Normal Form (3NF)
Formal 3NF Definition: R ∈ 3NF iff
(i) R ∈ 2NF;
(ii) No Non-Key Attribute of R is Transitively Dependent on Any Candidate Key.
Alternative Definition: R ∈ 3NF iff for every Nontrivial FD X → Y, either X is a Superkey, or Y is a Key (Prime) Attribute.
Reason: Transitive Functional Dependencies may cause Update Problems
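The alternative definition lends itself to a direct check: for each listed FD X → A, either X is a superkey or A is prime. The sketch below assumes the candidate keys are supplied by the caller and inspects only the FDs that are listed; `violations_3nf` is a hypothetical helper name, and S_D with fd2 and fd3 is used as the test case.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf(attrs, fds, keys):
    """List (lhs, attribute) pairs where lhs is not a superkey and attribute is non-prime."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attrs):
            continue  # lhs is a superkey, so this FD is harmless
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((lhs, a))
    return bad

fd2 = ({"S#"}, {"DName"})
fd3 = ({"DName"}, {"DHead"})

# S_D(S#, DName, DHead) with the single candidate key {S#}.
s_d = violations_3nf({"S#", "DName", "DHead"}, [fd2, fd3], [{"S#"}])
# After splitting: S_D(S#, DName) and DEPT(DName, DHead).
s_d_split = violations_3nf({"S#", "DName"}, [fd2], [{"S#"}])
dept = violations_3nf({"DName", "DHead"}, [fd3], [{"DName"}])
print(s_d, s_d_split, dept)
```

The only reported violation is DName → DHead, the second step of the transitive chain S# → DName → DHead.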
CSE4701
Chapter 14-30
One Example of 3NF
STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF
S_C(S#, CN, Grade) ∈ 2NF and S_C ∈ 3NF
S_D(S#, DName, DHead) ∈ 2NF but S_D ∉ 3NF
“S# → DHead” is a Transitive FD in S_D and “DHead” is a Non-Key Attribute, since S# (X) → DName (Y) and DName (Y) → DHead (Z)
[Figure: dependency diagram over S#, CN, Grade, DName, DHead with fd1, fd2, fd3 and the derived FD S# → DHead]
CSE4701
Chapter 14-31
One Example of 3NF
S_C(S#, CN, Grade) ∈ 2NF
S_D(S#, DName, DHead) ∈ 2NF
Decompose to Eliminate the Transitivity Within S_D:
  S_D(S#, DName) ∈ 3NF with fd2: S# → DName
  DEPT(DName, DHead) ∈ 3NF with fd3: DName → DHead
[Figure: S# → DName → DHead, with the transitive FD S# → DHead removed by the decomposition]
CSE4701
Chapter 14-32
Another Example of 3NF
EMP_DEPT is 2NF with Key SSN, but there are Two Transitive Dependencies (Undesirable)
  SSN → DNUMBER and DNUMBER → DNAME Means DNAME, Neither a Key Nor a Subset of a Key, is Transitively Dependent on SSN
  SSN is the Only Candidate Key of EMP_DEPT!
  Note: Also a Similar Problem with SSN and DMGRSSN via DNUMBER
CSE4701
Chapter 14-33
Another Example of 3NF
To Attain 3NF, Decompose into ED1 and ED2 Intuitively - we are Separating Out Employees and
Departments from One Another
CSE4701
Chapter 14-34
Yet Another Example of 3NF
Recall 2NF Solution for Building Lots Problem
What is the 3NF Problem? It Violates the Alternative Definition
  In LOTS1, FD4: AREA → PRICE
  AREA is not a Superkey
  PRICE is not a Prime Attribute of LOTS1
CSE4701
Chapter 14-35
Yet Another Example of 3NF
Decompose to Introduce a Separate Key AREA
Result: 3NF for LOTS1A and LOTS1B
CSE4701
Chapter 14-36
1NF and 2NF – Maintain FDs!
CSE4701
Chapter 14-37
Transition to 3NF – Maintain FDs!
CSE4701
Chapter 14-38
Summary of Progression – Maintain FDs!
1NF: STUDENT_DEPT(S#, CN, Grade, DName, DHead)
  fd1: {S#, CN} → Grade; fd2: S# → DName; fd3: DName → DHead
2NF (Eliminate Partial FDs): S_C(S#, CN, Grade) with fd1; S_D(S#, DName, DHead) with fd2 and fd3
3NF (Eliminate Transitive FDs): S_C(S#, CN, Grade) with fd1; S_D(S#, DName) with fd2; DEPT(DName, DHead) with fd3
CSE4701
Chapter 14-39
Summary of 1NF, 2NF, 3NF Concepts

1NF
  Test: Relation should have no nonatomic attributes or nested relations.
  Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.
2NF
  Test: For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
  Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
3NF
  Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
  Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
CSE4701
Chapter 14-40
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form Focuses on Searching for Remaining Anomalies that can Arise in FDs
Intuitively: A Relation Schema R is in Boyce-Codd Normal Form (BCNF) if Whenever a Nontrivial FD X → A Holds in R, then X is a Superkey of R
R can be Decomposed into BCNF Relations via the Process of BCNF Normalization
There exist Relations that are in 3NF but not in BCNF
The Goal is to have each Relation in BCNF (or 3NF)
CSE4701
Chapter 14-41
Boyce-Codd Normal Form (BCNF)
Formal BCNF Definition: R ∈ BCNF iff
(i) R ∈ 1NF;
(ii) for every Nontrivial FD X → Y, X is a Superkey, i.e., if X → Y and Y ⊈ X, then X Contains a Key.
Properties of BCNF: if R ∈ BCNF, then
  All Non-key Attributes are Fully Dependent on Every Key
  All Key Attributes are Fully Dependent on the Keys that they do not Belong to
  No Attribute is Fully Dependent on any Set of Non-key Attributes
CSE4701
Chapter 14-42
Comparing the Normal Forms
[Figure: progression 1NF → 2NF → 3NF → BCNF, annotated as follows]
1NF → 2NF: Eliminate Partial FDs of Non-key Attributes to Key
2NF → 3NF: Eliminate Transitive FDs of Non-key Attributes to Key
3NF → BCNF: Eliminate Partial and Transitive FDs of Key Attributes to Key
Other Labels in the Figure: “Eliminate the Non-trivial Functional Dependencies of Non-key Attributes to Key”, “Poor Relational Schema Design”, “Developed as Stepping Stone”
Most 3NF Relations are in BCNF - BCNF Eliminates All Update Anomalies Caused by FDs
CSE4701
Chapter 14-43
One Example of BCNF
Recall 3NF Solution for Building Lots Problem
Suppose that AREA is the Lot Size in Acres, with
  AREAs in Tolland County: 0.5, 0.6, …, 1.0
  AREAs in Windham County: 1.1, 1.2, …, 2.0
Adding FD5: “AREA → COUNTY_NAME”
What Does the Data in LOTS1A Look like for a Given Set of Properties?
CSE4701
Chapter 14-44
One Example of BCNF

LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

What is the Problem Here?
  What if you Delete W11? You have “Lost” the “Windham, 1.1” Combination
  Also - Redundancy since “County Name, Area” is Repeated in Multiple Tuples Throughout LOTS1A
Even Though LOTS1A is in 3NF - Still Problems
  Problems with FD5: “AREA → COUNTY_NAME”
CSE4701
Chapter 14-45
Transition to BCNF – Maintain FDs!
Add new FD5
CSE4701
Chapter 14-46
One Example of BCNF
FD5: “AREA → COUNTY_NAME”
  Satisfies 3NF: COUNTY_NAME is a Prime Attribute
  Violates BCNF: AREA is not a Superkey of LOTS1A
So Do One More Split
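The BCNF test and the resulting split can be sketched in a few lines. This is an illustration, not the slides' procedure: `bcnf_violation` and `split_on` are hypothetical helpers, and the first two FDs (PROPERTY_ID# and {COUNTY_NAME, LOT#} each determining the whole relation) are assumptions about LOTS1A; only FD5 is stated on the slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(attrs, fds):
    """Return the first listed nontrivial FD whose left side is not a superkey, or None."""
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        if nontrivial and not closure(lhs, fds) >= set(attrs):
            return lhs, rhs
    return None

def split_on(attrs, fds, lhs):
    """Split into (closure of lhs) and (the remaining attributes plus lhs)."""
    lhs_closure = closure(lhs, fds) & set(attrs)
    return lhs_closure, (set(attrs) - lhs_closure) | set(lhs)

LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
LOTS1A_FDS = [
    ({"PROPERTY_ID#"}, set(LOTS1A)),          # assumed key FD
    ({"COUNTY_NAME", "LOT#"}, set(LOTS1A)),   # assumed key FD
    ({"AREA"}, {"COUNTY_NAME"}),              # FD5 from the slide
]

violation = bcnf_violation(LOTS1A, LOTS1A_FDS)
lots1ay, lots1ax = split_on(LOTS1A, LOTS1A_FDS, violation[0])
print(violation, lots1ay, lots1ax)
```

Under these assumptions the split yields one relation over AREA and COUNTY_NAME and one over PROPERTY_ID#, LOT#, and AREA.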
CSE4701
Chapter 14-47
One Example of BCNF
LOTS1A (before the split)
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

LOTS1AX
PROPERTY_ID#  LOT#  AREA
T11           L1    0.5
T12           L2    0.8
W13           L6    1.5
W11           L1    1.1
W12           L4    1.6
T10           L3    0.9

LOTS1AY
AREA  COUNTY_NAME
0.5   Tolland
...   Tolland
1.0   Tolland
1.1   Windham
...   Windham
2.0   Windham
CSE4701
Chapter 14-48
Another Example of BCNF
Consider the TEACH Relation: TEACH(STUDENT, COURSE, INSTRUCTOR)
  In 3NF but NOT BCNF, with FD1: {STUDENT, COURSE} → INSTRUCTOR and FD2: INSTRUCTOR → COURSE
3 Possible Decompositions of TEACH:
  1. T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
  2. T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
  3. T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)
All Three “Lose” FD1!
The 3rd is Best Since, After the Join, it Recaptures FD1 and Doesn’t Generate any Spurious Tuples
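Why the third decomposition generates no spurious tuples can be checked with the standard test for binary decompositions: the split is lossless when the common attributes functionally determine all of T1 or all of T2. The sketch below applies that test to the three TEACH decompositions; `is_lossless_binary` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) determines (R1 - R2) or (R2 - R1)."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) - common <= determined or set(r2) - common <= determined

TEACH_FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # FD1
    ({"INSTRUCTOR"}, {"COURSE"}),             # FD2
]

decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]
lossless = [is_lossless_binary(t1, t2, TEACH_FDS) for t1, t2 in decompositions]
print(lossless)
```

Only the third decomposition shares INSTRUCTOR, and FD2 makes INSTRUCTOR determine COURSE, which is all of T1 beyond the shared attribute.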
CSE4701
Chapter 14-49
What Does Table Look Like?
Note TEACH in 3NF but NOT BCNF
CSE4701
Chapter 14-50
Reflections on Normalization
Normalization A Tool for Validating the Quality of the Schema,
Rather than Merely as a Method for Designing a Relational Schema
Promotes Each Concept of the Application Domain Mapping to Exactly One Concept of the Schema
Normalization Process Actually a Process of Concept Separation Concept Separation is Result of Applying a Top-
down Methodology for Producing a Schema Via Subsequent Refinements and Decompositions
CSE4701
Chapter 14-51
Relational DB Design Process
Normalization Process Focused on Decomposition Raises Number of Questions
How do we Decompose a Schema into a Desirable Normal Form?
What Criteria Should the Decomposed Schemas Follow in order to Preserve the Semantics of the Original Schema?
Can we Guarantee the Decomposition’s Quality? Can we Prevent the “Loss” of Information? Are Dependencies Maintained in Decomposition?
CSE4701
Chapter 14-52
Recall Transitive FD/Update Anomalies
R = (U, F), U = { S#, DName, DHead }, F = { S# → DName, DName → DHead }

S#  DName  DHead
S1  D1     John
S2  D1     John
S3  D2     Smith
S4  D3     Black

“S# → DHead” is a Transitive FD
  When S4 Graduates, the Head Information of D3 is Lost
  Similarly, if D5 has No Students Yet, then its Head Information cannot be Stored in this Database
  Updating the Head of Any Department Requires an Update to Every Student Enrolled in the Dept.
CSE4701
Chapter 14-53
What are Possible Decompositions?
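R = (U, F), U = { S#, DName, DHead }, F = { S# → DName, DName → DHead }
Information Based
Decomposition 1 = { R1({S#}, ∅), R2({DName}, ∅), R3({DHead}, ∅) }

R1: S# column = S1, S2, S3, S4
R2: DName column = D1, D1, D2, D3
R3: DHead column = John, John, Smith, Black

Decomposition 1 is Neither Lossless nor FD-Preserving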
CSE4701
Chapter 14-54
What are Possible Decompositions?
R = (U, F), U = { S#, DName, DHead }, F = { S# → DName, DName → DHead }
Decomposition 2 = { R1({S#, DName}, {S# → DName}), R2({S#, DHead}, {S# → DHead}) }

R1
S#  DName
S1  D1
S2  D1
S3  D2
S4  D3

R2
S#  DHead
S1  John
S2  John
S3  Smith
S4  Black

• Decomposition 2 is Lossless but not FD-Preserving
• DName → DHead is Lost in the Decomposition
CSE4701
Chapter 14-55
What are Possible Decompositions?
R = (U, F), U = { S#, DName, DHead }, F = { S# → DName, DName → DHead }
Decomposition 3 = { R1({S#, DName}, {S# → DName}), R3({DName, DHead}, {DName → DHead}) }

R1
S#  DName
S1  D1
S2  D1
S3  D2
S4  D3

R3
DName  DHead
D1     John
D2     Smith
D3     Black

Decomposition 3 is both Lossless and FD-Preserving
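The three decompositions can be compared on the sample instance by projecting R onto each schema and joining the pieces back together. This is a sketch on one instance only (it illustrates lossless join here, but does not prove it for every instance, and it does not test dependency preservation); the helper names are hypothetical.

```python
def project(rel, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r, s):
    """Natural join of relations whose tuples are frozensets of (attr, value) pairs."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def rejoin(rel, schemas):
    """Project rel onto each schema, then join all the projections."""
    parts = [project(rel, s) for s in schemas]
    result = parts[0]
    for part in parts[1:]:
        result = natural_join(result, part)
    return result

R = {frozenset(row.items()) for row in (
    {"S#": "S1", "DName": "D1", "DHead": "John"},
    {"S#": "S2", "DName": "D1", "DHead": "John"},
    {"S#": "S3", "DName": "D2", "DHead": "Smith"},
    {"S#": "S4", "DName": "D3", "DHead": "Black"},
)}

d1 = rejoin(R, [{"S#"}, {"DName"}, {"DHead"}])
d2 = rejoin(R, [{"S#", "DName"}, {"S#", "DHead"}])
d3 = rejoin(R, [{"S#", "DName"}, {"DName", "DHead"}])
print(len(R), len(d1), d2 == R, d3 == R)
```

With no shared attributes, the first decomposition degenerates into a Cartesian product of the three columns, so the rejoined relation contains many spurious tuples.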
CSE4701
Chapter 14-56
Summary of Normalization
1NF → 2NF: Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes
2NF → 3NF: Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes
3NF → BCNF: Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key
Up to 3NF: Lossless Decomposition and Dependency Preserving
BCNF: Lossless Decomposition but not Necessarily Dependency Preserving
CSE4701
Chapter 14-57
The Entire Normalization Picture
1NF → 2NF: Eliminate Partial FDs of Non-prime Attributes to Key
2NF → 3NF: Eliminate Transitive FDs of Non-prime Attributes to Key
3NF → BCNF: Eliminate Partial and Transitive FDs of Prime Attributes to Key
BCNF → 4NF: Eliminate Non-trivial and Non-functional Multi-Valued Dependencies
4NF → 5NF: Eliminate Join Dependencies that are Not Implied by Candidate Keys
CSE4701
Chapter 14-58
What are Multi-Valued Dependencies?
Focused on the Concept of Multi-Valued Dependencies
A MVD X →→ Y Indicates that a Value of X Corresponds to Multiple Values of Y
Consider EMP with MVDs:
  ENAME →→ PNAME (E works on many P)
  ENAME →→ DNAME (E has many Dependents)
CSE4701
Chapter 14-59
What is Fourth Normal Form (4NF)?
A Relation Schema R is in Fourth Normal Form (4NF) w.r.t. Dependencies F (FDs and MVDs) if for every Non-Trivial MVD X →→ Y in F+, X is a Superkey for R
Reconsider EMP with MVDs:
  ENAME →→ PNAME (E works on many P)
  ENAME →→ DNAME (E has many Dependents)
ENAME is Not a Superkey of EMP since the Triple of ENAME, PNAME, and DNAME is Needed to Distinguish Tuples
We Need to Decompose EMP!
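An MVD X →→ Y can be checked on a relation instance: within each group of tuples that share an X value, every Y value must appear with every value of the remaining attributes. The sketch below uses a hypothetical EMP instance (the rows are made up for illustration) and a hypothetical helper `mvd_holds`.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows, attrs, x, y):
    """Check X ->-> Y on an instance: per X value, (Y, Z) pairs form a full cross product."""
    z = [a for a in attrs if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        x_val = tuple(row[a] for a in x)
        groups[x_val].add((tuple(row[a] for a in y), tuple(row[a] for a in z)))
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True

ATTRS = ["ENAME", "PNAME", "DNAME"]
# Hypothetical instance: Smith works on projects X and Y and has dependents John and Anna.
EMP = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

holds_full = mvd_holds(EMP, ATTRS, ["ENAME"], ["PNAME"])
holds_missing_row = mvd_holds(EMP[:3], ATTRS, ["ENAME"], ["PNAME"])
print(holds_full, holds_missing_row)
```

The four rows for a single employee show the redundancy that 4NF removes: projects and dependents are independent facts, yet every combination has to be stored.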
CSE4701
Chapter 14-60
Decomposition into 4NF
ENAME →→ PNAME is a Trivial MVD in EMP_PROJECTS: ENAME ∪ PNAME is Equal to the Attributes of EMP_PROJECTS (Same for ENAME →→ DNAME in the Relation Holding ENAME and DNAME)
CSE4701
Chapter 14-61
What about the Supply Table?
In 4NF But Not in 5NF since it Combines Three Relationships: Supplier Supplies Parts, Supplier Supplies Projects, and Parts are Used on Projects
5NF Removes Join Dependencies – Many-Many-Many
CSE4701
Chapter 14-62
Slides on Query Optimization
Chaps17&18-63
CSE4701
Query Optimization Objectives
  Improving Performance
  Arriving at a Query Plan of Execution
Analyzing the Relational Algebra Query
  Replace Costly Operations
  Do Selections and Projections Early
Optimization Heuristics for the Relational Algebra
  Performing Selection and Projection Before Join
  Combining Several Selections Over a Single Relation Into One Selection
  Find Common Subexpressions
Algebraic Rewriting/Transformation Rules
  General Transformation Rules for Relational Algebra (Equivalence-preserving Algebraic Rewriting Rules)
Chaps17&18-64
CSE4701
Query Optimization: An Example
Why is it important?

SELECT ENAME
FROM E, W
WHERE E.ENO = W.ENO AND W.RESP = "Manager"

Strategy 1: π_ENAME(σ_{RESP="Manager" ∧ E.ENO=W.ENO}(E × W))
Strategy 2: π_ENAME(E ⋈_ENO (σ_{RESP="Manager"}(W)))
Chaps17&18-65
CSE4701
Cost of Alternatives
Assume:
  card(E) = 4,000; card(W) = 10,000
  10% of Tuples in W Satisfy RESP="Manager" (Selection Generates 1,000 Tuples)
  Execution Time is Proportional to the Sum of the Cardinalities of the Temporary Relations
  Searching is Done by Sequential Scanning

Strategy 1                          Strategy 2
Cartesian prod.  = 40,000,000       Selection over W     = 10,000
Search over all  = 40,000,000       Join (4,000 * 1,000) = 4,000,000
Total            = 80,000,000       Total                = 4,010,000
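The cost comparison is plain arithmetic under the slide's cost model (sequential scans, cost equal to the sum of the cardinalities touched). A minimal sketch that reproduces the two totals, with variable names chosen here for illustration:

```python
# Cardinalities and selectivity as assumed on the slide.
card_e = 4_000
card_w = 10_000
manager_percent = 10  # 10% of W tuples satisfy RESP = "Manager"

# Strategy 1: build the Cartesian product, then scan all of it.
cartesian = card_e * card_w
strategy1 = cartesian + cartesian

# Strategy 2: scan W for the selection, then join E with the selected tuples.
selected = card_w * manager_percent // 100
strategy2 = card_w + card_e * selected

print(strategy1, strategy2, strategy1 / strategy2)
```

Under this model, doing the selection first makes the plan roughly twenty times cheaper.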
Chaps17&18-66
CSE4701
General Query Optimization Strategy
Perform Selections Early
  Yields Smaller Intermediate Results
  Direct Impact on Subsequent Join/Cartesian Prod.
Combine Selections with a Prior Cartesian Product into a Theta or Equi Join
  Join is a Cheaper Operation
Combine (Cascade) Selections and Projections
  π_A(π_{A,B}(R)) ≡ π_A(R)
  σ_{p1}(σ_{p2}(R)) ≡ σ_{p1 ∧ p2}(R)
This Results in One Pass Instead of Two over Table
Chaps17&18-67
CSE4701
General Query Optimization Strategy
Identify Common Subexpressions
  Compute Once and Store
  Use Stored Version for Subsequent Times
  Often Useful When Views are Employed
Preprocess Data via Sorts and Indexes
  Speeds up Searches and Joins by Limiting Scope
Evaluate and Assess Different Options
  For Cartesian Product, Use Smaller Relation for Comparison
  Use System Catalog (Meta-data) to Effect Order in Query Execution Plan
Chaps17&18-68
CSE4701
Relational Algebra Transformations
1. Cascade of Selection
   σ_{p1 ∧ p2 ∧ … ∧ pn}(R) ≡ σ_{p1}(σ_{p2}(...(σ_{pn}(R))...))
2. Commutativity of Selection
   σ_{p1}(σ_{p2}(R)) ≡ σ_{p2}(σ_{p1}(R))
   σ_{p1 ∨ p2}(R) ≡ σ_{p1}(R) ∪ σ_{p2}(R)
3. Cascade of Projection
   π_{A1}(π_{A2}(...(π_{An}(R))...)) ≡ π_{A1}(R) if A1 ⊆ A2 ⊆ ... ⊆ An
4. Commuting Selection with Projection (p Refers Only to Attributes among A1, …, An)
   π_{A1,A2,...,An}(σ_p(R)) ≡ σ_p(π_{A1,A2,...,An}(R))
Chaps17&18-69
CSE4701
Relational Algebra Transformations
5. Commutativity of Theta Join and Cartesian Product
   R ⋈_A S ≡ S ⋈_A R
   R × S ≡ S × R
6. Commuting Selection with Theta Join (Cartesian)
   σ_{p(A)}(R × S) ≡ (σ_{p(A)}(R)) × S   (A Defined on R Only)
   σ_{p(A) ∧ p(B)}(R × S) ≡ (σ_{p(A)}(R)) × (σ_{p(B)}(S))   (A Defined on R, B Defined on S)
   Also Holds for Theta Join
7. Commuting Projection with Theta Join (Cartesian)
   π_C(R × S) ≡ π_A(R) × π_B(S) where A ∪ B = C
   A are the Attributes in C from R and B are the Attributes in C from S
Chaps17&18-70
CSE4701
Relational Algebra Transformations
8. Commutativity of Set Operations
   R ∪ S ≡ S ∪ R
   R ∩ S ≡ S ∩ R
9. Associativity of Set Operations
   (R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
   (R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
   (R × S) × T ≡ R × (S × T)
   (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
10. Commuting Select with Set Operations
   σ_{p(Ai)}(R ∪ T) ≡ σ_{p(Ai)}(R) ∪ σ_{p(Ai)}(T)
   σ_{p(Ai)}(R ∩ T) ≡ σ_{p(Ai)}(R) ∩ σ_{p(Ai)}(T)
   where Ai is Defined on both R and T
Chaps17&18-71
CSE4701
Relational Algebra Transformations
11. Commuting Projection with Union
    π_C(R ⋈_{q(Aj,Bk)} S) ≡ π_A(R) ⋈_{q(Aj,Bk)} π_B(S)
    π_C(R ∪ S) ≡ π_{A'}(R) ∪ π_{B'}(S)
    where R[A] and S[B], C = A' ∪ B', A' ⊆ A, B' ⊆ B
12. Converting Selection/Cartesian Product Into Theta Join
    σ_C(R × S) ≡ R ⋈_C S
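Several of these rules can be tried out on toy relations. The sketch below represents a relation as a list of dicts and compares the two strategies from the earlier E/W query, plus the cascade-of-selection rule. The data and the helper names (`select`, `product`, `project`) are made up for illustration, and attribute names are qualified (`E.ENO`, `W.ENO`) so the Cartesian product does not overwrite columns.

```python
def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def product(r, s):
    """Cartesian product of two relations with disjoint attribute names."""
    return [{**t, **u} for t in r for u in s]

def project(attrs, rel):
    """Projection with duplicate elimination, returned in sorted order."""
    return sorted({tuple(t[a] for a in attrs) for t in rel})

E = [{"E.ENO": 1, "ENAME": "Ann"}, {"E.ENO": 2, "ENAME": "Bob"}, {"E.ENO": 3, "ENAME": "Cy"}]
W = [{"W.ENO": 1, "RESP": "Manager"}, {"W.ENO": 2, "RESP": "Analyst"},
     {"W.ENO": 3, "RESP": "Manager"}, {"W.ENO": 3, "RESP": "Analyst"}]

def is_manager(t):
    return t["RESP"] == "Manager"

def same_eno(t):
    return t["E.ENO"] == t["W.ENO"]

# Strategy 1: one combined selection over the full Cartesian product.
temp1 = product(E, W)
combined = select(lambda t: is_manager(t) and same_eno(t), temp1)
result1 = project(["ENAME"], combined)

# Strategy 2: push the RESP selection down to W before combining with E.
temp2 = product(E, select(is_manager, W))
result2 = project(["ENAME"], select(same_eno, temp2))

# Rule 1 (cascade of selection): a conjunction equals nested selections.
cascade = select(is_manager, select(same_eno, temp1))
print(result1, result2, len(temp1), len(temp2), cascade == combined)
```

Both strategies return the same names, but the second builds an intermediate relation half the size on this data.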
Chaps17&18-72
CSE4701
Using Heuristics in Query Optimization
Process for Heuristics Optimization
  1. The parser of a high-level query generates an initial internal representation.
  2. Apply heuristics rules to optimize the internal representation.
  3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
The main heuristic is to apply first the operations that reduce the size of intermediate results
  E.g., apply SELECT and PROJECT operations before applying the JOIN or other operations.
Chaps17&18-73
CSE4701
Using Heuristics in Query Optimization (2)
Query tree:
  A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.
  An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
Query graph:
  A graph data structure that corresponds to a relational calculus expression. It does not indicate an order in which to perform the operations. There is only a single graph corresponding to each query.
Chaps17&18-74
CSE4701
Using Heuristics in Query Optimization
Heuristic Optimization of Query Trees:
  The same query could correspond to many different relational algebra expressions, and hence many different query trees.
  The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
Example:
Q: SELECT LNAME
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE PNAME = ‘AQUARIUS’ AND PNUMBER = PNO AND ESSN = SSN AND BDATE > ‘1957-12-31’;
Chaps17&18-75
CSE4701
Heuristics Algebraic Optimization Concepts
Using the Cascade of Selections Rule, Break up Any Selections With Conjunctive Conditions Into a Cascade of Selections
  Allows More Freedom in Moving Selections Down Different Branches of the Tree
Using Commutativity of Selections with Other Operations Rules, Move Each Selection Down the Query Tree as Far as Possible
If Possible, Combine a Cartesian Product With a Selection Into a Join
Chaps17&18-76
CSE4701
Heuristics Algebraic Optimization Concepts
Using Associativity of Binary Operations, Rearrange the Leaf Nodes So That the Most Restrictive Selections Are Executed First
  The Fewer Tuples the Resulting Relation Contains, the More Restrictive the Selection
  Reducing the Size of Intermediate Results Improves Performance
Using Cascade of Projections and Commutativity of Projections with Other Operations, Move Projections Down the Query Tree as Far as Possible
Identify Subtrees that Represent Groups of Operations that can be Executed by a Single Algorithm
Chaps17&18-77
CSE4701
Heuristic Algebraic Optimization Algorithm
Use Rule 1 to Break up Selects with Conjunctions into a Cascade to Move them Down the Query Tree
Use Rules 2, 4, 6, and 10 to Commute Select with Project, Join, Cart. Prod., Union, and Intersection
Use Rule 5 (Commute) and 9 (Associative) to Rearrange the Leaf Nodes of Query Tree to:
  Most Restrictive Select Executed First
  Avoid Cartesian Product in Leaf Nodes
Use Rule 12 to Convert a Select/Cart Prod to Join
Use Rules 3, 4, 7, and 11 to Cascade and Commute Project - Pushing Down Tree as Far as Possible
Identify Subtrees that Can Execute as Independent Algorithms (Set of Operations)
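The first two steps (cascade the conjunctive selection, then push each selection as far down as its attributes allow) can be sketched on a small tree representation. Everything here is an assumption made for illustration: nodes are nested tuples, predicates carry the set of attributes they mention, attribute names are qualified by relation, and W is given a JNO attribute so that it can be matched with P.

```python
def attrs_of(node):
    """Attributes at a node: ('rel', name, attrs), ('select', pred, child), ('product', l, r)."""
    if node[0] == "rel":
        return set(node[2])
    if node[0] == "select":
        return attrs_of(node[2])
    return attrs_of(node[1]) | attrs_of(node[2])

def push(pred, node):
    """Place one predicate (label, attrs_used) at the lowest node that covers its attributes."""
    _, used = pred
    if node[0] == "product":
        left, right = node[1], node[2]
        if used <= attrs_of(left):
            return ("product", push(pred, left), right)
        if used <= attrs_of(right):
            return ("product", left, push(pred, right))
    elif node[0] == "select":
        # Selections commute, so the new predicate may slide below an existing one.
        return ("select", node[1], push(pred, node[2]))
    return ("select", pred, node)

def push_down(preds, tree):
    """Treat a conjunction as a list of predicates (Rule 1) and push each one down."""
    for pred in preds:
        tree = push(pred, tree)
    return tree

E = ("rel", "E", frozenset({"E.ENO", "E.ENAME"}))
P = ("rel", "P", frozenset({"P.JNO", "P.JNAME"}))
W = ("rel", "W", frozenset({"W.ENO", "W.JNO", "W.DUR"}))

dur = ("DUR=12 OR DUR=24", frozenset({"W.DUR"}))
jname = ("JNAME='CAD/CAM'", frozenset({"P.JNAME"}))
ename = ("ENAME='J. DOE'", frozenset({"E.ENAME"}))
eno = ("W.ENO=E.ENO", frozenset({"W.ENO", "E.ENO"}))
jno = ("P.JNO=W.JNO", frozenset({"P.JNO", "W.JNO"}))

canonical = ("product", P, ("product", W, E))
tree = push_down([dur, jname, ename, eno, jno], canonical)
print(tree)
```

Each single-relation predicate ends up directly above its leaf, and each two-relation predicate sits directly above a Cartesian product, which is exactly the shape that Rule 12 then turns into a join.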
Chaps17&18-78
CSE4701
Heuristic Optimization: Example
E(ENAME, ENO), P(JNO, JNAME), W(ENO, PNO, DUR)
[Figure: canonical query tree at the end of the query preprocessing phase. Root: projection on ENAME; below it: selection (DUR=12 OR DUR=24) AND JNAME=“CAD/CAM” AND ENAME=“J. DOE”; below it: join on JNO of P with the join on ENO of W and E]
Chaps17&18-79
CSE4701
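Heuristic Optimization – Example
[Figure: query tree after using the cascading of selections rule to decompose the selection. Root: projection on ENAME; then three separate selections DUR=12 OR DUR=24, JNAME=“CAD/CAM”, and ENAME=“J. DOE”; then the join on JNO of P with the join on ENO of W and E]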
Chaps17&18-80
CSE4701
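Heuristic Optimization – Example
[Figure: query tree after pushing a selection down using commutativity of selection over join. The selection ENAME="J. Doe" now sits directly above E; the selections DUR=12 OR DUR=24 and JNAME=“CAD/CAM” remain below the projection on ENAME, above the joins on JNO and ENO over P, W, and E]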
Chaps17&18-81
CSE4701
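Heuristic Optimization – Example
[Figure: query tree after pushing a second selection down using commutativity of selection over join. The selection JNAME="CAD/CAM" now sits directly above P and ENAME="J. Doe" directly above E; the selection DUR=12 OR DUR=24 remains below the projection on ENAME, above the joins on JNO and ENO]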
Chaps17&18-82
CSE4701
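Heuristic Optimization – Example
[Figure: query tree after pushing the last selection down. Selections ENAME="J. Doe" on E, JNAME="CAD/CAM" on P, and DUR=12 OR DUR=24 on W sit directly above the leaves; the joins on ENO and JNO and the final projection on ENAME are above them]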
Chaps17&18-83
CSE4701
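Heuristic Optimization – Example
[Figure: query tree after doing early projection. In addition to the selections on E, P, and W and the joins on ENO and JNO, projections on JNO, on JNO,ENO, and on JNO,ENAME are inserted below the final projection on ENAME]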
Chaps17&18-84
CSE4701
Heuristic Optimization – Example
[Figure: the same query tree as on the previous slide (selections on E, P, and W; projections on JNO, on JNO,ENO, and on JNO,ENAME; joins on ENO and JNO; final projection on ENAME), with the subtrees marked that can be implemented in one algorithm]
Chaps17&18-85
CSE4701
Heuristic Optimization: A Second Example
BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)
Let XLOANS = π_S(σ_F(Loans × Borrowers × Books)) where:
  S = {Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date}
  F = {Borrower.Card_No = Loans.Card_No ∧ Books.LC_No = Loans.LC_No}
Chaps17&18-86
CSE4701
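Heuristic Optimization: A Second Example
[Figure: query tree for XLOANS. Root: projection on Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date; then selection Borrower.Card_No = Loans.Card_No ∧ Books.LC_No = Loans.LC_No; then the Cartesian product of Books with the Cartesian product of Loans and Borrower]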
Chaps17&18-87
CSE4701
Heuristic Optimization: A Second Example
Query = π_Title(σ_{Date compared with 1/1/88}(XLOANS))
[Figure: the XLOANS query tree with two nodes added on top: a selection on Date (compared with 1/1/88) and, above it, a projection on Title]
Chaps17&18-88
CSE4701
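Heuristic Optimization: A Second Example
Try to Cascade
[Figure: the same query tree (projection on Title; selection on Date compared with 1/1/88; projection on Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date; selection Borrower.Card_No = Loans.Card_No ∧ Books.LC_No = Loans.LC_No; two Cartesian products over Books, Loans, and Borrower), with the Date selection shown at two positions]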
Chaps17&18-89
CSE4701
Heuristic Optimization: A Second Example
Commute Select and Project
[Figure: the same query tree with the selection on Date (compared with 1/1/88) moved below the projection on Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date]
Chaps17&18-90
CSE4701
Heuristic Optimization: A Second Example
Commute Select and Select
[Figure: the same query tree with the selection on Date (compared with 1/1/88) moved below the selection Borrower.Card_No = Loans.Card_No ∧ Books.LC_No = Loans.LC_No]
Chaps17&18-91
CSE4701
Heuristic Optimization: A Second Example
Commute Select and Cartesian Product Two Levels Down
[Figure: the same query tree with the selection on Date (compared with 1/1/88) moved below both Cartesian products, directly above Loans]
Chaps17&18-92
CSE4701
Heuristic Optimization: A Second Example
Try to Cascade Books.LC_No = Loans.LC_No
[Figure: the same query tree, with the selection on Date (compared with 1/1/88) above Loans and the conjunctive selection Borrower.Card_No = Loans.Card_No ∧ Books.LC_No = Loans.LC_No still above the Cartesian products]
Chaps17&18-93
CSE4701
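Heuristic Optimization: A Second Example
Commute Select and Cartesian Product One Level Down
[Figure: the conjunctive selection is split into Books.LC_No = Loans.LC_No and Borrower.Card_No = Loans.Card_No, with one of the two moved one Cartesian product lower; the selection on Date (compared with 1/1/88) is above Loans; the projections on the full attribute list and on Title are at the top]
What’s Next?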
Chaps17&18-94
CSE4701
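Heuristic Optimization: A Second Example
Combine Projections
[Figure: the same query tree with the projection on the full attribute list removed, leaving only the projection on Title at the root, above the selections Books.LC_No = Loans.LC_No and Borrower.Card_No = Loans.Card_No, the two Cartesian products, and the selection on Date (compared with 1/1/88) above Loans]
What is Still a Problem? We are Not Projecting so All Attributes are Still Collected Until the Final Project!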
Chaps17&18-95
CSE4701
Heuristic Optimization: A Second Example
Add Strategic Projections to Send Only the Minimum Up the Tree as Needed for Join/Result Set
[Figure: the same query tree with added projections on Loans.LC_No,Loans.Card_No; on Loans.LC_No; on Borr.Card_No; and on Books.LC_No, Title]
Chaps17&18-96
CSE4701
Books
Loans
Borrower
Borrower.Card_No = Loans.Card_No
X
X
Title
Date 1/1/88
Books.LC_No = Loans.LC_No
Heuristic Optimization: A Second Example
Loans.LC_No,Loans.Card_No
Loans.LC_No
Borr.Card_No
Books.LC_No, Title
What is the Final Step? Combine Select and Cartesian Product
Result: Equijoins!
Chaps17&18-97
CSE4701
Heuristics Query Optimization: Summary
First Apply Operations that Reduce the Size of Intermediate Results
Move Selections and Projections Down the Tree as Far as Possible
  Early Selections Reduce the Number of Tuples
  Early Projections Reduce the Number of Attributes
The Most Restrictive Selections and Joins Should be Executed Before Other Similar Operations
  This is Accomplished by Reordering the Leaf Nodes of the Tree Among Themselves and Adjusting the Rest of the Tree Appropriately
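The effect of moving a selection down the tree can be illustrated with a small sketch. The rows, the cutoff date, and the `<=` comparison are assumptions made for illustration (the comparison operator in the slides' Date selection did not survive extraction); only the table and attribute names come from the Books/Loans/Borrower example.

```python
from itertools import product

# Hypothetical rows; only the table and attribute names come from the slides.
BOOKS = [{"LC_No": i, "Title": f"Book {i}"} for i in range(1, 6)]
BORROWERS = [{"Card_No": c, "Name": f"Person {c}"} for c in range(1, 4)]
LOANS = [
    {"LC_No": 1, "Card_No": 1, "Date": "1987-12-01"},
    {"LC_No": 2, "Card_No": 2, "Date": "1988-03-15"},
    {"LC_No": 3, "Card_No": 3, "Date": "1987-06-30"},
]
CUTOFF = "1988-01-01"  # assumed selection: Date <= 1/1/88


def naive_plan():
    """Cartesian product of all three tables, then select, then project."""
    intermediate = list(product(BOOKS, LOANS, BORROWERS))
    titles = {
        book["Title"]
        for book, loan, borrower in intermediate
        if loan["Date"] <= CUTOFF
        and borrower["Card_No"] == loan["Card_No"]
        and book["LC_No"] == loan["LC_No"]
    }
    return titles, len(intermediate)


def optimized_plan():
    """Select on Date first, then join on the key attributes."""
    old_loans = [loan for loan in LOANS if loan["Date"] <= CUTOFF]
    loan_borrower = [
        (loan, b) for loan in old_loans for b in BORROWERS
        if b["Card_No"] == loan["Card_No"]
    ]
    joined = [
        (book, loan) for loan, _ in loan_borrower for book in BOOKS
        if book["LC_No"] == loan["LC_No"]
    ]
    titles = {book["Title"] for book, _ in joined}
    # Size of the largest intermediate result that is materialized.
    largest = max(len(old_loans), len(loan_borrower), len(joined))
    return titles, largest
```

Both plans return the same titles, but the naive plan materializes the full 5 x 3 x 3 product while the optimized plan never holds more than a handful of tuples.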
CSE4701
Chapter 14-98
Slides on Concurrency Control Algorithms
Chaps19&20-99
CSE 4701
What is a Schedule?
Transaction schedule or history:
  When transactions execute concurrently in an interleaved fashion, the order of execution of operations from the various transactions forms what is known as a transaction schedule
A schedule S of n transactions T1, T2, …, Tn is:
  An ordering of the operations of the transactions where, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti.
  Operations from other transactions Tj can be interleaved with the operations of Ti in S.
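The ordering requirement in this definition can be checked mechanically. The sketch below is one possible encoding, not notation from the slides: a schedule is a list of hypothetical `(transaction id, operation label)` pairs, and the X/Y example used for the deck's T1 and T2 serves as data.

```python
from typing import Dict, List, Tuple

Op = Tuple[int, str]  # (transaction id, operation label such as "R(X)")


def is_valid_schedule(schedule: List[Op], transactions: Dict[int, List[str]]) -> bool:
    """True if each transaction's operations appear in the schedule in the
    same order in which they occur in the transaction itself."""
    if any(tid not in transactions for tid, _ in schedule):
        return False
    for tid, ops in transactions.items():
        projected = [label for t, label in schedule if t == tid]
        if projected != ops:
            return False
    return True


# T1 and T2 from the X/Y example, written as operation labels.
T = {
    1: ["R(X)", "W(X)", "R(Y)", "W(Y)", "c"],
    2: ["R(X)", "W(X)", "c"],
}
# R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1
S = [(1, "R(X)"), (1, "W(X)"), (2, "R(X)"), (2, "W(X)"), (2, "c"),
     (1, "R(Y)"), (1, "W(Y)"), (1, "c")]
```

Swapping two operations of the same transaction (for example W1(X) before R1(X)) makes `is_valid_schedule` return False, while any interleaving across transactions is accepted.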
Chaps19&20-100
CSE 4701
What is a Schedule?
A Schedule S is a Sequence of R/W Operations, Which Ends with Commit or Abort
  Different Transactions Execute Concurrently in an Interleaved Fashion with One Another
  Each Transaction is a Sequence of R/W Operations
Two Schedules S1 and S2 are Equivalent, Denoted as S1 ≡ S2, If and Only If S1 and S2
  Execute the Same Set of Transactions
  Produce the Same Results (i.e., Both Take the DB to the Same Final State)
Chaps19&20-101
CSE 4701
Transactions and a Schedule
Below are Transactions T1 and T2
Note that Their Interleaved Execution Shown Below is an Example of One Possible Schedule
There are Many Different Interleavings of T1 and T2
T1 T2
Read(X);X:=X;Write(X);
Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;
Chaps19&20-102
CSE 4701
Transactions and a Schedule What Happens if the Schedule Changes to:
T1 T2
Read(X);X:=X;Write(X);
Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
T1 T2
Read(X);X:=X;
Write(X);
Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);
X:=X;Write(X);commit;
Chaps19&20-103
CSE 4701
Equivalent Schedules
Are the Two Schedules below Equivalent?
S1 and S4 are Equivalent, since They have the Same Set of Transactions and Produce the Same Results
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S1
T1 T2
Read(X);X:=X;Write(X);
Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S4
S4: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;
S1: R1(X),W1(X), R1(Y), W1(Y), c1, R2(X), W2(X), c2;
Chaps19&20-104
CSE 4701
What are Different Types of Schedules?
Recoverable schedule:
  One where no committed transaction needs to be rolled back.
  No transaction T in S commits until all transactions T’ that wrote an item that T reads have committed.
Cascadeless schedule:
  One where every transaction reads only items that were written by committed transactions.
Cascaded rollback:
  A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back, because they read a value written by the failed transaction.
Strict schedule:
  A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed.
Chaps19&20-105
CSE 4701
Serial and Serializable Schedules
Serial schedule:
  A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. Otherwise, the schedule is called a nonserial schedule.
Serializable schedule:
  A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.
Being serializable implies that the schedule is a correct schedule that:
  Leaves the database in a consistent state.
  Interleaves operations so that the resulting state is as if the transactions were serially executed, while achieving efficiency due to concurrent execution.
Chaps19&20-106
CSE 4701
Serializability of Schedules
A Serial Execution of Transactions Runs One Transaction at a Time (e.g., T1 and T2 or T2 and T1)
  All R/W Operations in Each Transaction Occur Consecutively in S, No Interleaving
  Consistency: a Serial Schedule takes a Consistent Initial DB State to a Consistent Final State
A Schedule S is Called Serializable If there Exists an Equivalent Serial Schedule
  A Serializable Schedule also takes a Consistent Initial DB State to Another Consistent DB State
  An Interleaved Execution of a Set of Transactions is Considered Correct if it Produces the Same Final Result as Some Serial Execution of the Same Set of Transactions
  We Call such an Execution Serializable
Chaps19&20-107
CSE 4701
Example of Serializability Consider S1 and S2 for Transactions T1 and T2 If X = 10 and Y = 20
After S1 or S2 X = 7 and Y = 40 These are the two Possible Serial Schedules
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S1 Schedule S2
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Chaps19&20-108
CSE 4701
Example of Serializability Consider S1 and S2 for Transactions T1 and T2 If X = 10 and Y = 20
After S1 or S2 X = 7 and Y = 40 Is S3 a Serializable Schedule?
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S1 Schedule S2
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
T1 T2
Read(X);X:=X;
Write(X);Read(Y);
Y = Y + 20;Write(Y);commit;
Read(X);X:=X;
Write(X);commit;
Schedule S3
Chaps19&20-109
CSE 4701
Example of Serializability Consider S1 and S2 for Transactions T1 and T2 If X = 10 and Y = 20
After S1 or S2 X = 7 and Y = 40 Is S4 a Serializable Schedule?
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S1 Schedule S2
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
T1 T2
Schedule S4
Read(X);X:=X;Write(X);
Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Chaps19&20-110
CSE 4701
Two Serial Schedules with Different Results Consider S1 and S2 for Transactions T1 and T2 If X = 10 and Y = 20
After S1 X = 7 and Y = 28 After S2 X = 7 and Y = 27
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = X + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S1 Schedule S2
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = X + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
A Schedule is Serializable if it is Equivalent to Either S1 or S2, Even if S1 and S2 Produce Different Results!
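The two serial outcomes on this slide can be reproduced with a short simulation. The operands of `X:=X` were lost in extraction, so the sketch assumes T1 subtracts 2 and T2 subtracts 1; these are the values that reproduce the stated results (X = 7, Y = 28 and X = 7, Y = 27) from X = 10, Y = 20, but they are an inference, not text from the slides.

```python
def t1(db):
    x = db["X"]      # Read(X)
    x = x - 2        # assumed update; the operand is missing in the slides
    db["X"] = x      # Write(X)
    y = db["Y"]      # Read(Y)
    y = x + 20       # Y = X + 20 (the value just read for Y is overwritten)
    db["Y"] = y      # Write(Y)


def t2(db):
    x = db["X"]      # Read(X)
    x = x - 1        # assumed update; the operand is missing in the slides
    db["X"] = x      # Write(X)


def run_serial(order, initial):
    """Run the transactions one after another on a copy of the initial state."""
    db = dict(initial)
    for txn in order:
        txn(db)
    return db


START = {"X": 10, "Y": 20}
s1 = run_serial([t1, t2], START)  # Schedule S1: T1 then T2
s2 = run_serial([t2, t1], START)  # Schedule S2: T2 then T1
```

Under these assumptions `s1` and `s2` agree on X but differ on Y, and either final state is acceptable for a serializable interleaving.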
Chaps19&20-111
CSE 4701
Thoughts on Serializability
Serializability is hard to check
  Interleaving of operations occurs in an operating system through some scheduler
  Difficult to determine beforehand how the operations in a schedule will be interleaved
Need to Adopt a Practical Approach
  Come up with methods (protocols) to ensure serializability.
  However, it is not possible to determine when a schedule begins and when it ends.
  Hence, we reduce the problem of checking the whole schedule to checking only a committed projection of the schedule
Chaps19&20-112
CSE 4701
How do we Check for Conflicts?
Testing for conflict serializability:
  Look at only read_Item (X) and write_Item (X) operations
  Construct a precedence graph (serialization graph) with directed edges
  An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj
  The schedule is serializable if and only if the precedence graph has no cycles.
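The test described above can be sketched in a few lines. The `(transaction, kind, item)` tuple encoding and the function names are assumptions made for illustration; the construction is the standard one in which every pair of conflicting operations from different transactions contributes an edge.

```python
def precedence_graph(schedule):
    """schedule: list of (txn, kind, item) with kind "R" or "W".
    Returns the set of edges (Ti, Tj) where an operation of Ti precedes
    a conflicting operation of Tj (same item, at least one write)."""
    edges = set()
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (kind_i, kind_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    VISITING, DONE = 1, 2
    state = {}

    def visit(node):
        state[node] = VISITING
        for nxt in graph[node]:
            if state.get(nxt) == VISITING:
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# S4 from the slides: R1(X), W1(X), R2(X), W2(X), R1(Y), W1(Y)
S4 = [(1, "R", "X"), (1, "W", "X"), (2, "R", "X"), (2, "W", "X"),
      (1, "R", "Y"), (1, "W", "Y")]
# A hypothetical interleaving in which both transactions read X before either writes it.
LOST_UPDATE = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
```

For `S4` the only edge is T1 to T2, so the graph is acyclic; for `LOST_UPDATE` the edges T1 to T2 and T2 to T1 form a cycle.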
Chaps19&20-113
CSE 4701
The Serializability Theorem
A Dependency Exists Between Two Transactions If:
  They Access the Same Data Item Consecutively in the Schedule and One of the Accesses is a Write
Three Cases: T2 Depends on T1, Denoted by T1 → T2
  T2 Executes a Read(x) after a Write(x) by T1
  T2 Executes a Write(x) after a Read(x) by T1
  T2 Executes a Write(x) after a Write(x) by T1
  Don’t Care about Read(x) after Read(x)
Transaction T1 Precedes Transaction T2 If:
  There is a Dependency Between T1 and T2, and
  The R/W Operation in T1 Precedes the Dependent T2 Operation in the Schedule
Chaps19&20-114
CSE 4701
The Serializability Theorem
A Precedence Graph of a Schedule is a Graph G = <TN, DE>, where
  Each Node is a Single Transaction; i.e., TN = {T1, ..., Tn} (n>1)
  Each Arc (Edge) Represents a Dependency Going from the Preceding Transaction to the Other; i.e., DE = {eij | eij = (Ti, Tj), Ti, Tj ∈ TN}
  Use Dependency Cases on Prior Slide
The Serializability Theorem
  A Schedule is Serializable if and only if its Precedence Graph is Acyclic
Chaps19&20-115
CSE 4701
Serializability Theorem Example Consider S1 and S2 for Transactions T1 and T2
Consider the Two Precedence Graphs for S1 and S2 No Cycles in Either Graph!
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
Schedule S1 Schedule S2
T1 T2
Read(X);X:=X;Write(X);Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
T1 T2
X
Schedule S1
T1 T2
X
Schedule S2
Chaps19&20-116
CSE 4701
What are Precedence Graphs for S3 and S4?
For S3
  T1 → T2 (T2 Write(X) After T1 Write(X))
  T2 → T1 (T1 Write(X) After T2 Read(X))
For S4
  T1 → T2 (T2 Read/Write(X) After T1 Write(X))
T1 T2
X
Schedule S4
T1 T2
Read(X);X:=X;
Write(X);Read(Y);
Y = Y + 20;Write(Y);commit;
Read(X);X:=X;
Write(X);commit;
Schedule S3
T1 T2
Schedule S4
Read(X);X:=X;Write(X);
Read(Y);Y = Y + 20;Write(Y);commit;
Read(X);X:=X;Write(X);commit;
T1 T2
X
Schedule S3
X
Chaps19&20-117
CSE 4701
Four Schedules and their …
Chaps19&20-118
CSE 4701
… Precedence Graphs
Chaps19&20-119
CSE 4701
Serializability Facts
Serializability Emphasizes Throughput
Serializable Executions Allow us to Enjoy the Benefits of Concurrency without Giving up Any Correctness
However, we May NOT GET the Same Result from Every Serializable Execution
Testing for Serializability is Difficult in Practice:
  Finding a Serializable Schedule for an Arbitrary Set of Transactions is NP-hard
  Interleaving of Operations From Concurrent Transactions is Determined Dynamically at Run-time
  Practically Almost Impossible to Determine Ordering of Operations Beforehand to Ensure Serializability
Chaps19&20-120
CSE 4701
Database Concurrency Control
Purpose of Concurrency Control
  To enforce Isolation (through mutual exclusion) among conflicting transactions.
  To preserve database consistency through consistency preserving execution of transactions.
  To resolve read-write and write-write conflicts.
Example:
  In a concurrent execution environment, if T1 conflicts with T2 over a data item A, then the concurrency control mechanism decides whether T1 or T2 gets access to A and whether the other transaction is rolled back or waits.
Chaps19&20-121
CSE 4701
Concurrency Control
Different Locking-Based Algorithms
  Binary Locks (Lock and Unlock)
  Shared Read Locks and Exclusive Write Locks
  Write Lock Does Not Imply Read
2 Phase Protocol
  All Locks Must Precede All Unlocks in Trans.
  True for All Transactions - Schedule Serializable
Concurrency Control Implementation Techniques
  Optimistic Concurrency Control
  Time-Based Access to Information
  Consider “When” Information Read/Written to Identify Potential or Prior Conflicts
We’ll Deviate from Textbook Notation
Chaps19&20-122
CSE 4701
Summary of CC Techniques
Two-Phase Locking
  Most Important in Practice
  Used by a Majority of DBMSs
  Serializes in the Middle of Transactions
  Low Overhead
  Relatively Low Concurrency
Timestamp-Based
  Based on Multiple Versions of Data Items
  Serializes at the Beginning of Transactions
  Mostly Used in Distributed DBMSs
Optimistic Concurrency Control Methods
  Serializes at the End of Transactions
  Relatively High Concurrency
Chaps19&20-123
CSE 4701
Recalling Important Concepts
Transaction: Sequence of Database Commands that Must be Executed as a Single Unit (Program)
Recall SQL Update Query
  Equivalent to Multiple Operations
  Read from DB, Modify (Local Copy), Write to DB
  Modify is Sometimes Delete and Insert
Granularity: Size of Data that is Locked for an Executing DB Transaction - Wide Range
  Database
  Relation (Tuple vs. Entire Table)
  Attribute (Column)
  Meta-Data (System Catalog)
Locking: Provides Means for Synchronization
Chaps19&20-124
CSE 4701
Transaction Example
Two Possible Outcomes for T1 and T2 – Let A = 5
  If T1 First, then A = 150
  If T2 First, then A = 60
Is this a Problem?
T1 T2
LOCK A; READ A; A=A+10; WRITE A; UNLOCK A; commit;
LOCK A; READ A; A=A*10; WRITE A; UNLOCK A; commit;
T1 T2
LOCK A; READ A; A=A+10; WRITE A; UNLOCK A; commit;
LOCK A; READ A; A=A*10; WRITE A; UNLOCK A; commit;
Chaps19&20-125
CSE 4701
Transaction Example
The Two Different Orderings of T1 and T2 Represent Alternate Serial Schedules (Non-Interleaved)
Key Concept: Concurrent (Interleaved) Execution of Several DB Transactions is Correct if and only if its Effect is the Same as that Obtained by Running the Same Transactions in a Serial Order
If Result is Either 150 or 60 – it is OK!
This is the Concept of Serializability!
T1 T2
LOCK A; READ A; A=A+10; WRITE A; UNLOCK A; commit;
LOCK A; READ A; A=A*10; WRITE A; UNLOCK A; commit;
Chaps19&20-126
CSE 4701
Recalling Key Definitions
A Schedule for a Set of Transactions is the Order in Which the Elementary Steps (Read, Lock, Assign, Commit, etc.) are Performed
A Schedule is Serial if All Steps of Each Transaction Occur Consecutively
A Schedule is Serializable if it is Equivalent to “Some” Serial Schedule
If T1, T2 and T3 are Transactions - What are the Possible Serial Schedules?
  T1 T2 T3    T1 T3 T2    T2 T1 T3
  T2 T3 T1    T3 T1 T2    T3 T2 T1
Different Serial Schedules for 4 Transactions?
Chaps19&20-127
CSE 4701
Another Example of Serializability
Two Serial Schedules – Let A = 15, B = 25, C = 5
What are Values of A, B, and C after Each?
T1 T2
Read(A); A:=A-10; Write(A); Read(B); B = B + 10; Write(B); commit;
Read(B); B:=B-20; Write(B); Read(C); C=C+20; Write(C); commit;
T1 T2
Read(A); A:=A-10; Write(A); Read(B); B = B + 10; Write(B); commit;
Read(B); B:=B-20; Write(B); Read(C); C=C+20; Write(C); commit;
S1 S2
A = 5, B = 15, C = 25
Chaps19&20-128
CSE 4701
Another Example of Serializability
Is S3 or S4 Serializable? Let A = 15, B = 25, C = 5
Serial Values:
T1 T2
Read(A);
A:=A-10;
Write(A);
Read(B);
B = B + 10;
Write(B);
commit;
Read(B);
B:=B-20;
Write(B);
Read(C);
C=C+20
Write(C)
commit;
T1 T2
Read(A); A:=A-10;
Write(A);
Read(B);
B = B + 10;
Write(B);commit;
Read(B);
B:=B-20;
Write(B);
Read(C);
C=C+20; Write(C); commit;
A = 5, B = 15, C = 25
A = 5, B = 35, C = 25
A = 5, B = 15, C = 25
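The three result lines above can be reproduced by stepping through an interleaving with a private workspace per transaction. This is a sketch under assumptions: the minus signs in `A:=A-10` and `B:=B-20` are inferred from the stated serial values, and the two interleavings are hypothetical, chosen so that one loses T2's update to B and the other does not. Comparing final states for one initial state is a check on results only, not a general serializability test.

```python
# Each step takes (db, local workspace) and updates one of them.
T1 = [
    lambda db, loc: loc.update(A=db["A"]),        # Read(A)
    lambda db, loc: loc.update(A=loc["A"] - 10),  # A := A - 10 (sign assumed)
    lambda db, loc: db.update(A=loc["A"]),        # Write(A)
    lambda db, loc: loc.update(B=db["B"]),        # Read(B)
    lambda db, loc: loc.update(B=loc["B"] + 10),  # B = B + 10
    lambda db, loc: db.update(B=loc["B"]),        # Write(B)
]
T2 = [
    lambda db, loc: loc.update(B=db["B"]),        # Read(B)
    lambda db, loc: loc.update(B=loc["B"] - 20),  # B := B - 20 (sign assumed)
    lambda db, loc: db.update(B=loc["B"]),        # Write(B)
    lambda db, loc: loc.update(C=db["C"]),        # Read(C)
    lambda db, loc: loc.update(C=loc["C"] + 20),  # C = C + 20
    lambda db, loc: db.update(C=loc["C"]),        # Write(C)
]


def run(order, initial):
    """order lists transaction ids; each occurrence runs that transaction's next step."""
    db = dict(initial)
    programs = {1: T1, 2: T2}
    position = {1: 0, 2: 0}
    local = {1: {}, 2: {}}
    for tid in order:
        programs[tid][position[tid]](db, local[tid])
        position[tid] += 1
    return db


INITIAL = {"A": 15, "B": 25, "C": 5}
SERIAL_RESULTS = [run([1] * 6 + [2] * 6, INITIAL), run([2] * 6 + [1] * 6, INITIAL)]


def matches_some_serial(order):
    return run(order, INITIAL) in SERIAL_RESULTS


# T1 reads B, then T2 runs completely, then T1 writes B: T2's update to B is lost.
LOSES_UPDATE = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
# T1 finishes with A, T2 finishes with B, then T1 handles B and T2 handles C.
SAFE = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
```

With these assumptions both serial orders give A = 5, B = 15, C = 25, `LOSES_UPDATE` ends with B = 35, and `SAFE` matches the serial result.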
Chaps19&20-129
CSE 4701
Locks
Lock: Variable Associated with a Data Item in DB, Describing the Status of that Item w.r.t. Possible Ops.
  A Means of Synchronizing the Access by Concurrent Transactions to the Database Item
  Managed by Lock Manager
Binary Locks: Lock(x) and Unlock(x)
  A Transaction T Must Issue the Lock(x) before any Read(x) or Write(x)
  A Transaction T Must use the Unlock(x) After all Read(x)/Write(x) Operations are Completed in T
  System Catalog Maintains a Lock Table for All Locked Items
  Lock(x) (or Unlock(x)) will not be Granted if there Already Exists a Lock(x) (or Unlock(x))
Chaps19&20-130
CSE 4701
A Basic Lock/Unlock Model
A Database Transaction is a Sequence of Lock/Unlocks
  An Item that is Locked must Eventually be Unlocked
  A Transaction Holds a Lock between the Lock and Unlock Statements
  Lock/Unlock Assumes that the Value of the Item Changes (Always Assumes a Write)
[Figure: Lock A sees value a0; Unlock A leaves value f(a0)]
For a Number of Transactions that Lock/Unlock A, we’d have: f1(f2(f3( … fn(a0))))
Chaps19&20-131
CSE 4701
Example - Assessing Schedule
Consider Three Transactions Below:
  T1 has f1(a) and f2(b)
  T2 has f3(b) and f4(c) and f5(a)
  T3 has f6(a) and f7(c)
Functions Represent Actions that Modify Instances a, b, and c of Data Items A, B, and C, Respectively
T1: Lock A; Lock B; Unlock A; Unlock B
T2: Lock B; Lock C; Unlock B; Lock A; Unlock C; Unlock A
T3: Lock A; Lock C; Unlock C; Unlock A
Chaps19&20-132
CSE 4701
Example - Assessing Schedule
Consider the Schedule with Changes to a, b, and c
Is this Schedule Serializable?
Step          A                 B             C
T1 Lock A     a                 b             c
T2 Lock B     a                 b             c
T2 Lock C     a                 b             c
T2 Unlock B   a                 f3(b)         c
T1 Lock B     a                 f3(b)         c
T1 Unlock A   f1(a)             f3(b)         c
T2 Lock A     f1(a)             f3(b)         c
T2 Unlock C   f1(a)             f3(b)         f4(c)
T2 Unlock A   f5(f1(a))         f3(b)         f4(c)
T3 Lock A     f5(f1(a))         f3(b)         f4(c)
T3 Lock C     f5(f1(a))         f3(b)         f4(c)
T1 Unlock B   f5(f1(a))         f2(f3(b))     f4(c)
T3 Unlock C   f5(f1(a))         f2(f3(b))     f7(f4(c))
T3 Unlock A   f6(f5(f1(a)))     f2(f3(b))     f7(f4(c))
Chaps19&20-133
CSE 4701
Is this Schedule Serializable?
Focus on the Final Line - It indicates the Effective Order of Execution of Each Transaction for a, b, and c
  T1 has f1(a) and f2(b)
  T2 has f3(b) and f4(c) and f5(a)
  T3 has f6(a) and f7(c)
For A - Order of Transactions is T1 → T2 → T3
For B - T2 Must Precede T1
For C - T2 Must Precede T3
Can All Three Conditions be True w.r.t. Order?
A B C
T3 Unlock A f6(f5 (f1(a))) f2 (f3(b)) f7 (f4( c ))
Chaps19&20-134
CSE 4701
Determining Serializability in this Model
Examine Schedule Based on Order in Which Various Transactions Obtain Locks
Order must be Equivalent to Some Hypothetical Serial Schedule of Transactions
If Orders for Different Data Items Force Two Transactions to Appear in a Different Order (T2 Must Precede T1 and T1 Must Precede T2), There is a Paradox!
This is Equivalent to Searching for Cycles in a Directed Graph
Chaps19&20-135
CSE 4701
Recall Topological Sort
Graph is Acyclic
Find a Node of Graph with ONLY Arrows Leaving (no Entering)
Delete Node and Arrows
Repeat Until No Nodes Remain - the Order of Deletion is the Sorted Order
Chaps19&20-136
CSE 4701
Algorithm 1: Binary Lock Model
Input: Schedule S for Transactions T1, T2, … Tk
Output: Determination if S is Serializable, and If so, an Equivalent Serial Schedule
Method: Create a Directed Precedence Graph G:
  Let S = a1 ; a2 ; … ; an where each ai is Tj : Lock Am or Tj : Unlock Am
  For each ai = Tj : Unlock Am, find the next ap = Ts : Lock Am (i < p ≤ n) (Ts is the next Trans. to lock Am); if one exists, draw an Arc in G from Tj to Ts
  Repeat Until All Unlock/Lock Pairs are Checked
Review the Resulting Precedence Graph
  If G has Cycles - Non-Serializable
  If G is Acyclic - Topological Sort to Find an Equivalent Serial Schedule
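Algorithm 1 translates almost directly into code. The sketch below uses an assumed `(transaction, action, item)` tuple encoding and hypothetical function names; the two sample schedules are the ones used in the slides' worked examples for this algorithm.

```python
def lock_precedence_edges(schedule):
    """schedule: list of (txn, action, item), action is "Lock" or "Unlock".
    For each Unlock of an item, draw an arc to the next transaction that Locks it."""
    edges = set()
    for i, (tj, action, item) in enumerate(schedule):
        if action != "Unlock":
            continue
        for ts, next_action, next_item in schedule[i + 1:]:
            if next_action == "Lock" and next_item == item:
                if ts != tj:
                    edges.add((tj, ts))
                break  # only the next Lock on this item matters
    return edges


def topological_order(nodes, edges):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for a, b in sorted(edges):
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return order if len(order) == len(nodes) else None


def serial_schedule(schedule):
    nodes = {t for t, _, _ in schedule}
    return topological_order(nodes, lock_precedence_edges(schedule))


FIRST = [(1, "Lock", "A"), (2, "Lock", "B"), (2, "Lock", "C"), (2, "Unlock", "B"),
         (1, "Lock", "B"), (1, "Unlock", "A"), (2, "Lock", "A"), (2, "Unlock", "C"),
         (2, "Unlock", "A"), (3, "Lock", "A"), (3, "Lock", "C"), (1, "Unlock", "B"),
         (3, "Unlock", "C"), (3, "Unlock", "A")]
SECOND = [(2, "Lock", "A"), (2, "Unlock", "A"), (3, "Lock", "A"), (3, "Unlock", "A"),
          (1, "Lock", "B"), (1, "Unlock", "B"), (2, "Lock", "B"), (2, "Unlock", "B")]
```

`FIRST` yields arcs T2 to T1 (on B), T1 to T2 (on A), and T2 to T3 (on A and C), so the T1/T2 cycle makes it non-serializable. `SECOND` yields T2 to T3 and T1 to T2, which sorts to the serial order T1, T2, T3.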
Chaps19&20-137
CSE 4701
Precedence Graph for Prior Example
Schedule: T1 Lock A; T2 Lock B; T2 Lock C; T2 Unlock B; T1 Lock B; T1 Unlock A; T2 Lock A; T2 Unlock C; T2 Unlock A; T3 Lock A; T3 Lock C; T1 Unlock B; T3 Unlock C; T3 Unlock A
Look for Unlock → Lock Combos on the Same Data Item
  T2 Unlock B and T1 Lock B
  T1 Unlock A and T2 Lock A
  T2 Unlock C and T3 Lock C
  T2 Unlock A and T3 Lock A
IS IT SERIALIZABLE?
[Precedence graph: T2 → T1 (B), T1 → T2 (A), T2 → T3 (A, C)]
Chaps19&20-138
CSE 4701
Another Example
Schedule: T2 Lock A; T2 Unlock A; T3 Lock A; T3 Unlock A; T1 Lock B; T1 Unlock B; T2 Lock B; T2 Unlock B
Look for Unlock → Lock Combos on the Same Data Item
  T2 Unlock A and T3 Lock A
  T1 Unlock B and T2 Lock B
IS IT SERIALIZABLE? IF SO WHAT IS THE SCHEDULE?
[Precedence graph: T1 → T2 (B), T2 → T3 (A)]
Chaps19&20-139
CSE 4701
Two-Phase Protocol
Two-Phase Protocol - All Locks Must Precede All Unlocks in the Schedule for a Transaction
Which of the Transactions Below are Two-Phase? Why or Why Not?
T1: Lock A; Lock B; Unlock A; Unlock B
T2: Lock B; Lock C; Unlock B; Lock A; Unlock C; Unlock A
T3: Lock A; Lock C; Unlock C; Unlock A
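The two-phase condition is a single pass over a transaction's lock actions: once any Unlock has been seen, no further Lock may appear. A minimal sketch, with an assumed `(action, item)` encoding and a hypothetical function name, applied to the three transactions above:

```python
def is_two_phase(actions):
    """actions: list of (action, item) pairs for ONE transaction.
    Two-phase means every Lock precedes every Unlock."""
    seen_unlock = False
    for action, _item in actions:
        if action == "Unlock":
            seen_unlock = True
        elif action == "Lock" and seen_unlock:
            return False  # a Lock after an Unlock violates the protocol
    return True


T1 = [("Lock", "A"), ("Lock", "B"), ("Unlock", "A"), ("Unlock", "B")]
T2 = [("Lock", "B"), ("Lock", "C"), ("Unlock", "B"),
      ("Lock", "A"), ("Unlock", "C"), ("Unlock", "A")]
T3 = [("Lock", "A"), ("Lock", "C"), ("Unlock", "C"), ("Unlock", "A")]
```

T1 and T3 pass the check; T2 fails because it issues Lock A after Unlock B.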
Chaps19&20-140
CSE 4701
Theorems Regarding Serializability
Theorem 1: Algorithm 1 Correctly Determines if a Schedule S is Serializable (omit the proof).
Theorem 2: If S is any Schedule of 2 Phase Transactions (i.e., all of its Transactions are 2-Phase), then S is Serializable.
Proof by Contradiction.
  Suppose Not - then by Theorem 1, S has a Precedence Graph G with a Cycle
    T1 → T2 → T3 → … → Tp → T1
  Each Arc Ti → Tj Means an Unlock in Ti Precedes a Lock in Tj
  In T1 → T2, T1 has Issued an Unlock, so all Remaining Actions of T1 must also be Unlocks, since S is 2 Phase
  Following the Cycle, each Transaction Locks and Later Unlocks, so the Lock by T1 in Tp → T1 Occurs After T1's Unlock
  This is a Contradiction to the Fact that S is 2 Phase
Chaps19&20-141
CSE 4701
Problems of Binary Locks
Only One Transaction Can Hold a Lock on a Given Item
No Shared Reading is Allowed - Too Restrictive
For Example
  T1 is Read Only on X - Yet Needs Full Lock
  T2 is Read Only on X and Y - Needs Full Locks
T1 T2
Read(X);
Read(Y) commit;
time
Read(X); Read(Y);
Y = Y + 20;Write(Y);
commit;
t1
t2
t3
t4
t5
Chaps19&20-142
CSE 4701
A Read/Write Lock Model
Refines the Granularity of Locking to Differentiate Between Read and Write Locks
Improves Concurrent Access
Rlock (Shared): If T has an Rlock A, then Any Other Transaction can Also Rlock A, but All Transactions are Forbidden from Wlock A until All Transactions with Rlock A issue Ulock A (Multiple Reads)
Wlock (Exclusive): If T has Wlock A, then All Other Transactions are Forbidden to Rlock or Wlock A Until T Ulocks A (Write Implies Reading, Single Write)
Two Schedules are Equivalent if They:
  Produce the Same Value for Each Data Item
  Have Each Rlock on an Item Occur in Both Schedules at a Time When the Locked Item has the Same Value
Chaps19&20-143
CSE 4701
Algorithm 2: Read/Write Lock Model
Input: Schedule S for Transactions T1, T2, … Tk
Output: Is S Serializable? If so, Serial Schedule
Method: Create a Directed Precedence Graph G:
  Suppose in S, Ti : Rlock A.
    If Tj : Wlock A is the Next Transaction to Wlock A (if it exists) then place an Arc from Ti to Tj.
    Repeat for all Ti’s, all Rlocks before Wlock on A!
  Suppose in S, Ti : Wlock A.
    If Tj : Wlock A is the Next Transaction to Wlock A (if it exists) then place an Arc from Ti to Tj.
    If there Also exists Tm : Rlock A after Ti : Wlock A but before Tj : Wlock A, then Draw an Arc from Ti to Tm.
Review the Resulting Precedence Graph
  If G has Cycles - Non-Serializable
  If G is Acyclic - Topological Sort for Serial Schedule
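The arc rules of Algorithm 2 can be sketched as a scan forward from each Rlock or Wlock. The tuple encoding, the function names, and both sample schedules are hypothetical (the column layout of the slides' four-transaction example did not survive extraction), and the sketch makes the simplifying assumption that a transaction does not re-lock an item it has already write-locked.

```python
def rw_lock_edges(schedule):
    """schedule: list of (txn, action, item); action in {"Rlock", "Wlock", "Unlock"}."""
    edges = set()
    for i, (ti, action, item) in enumerate(schedule):
        if action not in ("Rlock", "Wlock"):
            continue
        for tm, later_action, later_item in schedule[i + 1:]:
            if later_item != item:
                continue
            if later_action == "Wlock":
                # Arc to the next transaction that write-locks the item, then stop.
                if tm != ti:
                    edges.add((ti, tm))
                break
            if later_action == "Rlock" and action == "Wlock" and tm != ti:
                # Readers that see the value written under Ti's Wlock.
                edges.add((ti, tm))
    return edges


def is_acyclic(edges):
    """Repeatedly delete nodes with no entering arcs, as in a topological sort."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if all(b != n for _, b in remaining)}
        if not sources:
            return False  # every remaining node has an entering arc: a cycle
        nodes -= sources
        remaining = {(a, b) for a, b in remaining if a not in sources}
    return True


# Hypothetical schedule: one writer, two readers, then another writer on A.
OK = [(1, "Wlock", "A"), (1, "Unlock", "A"), (2, "Rlock", "A"), (3, "Rlock", "A"),
      (2, "Unlock", "A"), (3, "Unlock", "A"), (4, "Wlock", "A"), (4, "Unlock", "A")]
# Hypothetical schedule: each transaction reads what the other later overwrites.
BAD = [(1, "Rlock", "A"), (2, "Rlock", "B"), (1, "Unlock", "A"), (2, "Unlock", "B"),
       (2, "Wlock", "A"), (1, "Wlock", "B"), (2, "Unlock", "A"), (1, "Unlock", "B")]
```

In `OK` every arc points forward (T1 to the readers and to T4, each reader to T4), so the graph is acyclic. In `BAD` the arcs T1 to T2 (on A) and T2 to T1 (on B) form a cycle.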
Chaps19&20-144
CSE 4701
Consider the Following Schedule What are the Dependencies Among Transactions?
T1 T2 T3 T4 (1) Wlock A(2) Rlock B(3) Unlock A(4) Rlock A(5) Unlock B(6) Wlock B(7) Rlock A(8) Unlock B(9) Wlock B(10) Unlock A(11) Unlock A(12) Wlock A(13) Unlock B(14) Rlock B(15) Unlock A(16) Unlock B
Chaps19&20-145
CSE 4701
Consider the Following Schedule What is the Precedence Graph G?
T1 T2 T3 T4 (1) Wlock A(2) Rlock B(3) Unlock A(4) Rlock A(5) Unlock B(6) Wlock B(7) Rlock A(8) Unlock B(9) Wlock B(10) Unlock A(11) Unlock A(12) Wlock A(13) Unlock B(14) Rlock B(15) Unlock A(16) Unlock B
Chaps19&20-146
CSE 4701
Precedence Graph What is the Resulting Precedence Graph? Is the Schedule Serializable? Why or Why Not?
[Figure: precedence graph over T1, T2, T3, T4 with arcs labeled A:RW, A:RW, B:RW, A:WW, B:WW, A:WR]
Chaps19&20-147
CSE 4701
A Read-Only/Write-Only Lock Model
Revision of the Read/Write Model for Algorithm 2
Refining Our Assumptions
  Assume that a Wlock on an Item Does not Mean that the Transaction First Reads the Item
  Contrary to First Two Models
Example: Read A; Read B; C=A+B; A=A-1; Write A; Write C
  Reads A, B and Writes A, C (No Read on C)
Reformulate Notion of Equivalent Schedules
Chaps19&20-148
CSE 4701
How Does This Model Differ from Alg. 2?
Consider the Schedule Segment:
  T1 : Wlock A
  T1 : Ulock A
  T2 : Wlock A
  T2 : Ulock A
In Algorithm 2 - T2 : Wlock A Assumes that T2 Reads the Value Written by T1
However, This Need Not be True in the New Model
If Between T1 and T2, No Transaction Rlocks A, then the Value Written by T1 is Lost, and T1 Does not Have to Precede T2 in a Schedule w.r.t. A
Chaps19&20-149
CSE 4701
Redefine Serializability
Conditions on Serializability Must be Redefined in Support of the Write-Does-Not-Assume-Read Model
If in Schedule S, T2 Reads “A” Written by T1, then T1 Must Precede T2 in any Serial Schedule Equivalent to S
Further, if there is a T3 that Writes “A”, then in any Serial Schedule Equivalent to S, T3 may either Precede T1 or Follow T2, but may not Appear Between T1 and T2
Graphically, we have:
[Figure: arc T1 → T2 labeled A:WR, with T3 placed either before T1 or after T2]
T1 T2 T3 T1 T3 T2 T2 T1 T3
T2 T3 T1 T3 T1 T2 T3 T2 T1
Chaps19&20-150
CSE 4701
Augmentation of Precedence Graph
In Support of the Write Does Not Imply Read Model, we must Augment the Precedence Graph:
  Add an Initial Transaction To that Writes Every Item, and a Final Transaction Tf that Reads Every Item
  When a Transaction T’s Output is Invisible in Tf (i.e., the Value is Lost), Then T is Referred to as a Useless Transaction
  Useless Transactions have no Paths from the Transaction to Tf
Note: Maintain Same set of Locks (Rlock, Wlock, Ulock) with Different Interpretation on Wlock
Chaps19&20-151
CSE 4701
Intuitive View of Algorithm 3
If T2 Reads Value of “A” Written by T1, then T1 Must Precede T2 in any Serial Schedule
  For WR Combo - Draw an Arc from T1 to T2
Now Consider a T3 that also Writes “A”
  T3 Must be either Before T1 or After T2
  Add in a Pair of Arcs, T3 to T1 and T2 to T3, of Which One Must be Chosen in the Final Precedence Graph
Serializability Occurs if, After Choices are Made for each “T3” Pair, the Resulting Graph is Acyclic
G is Referred to as a “Polygraph” with Nodes, Arcs, and Alternate Arcs
Chaps19&20-152
CSE 4701
Algorithm 3 Example T1 T2 T3 T4
(1) Rlock A(2) Rlock A(3) Wlock C(4) Unlock C(5) Rlock C(6) Wlock B(7) Unlock B(8) Rlock B(9) Unlock A(10) Unlock A(11) Wlock A(12) Rlock C(13) Wlock D(14) Unlock B(15) Unlock C(16) Rlock B(17) Unlock A(18) Wlock A(19) Unlock B(20) Wlock B(21) Unlock B(22) Unlock D(23) Unlock C(24) Unlock A
Chaps19&20-153
CSE 4701
Algorithm 3 – Steps 1 to 4
Input: Schedule S for Transactions T1, T2, … Tk
Output: Is S Serializable? If so, Serial Schedule
Method: Create a Directed Polygraph P:
  1. Augment S with Dummy To (Writes Every Item) and Dummy Tf (Reads Every Item)
  2. Create Initial Polygraph P by Adding Nodes for To, Tf, and Each Transaction Ti in S
  3. Place an Arc from Ti to Tj Whenever Tj Reads A in Augmented S (with Dummy States) that was Last Written by Ti. Repeat this Step for all Arcs. Don’t Forget to Consider Dummy States!
  4. Discover Useless Transactions - T is Useless if there is no Path from T to Tf
This is the “Initialization” Phase of Algorithm 3
Chaps19&20-154
CSE 4701
Resulting Polygraph - Steps 1 to 2
T4T3T2T1T0 Tf
1. Add To and Tf to S, 2. Add To , Tf , T1 , T2 , T3 , T4 to Polygraph P
Chaps19&20-155
CSE 4701
Alg 3 Step 3 - Init=T0 & Fin=Tf T1 T2 T3 T4
T0 Write A Write B Write C Write D(1) Rlock A(2) Rlock A(3) Wlock C(4) Unlock C(5) Rlock C(6) Wlock B(7) Unlock B(8) Rlock B(9) Unlock A(10) Unlock A(11) Wlock A(12) Rlock C(13) Wlock D(14) Unlock B(15) Unlock C(16) Rlock B(17) Unlock A(18) Wlock A(19) Unlock B(20) Wlock B(21) Unlock B(22) Unlock D(23) Unlock C(24) Unlock ATf Read A Read B Read C Read D
Who Reads A after T0 Writes A?
Who Reads A after T4 Writes A?
Who Reads B after T1 Writes B?
Who Reads B after T4 Writes B?
Who Reads C after T1 Writes C?
Who Reads D after T2 Writes D?
Chaps19&20-156
CSE 4701
Step 3 -Write to Reads on A
Chaps19&20-157
CSE 4701
Step 3 - Write to Reads on B
Chaps19&20-158
CSE 4701
Step 3 - Write to Reads on C
Chaps19&20-159
CSE 4701
Step 3 - Write to Reads on D
Chaps19&20-160
CSE 4701
Resulting Polygraph - Steps 1 to 3
T4T3T2T1T0 Tf
1. Add To and Tf to S, 2. Add To , Tf , T1 , T2 , T3 , T4 to Polygraph P 3. Look for Ti Write X to Tj Read X for all Items X 4. Look for Useless Transactions - No Paths from T to Tf
A:WR
A:WR
A:WR
B:WR
B:WR B:WR
C:WR
C:WR
C:WRD:WR
Chaps19&20-161
CSE 4701
Resulting Polygraph - Steps 1-4
1. Add To and Tf to S
2. Add To, Tf, T1, T2, T3, T4 to Polygraph P
3. Look for Ti Write X to Tj Read X for all Items X
4. T3 is Useless - Remove Arcs Into T3 – This Completes Step 4
T4T3T2T1T0 TfA:WR
A:WR
A:WR
B:WR
B:WR B:WR
C:WR
C:WRD:WR
Chaps19&20-162
CSE 4701
Algorithm 3 – Steps 5 to 7
Method: Reassess the Initial Polygraph P:
  5. For Each Remaining Arc Ti W to Tj R (meaning that Tj Reads Item A Written by Ti), Consider all T ≠ To and T ≠ Tf that also Write A:
     I. If Ti = To and Tj = Tf then Add No Arcs
     II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
     III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
     IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj to T
  6. Determine if P is Acyclic by “Choosing” One Arc from Each Pair - Make Choices Carefully
  7. If Acyclic - Serializable - Perform Topological Sort without To, Tf for Equivalent Serial Schedule. Else - Not Serializable
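Steps 1 to 7 can be sketched compactly if reads and writes are given explicitly instead of as Rlock/Wlock pairs. Everything below is an assumption-laden illustration: the `(txn, "R"/"W", item)` encoding, the function names, the brute-force search over arc pairs in step 6, and all three sample schedules are hypothetical and are not the slides' four-transaction example.

```python
from itertools import product

T0, TF = "T0", "Tf"


def polygraph(schedule, items):
    """Steps 1-5. items must list every data item used in the schedule."""
    aug = [(T0, "W", x) for x in items] + list(schedule) + [(TF, "R", x) for x in items]
    last_writer, reads_from = {}, set()
    for t, kind, x in aug:                      # step 3: reads-from arcs
        if kind == "W":
            last_writer[x] = t
        elif last_writer[x] != t:
            reads_from.add((last_writer[x], t, x))
    useful, changed = {TF}, True                # step 4: useless transactions
    while changed:
        changed = False
        for a, b, _ in reads_from:
            if b in useful and a not in useful:
                useful.add(a)
                changed = True
    reads_from = {(a, b, x) for a, b, x in reads_from if b in useful}
    writers = {}
    for t, kind, x in schedule:
        if kind == "W":
            writers.setdefault(x, set()).add(t)
    arcs, pairs = {(a, b) for a, b, _ in reads_from}, []
    for ti, tj, x in reads_from:                # step 5: other writers of x
        for t in writers.get(x, set()) - {ti, tj}:
            if ti == T0 and tj == TF:
                continue                        # case I
            elif ti == T0:
                arcs.add((tj, t))               # case II
            elif tj == TF:
                arcs.add((t, ti))               # case III
            else:
                pairs.append(((t, ti), (tj, t)))  # case IV
    return {t for t, _, _ in schedule} | {T0, TF}, arcs, pairs


def topo(nodes, edges):
    indeg = {n: 0 for n in nodes}
    for _, b in edges:
        indeg[b] += 1
    ready, order = sorted(n for n in nodes if indeg[n] == 0), []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for a, b in sorted(edges):
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order if len(order) == len(nodes) else None


def serial_order(schedule, items):
    """Steps 6-7: try each choice of one arc per pair; None if no choice is acyclic."""
    nodes, arcs, pairs = polygraph(schedule, items)
    for choice in product((0, 1), repeat=len(pairs)):
        order = topo(nodes, arcs | {p[c] for p, c in zip(pairs, choice)})
        if order is not None:
            return [t for t in order if t not in (T0, TF)]
    return None


BLIND = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
LOST = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
PAIR = [("T1", "W", "A"), ("T2", "R", "A"), ("T3", "W", "A"), ("T2", "W", "B")]
```

`BLIND` would fail a conflict-based test (T1 and T2 conflict in both directions) but is accepted here because only T3's write reaches Tf. `LOST` is rejected, and `PAIR` exercises case IV, where only the "T3 after T2" alternative keeps the graph acyclic.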
Chaps19&20-163
CSE 4701
What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc Ti W to Tj R, Consider all T ≠ To and T ≠ Tf that also Write A:
   I. If Ti = To and Tj = Tf then Add No Arcs
   II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj to T
Ti TjX:WR
T0 TfX:WR
General Case:
Case I: no new arc
T0 TjX:WR
Case II: Add Arc from Tj to T – T is after Tj
TII X:RW
Chaps19&20-164
CSE 4701
What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc Ti W to Tj R, Consider all T ≠ To and T ≠ Tf that also Write A:
   I. If Ti = To and Tj = Tf then Add No Arcs
   II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj to T
Ti TjX:WR
Ti TfX:WR
General Case:
Case III: Add Arc from T to Ti – T is before
TIII X:RW
Chaps19&20-165
CSE 4701
What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc Ti W to Tj R, Consider all T ≠ To and T ≠ Tf that also Write A:
   I. If Ti = To and Tj = Tf then Add No Arcs
   II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ To and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj to T
Ti TjX:WR
Ti TjX:WR
General Case:
Case IV: Add in two Arcs T is after Tj or before Ti
TIV X:RW
IV X:RW
Chaps19&20-166
CSE 4701
T1 T2 T3 T4 To Write A Write B Write C Write D(1) Rlock A(2) Rlock A(3) Wlock C(4) Unlock C(5) Rlock C(6) Wlock B(7) Unlock B(8) Rlock B(9) Unlock A(10) Unlock A(11) Wlock A(12) Rlock C(13) Wlock D(14) Unlock B(15) Unlock C(16) Rlock B(17) Unlock A(18) Wlock A(19) Unlock B(20) Wlock B(21) Unlock B(22) Unlock D(23) Unlock C(24) Unlock ATf Read A Read B Read C Read D
Alg 3 Ex - Step 5 - Who Else Writes A?
For T0 to T1 Arc - Who Else Writes A?
For T0 to T2 Arc - Who Else Writes A?
For T4 to Tf Arc - Who Else Writes A?
Chaps19&20-167
CSE 4701
Resulting Polygraph - Step 5 - A:WR
T4T2T1T0 TfA:WR
A:WR
A:WR
B:WR
B:WR B:WR
C:WR
C:WRD:WR
T3
II A:RW
II A:RW
III A:RW
II A:RW
T4T3T2T1T0 TfA:WR
A:WR
A:WR
B:WR
B:WR B:WR
C:WR
C:WRD:WR
II A:RW
Chaps19&20-168
CSE 4701
Resulting Polygraph - Step 5 - A:WR
5. For Each Arc Ti to Tj Consider All T’s that Write X
   I. If Ti = To and Tj = Tf then Add No Arcs
   II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ To and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Check Item A (see new arcs/labels - case II and III)
T4T2T1T0 TfA:WR
A:WR
A:WR
B:WR
B:WR B:WR
C:WR
C:WRD:WR
T3
II A:RW
II A:RW
III A:RW
II A:RW
II A:RW
Chaps19&20-169
CSE 4701
Alg 3 Ex - Step 5 - Who Else Writes C/D? T1 T2 T3 T4
Init Write A Write B Write C Write D To(1) Rlock A(2) Rlock A(3) Wlock C(4) Unlock C(5) Rlock C(6) Wlock B(7) Unlock B(8) Rlock B(9) Unlock A(10) Unlock A(11) Wlock A(12) Rlock C(13) Wlock D(14) Unlock B(15) Unlock C(16) Rlock B(17) Unlock A(18) Wlock A(19) Unlock B(20) Wlock B(21) Unlock B(22) Unlock D(23) Unlock C(24) Unlock AFin Read A Read B Read C Read D Tf
For three T1 Arcs Does Anyone Else Write C?
For One T2 Arc Does Anyone Else Write D?
Chaps19&20-170
CSE 4701
Resulting Polygraph - Step 5 - C:WR & D:WR
5. For Each Arc Ti to Tj Consider All T’s that Write X
   I. If Ti = To and Tj = Tf then Add No Arcs
   II. If Ti = To and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ To and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ To and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Do any Other Transactions Write C or Write D for the arrows labeled C:WR and D:WR Respectively?
T4T2T1T0 TfA:WR
A:WR
A:WR
B:WR
B:WR B:WR
C:WR
C:WRD:WR
T3
II A:RW
II A:RW
III A:RW
II A:RW
II A:RW
Chaps19&20-171
CSE 4701
Alg 3 Ex - Step 5 - Who Else Writes B?
T1 T2 T3 T4
Init (T0): Write A, Write B, Write C, Write D
(1) Rlock A  (2) Rlock A  (3) Wlock C  (4) Unlock C  (5) Rlock C  (6) Wlock B
(7) Unlock B  (8) Rlock B  (9) Unlock A  (10) Unlock A  (11) Wlock A  (12) Rlock C
(13) Wlock D  (14) Unlock B  (15) Unlock C  (16) Rlock B  (17) Unlock A  (18) Wlock A
(19) Unlock B  (20) Wlock B  (21) Unlock B  (22) Unlock D  (23) Unlock C  (24) Unlock A
Fin (Tf): Read A, Read B, Read C, Read D
For T4 to Tf Arc: Who Else Writes B? T1, but there is Already an Arc from T1 to T4
For T1 to T4 Arc: Who Else Writes B? Just T4, so No Arc
For T1 to T2 Arc: Who Else Writes B? T4 Writes B, so this is Case IV
Two Alternative Arcs: T4 after T2 or T4 before T1
Chaps19&20-172
CSE 4701
Two Added Arcs for Case IV and B
[Figure: polygraph on nodes T0, T1, T2, T3, T4, Tf with arcs labeled A:WR, B:WR, C:WR, and D:WR, the case II and case III arcs labeled A:RW, and two added case IV arcs labeled B:RW]
T4 Follows T2 and T4 Before T1
Chaps19&20-173
CSE 4701
Resulting Polygraph - Step 5 and 6
5. For Each Arc Ti to Tj Consider All T’s that Write X
  I. If Ti = T0 and Tj = Tf then Add No Arcs
  II. If Ti = T0 and Tj ≠ Tf then Add Arc from Tj to T
  III. If Ti ≠ T0 and Tj = Tf then Add Arc from T to Ti
  IV. If Ti ≠ T0 and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Item B (see new arcs - including alternates - dashed)
For T1 to T2, T4 writes - so add T2 to T4 and T4 to T1 - Case IV
Either T4 After T2 or Before T1 - no new arcs for other WRs.
[Figure: polygraph on nodes T0, T1, T2, T3, T4, Tf with arcs labeled A:WR, B:WR, C:WR, and D:WR, the case II and case III arcs labeled A:RW, and two dashed case IV alternate arcs labeled B:RW]
Chaps19&20-174
CSE 4701
Resulting Polygraph - Step 5 and 6
6. Which Option of Pair of Arcs Should be Chosen? Why?
[Figure: polygraph on nodes T0, T1, T2, T3, T4, Tf with arcs labeled A:WR, B:WR, C:WR, and D:WR, the case II and case III arcs labeled A:RW, and two case IV alternate arcs labeled B:RW]
Chaps19&20-175
CSE 4701
Final Polygraph - Step 7
Final Graph with Arc Removed
Delete Dummy States below
Topological Sort Yields Order: T1, T2, T3, T4
[Figure: polygraph on nodes T0, T1, T2, T3, T4, Tf with arcs labeled A:WR, B:WR, C:WR, and D:WR, the case II and case III arcs labeled A:RW, and the one remaining case IV arc labeled B:RW]
[Figure: reduced graph on nodes T1, T2, T3, T4 with dummy nodes T0 and Tf deleted; arcs labeled B:WR, C:WR, A:RW, and B:RW]
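Steps 6 and 7 can be sketched as follows: try each arc of the case IV pair, keep the one that leaves the graph acyclic, and topologically sort the result. This is a hedged illustration only. The helper names `topo_order` and `choose` are hypothetical, and the arc set is limited to the arcs named in the slide text (T1 to T2, T1 to T4, and the pair for B); T3 and the other arcs of the figure are omitted because they are not recoverable here. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError


def topo_order(arcs):
    """Return a topological order of the graph, or None if it has a cycle."""
    ts = TopologicalSorter()
    for src, dst in arcs:
        ts.add(dst, src)  # dst depends on src, so src is ordered first
    try:
        return list(ts.static_order())
    except CycleError:
        return None


def choose(fixed, alternatives):
    """Pick the first alternative arc that keeps the graph acyclic."""
    for arc in alternatives:
        order = topo_order(fixed + [arc])
        if order is not None:
            return arc, order
    return None, None


# Assumed subset of the example: only arcs named in the slide text.
fixed = [("T1", "T2"), ("T1", "T4")]
alternatives = [("T4", "T1"), ("T2", "T4")]  # case IV pair for item B

chosen, order = choose(fixed, alternatives)
print(chosen, order)
```

The arc T4 to T1 is rejected because T1 to T4 is already present, which would close a cycle; T2 to T4 is kept, matching "T4 after T2".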
Chaps19&20-176
CSE 4701
Why Optimistic Concurrency Control?
Motivated by Disadvantages of Locking Techniques
  Lock Maintenance
  Deadlock-Free Locking Protocols Limit Concurrency
  Secondary Memory Access Causes Locks to be Held for a Long Duration
  Locks Typically Held Until Transaction Completes, Which Reduces Concurrency
  Often Needed in “Worst” Case Only
  Overhead - Locking + Deadlock Detection
Key Concept
  Write Collisions in Large Databases for “Many” Applications are Rare
  OCC: “Don’t Worry be Happy” Approach
Chaps19&20-177
CSE 4701
Basic Ideas of OCC
Interference Between Transactions is Rare and Locking Incurs too Much Overhead
Instead, Allow Each Transaction to Execute Freely, and Check Serializability at the End of the Transaction
Win (Allow to Commit) If No Interference Occurs or There have been No Conflicts
[Figure: phase order]
  Pessimistic execution: Validate, Read (and Compute), Write
  Optimistic execution: Read (and Compute), Validate, Write
Chaps19&20-178
CSE 4701
How Does OCC Work?
Execute Transactions Ad-Hoc - Let them Go Uncontrolled
Maintain Information of “Relevant” Actions Against DB (Often in Conjunction with Recovery/Journal)
When Transactions Finish - Check to see if Everything Proceeded Satisfactorily
Assumes that Probability of Transaction Interference is Quite Small
Two Questions re. OCC:
  How Do We Know Everything Went OK?
  How Do We Recover if it Didn’t?
Chaps19&20-179
CSE 4701
What is a Timestamp?
Timestamp
  A monotonically increasing variable (integer) indicating the age of an operation or a transaction.
  A larger timestamp value indicates a more recent event or operation.
  A timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions.
Chaps19&20-180
CSE 4701
OCC Utilizes Timestamps
Timestamps are Clock Ticks used to Record the Major Milestones in the Execution of a Transaction
Examples Include:
  Start Time of Transaction
  Read/Write Times for DB Items
  Finish Time of Transaction
  Commit Time of Transaction
Two Important Definitions are:
  Read Time of an Item: Highest Time Stamp Possessed by Any Transaction that Reads the Item
  Write Time of an Item: Highest Time Stamp Possessed by Any Transaction that Wrote the Item
A Transaction has a Fixed Time when it Started that is Constant Throughout its Execution
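The two definitions can be read as a maximum over a history of accesses. A small sketch, assuming a history is a list of (timestamp, item) pairs and that an item nobody has touched has time 0; the function names and this representation are illustrative assumptions, not part of the slides.

```python
def read_time(item, reads):
    """Highest timestamp of any transaction that read the item (0 if none)."""
    return max((ts for ts, it in reads if it == item), default=0)


def write_time(item, writes):
    """Highest timestamp of any transaction that wrote the item (0 if none)."""
    return max((ts for ts, it in writes if it == item), default=0)


# Hypothetical history: (transaction timestamp, item accessed)
reads = [(200, "B"), (150, "A"), (175, "C"), (120, "A")]
writes = [(200, "B"), (200, "A")]

print(read_time("A", reads), write_time("A", writes))
print(read_time("C", reads), write_time("C", writes))
```

In practice a system would keep only the running maximum per item instead of the full history; the list form is used here just to mirror the definitions.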
Chaps19&20-181
CSE 4701
How are Timestamps Used?
Focus on “When” Reads and Writes Occur
Transaction Cannot Read an Item if its Value was Not Written Until After the Transaction Finished its Execution
  Transaction T with Timestamp t1 Cannot Read an Item with a Write Time of t2 if t2 > t1
  If this is the Case, T Must Abort and be Restarted
  Can’t Read Item if it hasn’t been Written
Transaction Cannot Write an Item if that Item has its Old Value Read at a Later Time
  Transaction T with Timestamp t1 Cannot Write an Item with a Read Time of t2 if t2 > t1
  If this is the Case, T Must Abort and be Restarted
  Can’t Write Item Being Read at a Later Time
Chaps19&20-182
CSE 4701
Algorithm 4: Optimistic CC
Let T be a Transaction with Timestamp t Attempting to Perform Operation X on a Data Item I with Readtime tR and Writetime tW
If (X = Read and t ≥ tW) or (X = Write and t ≥ tR and t ≥ tW) then Perform Operation
  If t > tR then set tR = t for Data Item I (read after write)
  If t > tW then set tW = t for Data Item I (write after read)
If (X = Write and tR ≤ t < tW) then Do Nothing since Later Write will Cancel out the Write of T
If (X = Read and t < tW) or (X = Write and t < tR) then Abort the Operation
  1st - T trying to Read Item Before it was Written
  2nd - T trying to Write an Item Before it was Read
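The three outcomes of Algorithm 4 (perform, do nothing, abort) can be sketched as a single function. This is an illustrative sketch, not the course's implementation: the names `Item` and `attempt` and the string results are assumptions, and the write case tests t < tR first and t < tW second so that the three cases cannot overlap.

```python
from dataclasses import dataclass


@dataclass
class Item:
    rt: int = 0  # read time tR: highest timestamp that has read the item
    wt: int = 0  # write time tW: highest timestamp that has written the item


def attempt(op, t, item):
    """Apply operation op ("read" or "write") by a transaction with timestamp t."""
    if op == "read":
        if t < item.wt:
            return "abort"              # value was written by a later transaction
        item.rt = max(item.rt, t)       # set tR = t if t > tR
        return "perform"
    if op == "write":
        if t < item.rt:
            return "abort"              # old value was already read at a later time
        if t < item.wt:
            return "skip"               # tR <= t < tW: a later write supersedes this one
        item.wt = t                     # set tW = t
        return "perform"
    raise ValueError(f"unknown operation: {op}")


c = Item(rt=175, wt=0)
print(attempt("write", 150, c))  # abort
a = Item(rt=150, wt=200)
print(attempt("write", 175, a))  # skip
```

Returning a status rather than raising on abort keeps the sketch easy to drive from a schedule; a real scheduler would also restart the aborted transaction with a new timestamp.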
Chaps19&20-183
CSE 4701
Example of OCC
What Happens at Each Step w.r.t. RT/WT?
Step  T1 (200)  T2 (150)  T3 (175)  A RT/WT   B RT/WT   C RT/WT
Init                                0/0       0/0       0/0
(1)   Read B                        0/0       200/0     0/0
(2)             Read A              150/0     200/0     0/0
(3)                       Read C    150/0     200/0     175/0
(4)   Write B                       150/0     200/200   175/0
(5)   Write A                       150/200   200/200   175/0
Chaps19&20-184
CSE 4701
Example of OCC
Step  T1 (200)  T2 (150)  T3 (175)  A RT/WT   B RT/WT   C RT/WT
Init                                0/0       0/0       0/0
(1)   Read B                        0/0       200/0     0/0
(2)             Read A              150/0     200/0     0/0
(3)                       Read C    150/0     200/0     175/0
(4)   Write B                       150/0     200/200   175/0
(5)   Write A                       150/200   200/200   175/0
(6)             Write C             150/200   200/200   175/0
What Happens at Step 6? Timestamp of T2 = 150 < RT(C) = 175
T2 is Trying to Write C after it was Read by a Later Transaction (T3) - Consequence - Abort T2
Chaps19&20-185
CSE 4701
Example of OCC
Step  T1 (200)  T2 (150)  T3 (175)  A RT/WT   B RT/WT   C RT/WT
Init                                0/0       0/0       0/0
(1)   Read B                        0/0       200/0     0/0
(2)             Read A              150/0     200/0     0/0
(3)                       Read C    150/0     200/0     175/0
(4)   Write B                       150/0     200/200   175/0
(5)   Write A                       150/200   200/200   175/0
(6)             Write C             150/200   200/200   175/0
(7)                       Write A   150/200   200/200   175/0
Step (7) T3 can Finish, but its Write of A has No Effect Since 175 < WT(A) = 200 - Discard
Chaps19&20-186
CSE 4701
Summary of Example
Step  T1 (200)  T2 (150)  T3 (175)  A RT/WT   B RT/WT   C RT/WT
Init                                0/0       0/0       0/0
(1)   Read B                        0/0       200/0     0/0
(2)             Read A              150/0     200/0     0/0
(3)                       Read C    150/0     200/0     175/0
(4)   Write B                       150/0     200/200   175/0
(5)   Write A                       150/200   200/200   175/0
(6)             Write C             150/200   200/200   175/0
(7)                       Write A   150/200   200/200   175/0
T1 Completes Successfully; T2 Aborts; T3 Completes but Doesn’t Write A
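The seven-step example can be replayed mechanically to check the RT/WT trace. The sketch below is an assumption-laden illustration: the assignment of each operation to a transaction is inferred from the timestamps that appear in the RT/WT columns, and the names `TS`, `SCHEDULE`, and `run` are hypothetical.

```python
# Transaction timestamps from the example.
TS = {"T1": 200, "T2": 150, "T3": 175}

# (transaction, operation, item); the transaction column is inferred
# from which timestamp shows up in RT/WT after each step.
SCHEDULE = [
    ("T1", "read", "B"), ("T2", "read", "A"), ("T3", "read", "C"),
    ("T1", "write", "B"), ("T1", "write", "A"),
    ("T2", "write", "C"), ("T3", "write", "A"),
]


def run(schedule, ts):
    """Replay the schedule under the timestamp rules; return outcomes and final RT/WT."""
    rt = {"A": 0, "B": 0, "C": 0}
    wt = {"A": 0, "B": 0, "C": 0}
    outcomes = []
    for txn, op, item in schedule:
        t = ts[txn]
        if op == "read":
            if t < wt[item]:
                outcomes.append("abort")
            else:
                rt[item] = max(rt[item], t)
                outcomes.append("perform")
        else:  # write
            if t < rt[item]:
                outcomes.append("abort")      # item already read at a later time
            elif t < wt[item]:
                outcomes.append("skip")       # a later write supersedes this one
            else:
                wt[item] = t
                outcomes.append("perform")
    return outcomes, rt, wt


outcomes, rt, wt = run(SCHEDULE, TS)
for step, ((txn, op, item), result) in enumerate(zip(SCHEDULE, outcomes), start=1):
    print(f"({step}) {txn} {op} {item}: {result}")
print("RT:", rt, "WT:", wt)
```

Steps 1 through 5 are performed, step 6 aborts T2, and step 7 is skipped for T3, which agrees with the summary.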
Chaps19&20-187
CSE 4701
Recovery Consideration
Actual Write Operations of Previous Example are Phase 1 of Two-Phase Commit (Write to Journal)
Commit - Phase 2 - Writes to DB
Between Write to Log and Write to DB, No Other Transaction is Allowed to Read Items being Written
OCC Reduces Work as Follows:
  One Step for Read, Two for Writes (write/commit)
  In Locking, we had Four Steps for R or W: Lock, Read or Write, Unlock, Commit
Chaps19&20-188
CSE 4701
Viewing OCC vs. Phases of Execution
Read Phase:
  Database Information Read from Secondary Storage into Primary Memory
  All Writes are to Local Workspace
Validate Phase:
  Check to see if Integrity of Data has not been Violated
Write Phase:
  Update the DB (Secondary Storage) from Local Copies
[Figure: Optimistic execution: Read (and Compute), Validate, Write]
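The read, validate, and write phases can be sketched with a local workspace per transaction. The slides do not say how validation is done, so the rule used here (abort if any item read has changed version since it was read) is an assumption chosen for simplicity, and the classes `Database` and `Transaction` are hypothetical.

```python
class Database:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # bumped on every committed write


class Transaction:
    def __init__(self, db):
        self.db = db
        self.seen = {}   # item -> version observed during the read phase
        self.local = {}  # local workspace: item -> value written by this transaction

    def read(self, key):
        if key in self.local:
            return self.local[key]
        self.seen.setdefault(key, self.db.version[key])
        return self.db.data[key]

    def write(self, key, value):
        self.local[key] = value  # read phase: writes stay in the local workspace

    def validate(self):
        # Assumed rule: every item read must still be at the version that was seen.
        return all(self.db.version[k] == v for k, v in self.seen.items())

    def commit(self):
        if not self.validate():
            return False  # caller would back up and restart the transaction
        for key, value in self.local.items():  # write phase
            self.db.data[key] = value
            self.db.version[key] = self.db.version.get(key, 0) + 1
        return True


db = Database({"A": 1})
t1, t2 = Transaction(db), Transaction(db)
t1.write("A", t1.read("A") + 1)
t2.write("A", t2.read("A") + 10)
print(t1.commit(), t2.commit(), db.data["A"])
```

The second transaction fails validation because the item it read was updated in between, so it is backed up rather than made to wait.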
Chaps19&20-189
CSE 4701
Contrasting PCC and OCC
Transaction Control
  PCC: Control by Having Transactions Wait
  OCC: Control by Having Transactions Backed up
Serializability
  PCC: Ordering of Data Items
  OCC: Ordering of Transactions
Biggest Potential Problem
  PCC: Deadlock, or rather Preventing it
  OCC: Starvation
Different Applications Suited to Different Approaches
  Some DBMS Support Both
  DBA Can Configure on Application-by-Application Basis
Chaps19&20-190
CSE 4701
Concluding Remarks
Background
  OS Concepts of Sharing and Synchronization
  Deadlock Detection, Prevention, Avoidance
Chapter 19
  Transaction Processing Concepts
  Different Problems re. Concurrency Control
    Deadlock, Livelock, Starvation
    Lost Update, Dirty Read, etc.
  Serial Schedule and Serializability
Chapter 20
  Deviated from Textbook Notation
  3 Pessimistic Locking Based CC Algorithms
  1 Optimistic Timestamp Based CC Algorithm
  Role of Recovery in CC