CSE 4701 Chapter 14-1 Slides on Normalization


CSE4701

Chapter 14-1

Slides on Normalization


Towards Normalization of Relations

We take each Relation Individually and “Improve” it in Terms of the Desired Characteristics
Normalization Decomposes Relations into Smaller Relations that Result in:
  No Information Loss
  Support for Reconstruction
  No Spurious Joins
Query Execution Time May Increase
  Denormalization May Be Necessary Later on
Objectives: Minimizing
  Redundancy
  Insertion, Deletion, and Update Anomalies


What is the Normalization Process?

Provides DB Designers with the Ability to “Improve” their Relations
  Deal with Redundancies and Anomalies
Normalization Procedure Provides DB Designers with
  A Formal Framework for Analyzing Relation Schemas based on their Keys and on the Functional Dependencies among their Attributes
  A Series of Normal Form Tests that can be Carried out on Individual Relation Schemas so the Relational DB can be Normalized to the Desired Degree


What are Normal Forms?

A Normal Form is a Condition using Keys and FDs to Certify Whether a Relation Schema meets Criteria
  Primary Keys (1NF, 2NF, 3NF)
  All Candidate Keys (2NF, 3NF, BCNF)
  Multivalued Dependencies (4NF) - Chapter 15
  Join Dependencies (5NF) - Chapter 15

[Figure: nested normal forms, with 5NF inside 4NF inside 3NF inside 2NF inside 1NF]


How is Normalization Attained?

Typically, Normalization is Attained through a Process of Decomposition that Breaks Apart Relations to Remove Redundancies and Anomalies
In the Process, we must Maintain Two Properties:
  Lossless Join or Nonadditive Join Property: Guarantees that the Spurious Tuple Generation Problem does not Occur on Decomposed Relations
  Dependency Preservation Property: Ensures that each FD is Represented in some Individual Relation(s) after Decomposition
Premise: Relational Schema with Primary Keys and Functional Dependencies Specified


Recall Key Constraints

Superkey (SK):
  Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples
Candidate Key (CK):
  A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity)
  A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple
  There may be Multiple Candidate Keys


Recall Key Constraints

Primary Key (PK):
  Choose One From the Candidate Keys
  The Primary Key Attributes are Underlined
Foreign Key (FK):
  An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of another Relation R2 (Defined on the Same Domain)
  Allows Linkages Between Relations that are Tracked and Establish Dependencies
  Useful to Capture ER Relationships


Superkeys vs. Candidate Keys

Superkey of R:
  A Superkey SK is a Set of Attributes of R Such that No Two Tuples in Any Valid Relation Instance R(r) will Have the Same Value for SK
  Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as R(r): For Any Distinct Tuples T1 and T2 in R(r), T1[SK] ≠ T2[SK]
For Cars, Valid Superkeys Must Contain: SerialNo OR {State, Reg#} OR Both
For EMPLOYEE, {SSN} is a Key, and {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are all SUPERKEYS


Superkeys vs. Candidate Keys

Candidate Key of R:
  A "Minimal" Superkey: a Candidate Key K is a Superkey s.t. Removal of any Attribute From K Results in a Set of Attributes that is Not a Superkey
  Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as R(r): K is a Candidate Key iff K is a Superkey and, for any A in K, some Valid Instance R(r) Contains Two Distinct Tuples T1 and T2 such that T1[K-A] = T2[K-A]
In the Previous Cars Example, {State, Reg#, Make, Model} is a SK. Is it a CK? Why or Why Not?


Example and Remaining Definitions

Example: CAR(State, Reg#, SerialNo, Make, Model, Year)
  Primary Key is {State, Reg#}
  It has Two Candidate Keys (also Superkeys):
    Key1 = {State, Reg#}
    Key2 = {SerialNo}
  {SerialNo} can also be Chosen as Primary Key
Definition: Prime Attribute - An Attribute A of R that is a Member of some Candidate Key K of R
Definition: Non-Prime Attribute - An Attribute that is not Prime (i.e., Not a Member of Any Candidate Key)
  WORKS_ON – SSN, Pnumber are PRIME
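
The superkey, candidate key, and prime attribute definitions can be checked mechanically with attribute closure. The following is a minimal brute-force sketch, assuming the two candidate keys of CAR are expressed as the FDs {State, Reg#} → all attributes and {SerialNo} → all attributes. The function names `closure` and `candidate_keys` are illustrative and do not come from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def candidate_keys(all_attrs, fds):
    """Enumerate minimal attribute sets whose closure covers all_attrs (brute force)."""
    all_attrs = frozenset(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


CAR = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
# Assumed FDs: each candidate key determines every attribute of CAR.
CAR_FDS = [
    ({"State", "Reg#"}, CAR),
    ({"SerialNo"}, CAR),
]

keys = candidate_keys(CAR, CAR_FDS)
prime = set().union(*keys)

# {State, Reg#, Make, Model} is a superkey, but it is not minimal.
is_superkey = closure({"State", "Reg#", "Make", "Model"}, CAR_FDS) == frozenset(CAR)
is_candidate = frozenset({"State", "Reg#", "Make", "Model"}) in keys
```

Under these assumptions `is_superkey` is true and `is_candidate` is false, because Make and Model can be removed without losing uniqueness.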


First Normal Form (1NF)

All Attributes Must Be Atomic Values:
  Only Simple and Indivisible Values in the Domain of Attributes
  Each Attribute in a 1NF Relation is a Single Value
Disallows Composite Attributes, Multivalued Attributes, and Nested Relations (Non-Atomic)
A 1NF Relation cannot have as an Attribute Value:
  A Set of Values (Set-Valued)
  A Tuple of Values (Nested Relation)
1NF is a Standard Assumption of Relational DBs


One Example of 1NF

Consider the Following Department Relation
What is the Inherent Problem?

DLOCATIONS is Multi-valued


What are Possible Solutions?

Decompose: Move the Attribute DLOCATIONS that Violates 1NF into a Separate Relation DEPT_LOCATIONS(DNUMBER, DLOCATION)

Expand the key to have a Separate Tuple in the DEPARTMENT relation for each location (below)

Introduce DLOC1, DLOC2, DLOC3, if there are at Most Three Locations

Problems with Each? Best Solution?
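
The first option (moving DLOCATIONS into a separate relation) can be sketched as follows. The sample rows and the helper name `split_multivalued` are hypothetical and only illustrate the idea: every value in the set-valued attribute becomes its own tuple in DEPT_LOCATIONS(DNUMBER, DLOCATION).

```python
# Hypothetical DEPARTMENT rows; DLOCATIONS is set-valued, so the relation is not in 1NF.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": {"Houston"}},
]


def split_multivalued(rows, key, multi_attr, single_attr):
    """Move a set-valued attribute into its own relation keyed by (key, single_attr)."""
    # The base relation keeps every attribute except the set-valued one.
    base = [{a: v for a, v in row.items() if a != multi_attr} for row in rows]
    # One tuple per (key value, single location), sorted for a stable order.
    pairs = sorted((row[key], value) for row in rows for value in row[multi_attr])
    detail = [{key: k, single_attr: value} for k, value in pairs]
    return base, detail


dept, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)
```

Unlike the DLOC1, DLOC2, DLOC3 option, this form places no upper bound on the number of locations and introduces no null values.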


Another 1NF Example - Nested Relations

EMP_PROJ - Table and Tuples

Transition to:


Second Normal Form (2NF)

Second Normal Form Focuses on the Concepts of Primary Keys and Full Functional Dependencies
Intuitively:
  A Relation Schema R is in Second Normal Form (2NF) if Every Non-Prime Attribute A in R is Fully Functionally Dependent on the Primary Key
  R can be Decomposed into 2NF Relations via the Process of 2NF Normalization
Successful Process Typically Involves
  Decomposing R into Two or More Relations
  Iteratively Applying to Each Relation in Schema


Full Functional Dependency

Full FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y holds, and there exists no X’ such that X’ ⊂ X and X’ → Y holds over R, then Y is fully dependent on X, denoted as X →(f) Y
Full FD - Intuitively: An FD X → Y where Removal of any Attribute from X means the FD no Longer Holds
  {SSN, PNUMBER} → HOURS is Full since Neither SSN → HOURS nor PNUMBER → HOURS holds
What about the Following:
  {S#, CN} →(f) Grade


Partial Functional Dependency

Partial FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y holds but Y is not fully dependent on X, then Y is partially functionally dependent on X, denoted by X →(p) Y
Partial FD - Intuitively: Removal of an Attribute from the L.H.S. still Results in a Valid FD
  {SSN, PNUMBER} → ENAME is Partial since Removing PNUMBER still Results in the Valid FD SSN → ENAME
Are the Following Full or Partial?
  {S#, CN} → CN, {S#, CN} → S#
  {S#, CN, DNAME} → Grade
  [Arrow labels on the slide: p, f]
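
The full/partial distinction can be tested with attribute closure: an FD X → Y that holds is partial exactly when some X minus one attribute still determines Y. The sketch below assumes the EMP_PROJ and student FDs named on these slides; the helper names `closure`, `holds`, and `classify` are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def holds(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


def classify(lhs, rhs, fds):
    """Classify lhs -> rhs as 'full', 'partial', or 'does not hold'."""
    lhs = set(lhs)
    if not holds(lhs, rhs, fds):
        return "does not hold"
    # Dropping a single attribute is enough to test: any smaller determinant
    # is contained in lhs minus some attribute.
    if any(holds(lhs - {a}, rhs, fds) for a in lhs):
        return "partial"
    return "full"


EMP_PROJ_FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
STUDENT_FDS = [({"S#", "CN"}, {"Grade"})]

hours = classify({"SSN", "PNUMBER"}, {"HOURS"}, EMP_PROJ_FDS)
ename = classify({"SSN", "PNUMBER"}, {"ENAME"}, EMP_PROJ_FDS)
grade = classify({"S#", "CN"}, {"Grade"}, STUDENT_FDS)
trivial = classify({"S#", "CN"}, {"CN"}, STUDENT_FDS)
padded = classify({"S#", "CN", "DNAME"}, {"Grade"}, STUDENT_FDS)
```

By this definition the trivial FD {S#, CN} → CN and the padded FD {S#, CN, DNAME} → Grade both come out partial, since a smaller left-hand side already determines the right-hand side.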


Second Normal Form (2NF)

Formal 2NF Definition: R ∈ 2NF iff
  (i) R ∈ 1NF;
  (ii) all Non-Key Attributes in R are Fully Functionally Dependent on Every Key.
Alternative Definition: R ∈ 2NF iff Every Attribute is Either Part of a Candidate Key, or Fully Dependent on Every Key.
Reason: Partial Functional Dependencies may cause Update Problems


Another Way to View the Problem

If the Primary Key Contains a Single Attribute, then there is No Need to Test for Problems
This is 1NF but not 2NF since
  Ename, a Non-Prime Attribute in FD2, Violates 2NF since it Depends on Part of the Key (SSN)
  Pname and Ploc, two Non-Prime Attributes in FD3, Violate 2NF since they Depend on Part of the Key (Pnumber)


One Example of 2NF

Consider the Example Below

STUDENT_DEPT(S#, DName, DHead, CN, Grade)

STUDENT_DEPT ∈ 1NF
“{S#, CN} →(p) DName, DHead”, since S# → DName and DName → DHead, is a Partial FD that causes Anomalies
But STUDENT_DEPT ∉ 2NF

FDs on STUDENT_DEPT:
  fd1: {S#, CN} → Grade
  fd2: S# → DName
  fd3: DName → DHead


Recall the Anomalies…

STUDENT_DEPT(S#, DName, DHead, CN, Grade)
Insertion Anomalies:
  No Department Can Be Recorded if it has No Student Who Enrolls in Courses
Deletion Anomalies:
  Deleting the Last Student in a Department will also Delete the Department
Update Anomalies:
  Changing the Head of a Department must Modify All Students in that Department Due to Redundancies


One Example of 2NF (Continued)

Decomposition into 2NF by Separating Course Information from Department Information (Link S#)

S_D(S#, DName, DHead)
  fd2: S# → DName
  fd3: DName → DHead
S_C(S#, CN, Grade)
  fd1: {S#, CN} → Grade
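
A 2NF test can be sketched by looking for non-prime attributes that are determined by a proper subset of a candidate key. The code below is an illustration only, assuming fd1: {S#, CN} → Grade, fd2: S# → DName, fd3: DName → DHead and the keys stated for each relation; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attrs, fds, keys):
    """Return (key_part, attribute) pairs where a non-prime attribute
    depends on a proper, non-empty subset of some candidate key."""
    prime = set().union(*keys)
    non_prime = set(attrs) - prime
    found = set()
    for key in keys:
        for size in range(1, len(key)):  # proper subsets only
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) & non_prime:
                    found.add((part, attr))
    return found


FD1 = ({"S#", "CN"}, {"Grade"})
FD2 = ({"S#"}, {"DName"})
FD3 = ({"DName"}, {"DHead"})

student_dept = partial_dependencies(
    {"S#", "DName", "DHead", "CN", "Grade"}, [FD1, FD2, FD3], [{"S#", "CN"}]
)
s_c = partial_dependencies({"S#", "CN", "Grade"}, [FD1], [{"S#", "CN"}])
s_d = partial_dependencies({"S#", "DName", "DHead"}, [FD2, FD3], [{"S#"}])
```

STUDENT_DEPT reports DName and DHead as depending on S# alone, while S_C and S_D report no partial dependencies. S_D has a single-attribute key, so there is nothing to test.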


Another Example of 2NF

EMP_PROJ is 1NF with Key {SSN, PNUMBER} but…
  SSN → ENAME - Means ENAME, a Non-Prime Attribute, Depends Partially on {SSN, PNUMBER}, i.e., Depends on Only SSN and not Both
  PNUMBER → {PNAME, PLOCATION} - Means PNAME and PLOCATION, two Non-Prime Attributes, Depend Partially on {SSN, PNUMBER}, i.e., Depend on Only PNUMBER and not Both


Another Example of 2NF

What Does the Decomposition Below Accomplish?
  ENAME Fully Dependent on SSN
  PNAME, PLOC Fully Dependent on PNUMBER

Result: 2NF for EP1, EP2, and EP3


Yet Another Example of 2NF

Consider 1NF LOTS to Track Building Lots for Towns
What is the 2NF Problem?
  FD3: COUNTY_NAME → TAX_RATE Means TAX_RATE Depends Partially on Candidate Key {COUNTY_NAME, LOT#}
  All Other Non-Prime Attributes are Fine


Yet Another Example of 2NF

What Does the Decomposition Below Accomplish?
  TAX_RATE Fully Dependent on COUNTY_NAME

Result: 2NF for LOTS1 and LOTS2


Third Normal Form (3NF)

Third Normal Form Focuses on the Concepts of Primary Keys and Transitive Functional Dependencies
Intuitively:
  A Relation Schema R is in Third Normal Form (3NF) if it is in 2NF and no Non-Prime Attribute A in R is Transitively Dependent on the Primary Key
  R can be Decomposed into 3NF Relations via the Process of 3NF Normalization
In X → Y and Y → Z, with X as the Primary Key, there is a Problem only if Y is not a Candidate Key
  In EMP(SSN, Emp#, Salary), SSN → Emp# → Salary isn’t a Problem Since Emp# is a Candidate Key


Transitive Partial FDs

Transitive FD - Formally: Given R(U) and X, Y, Z ⊆ U. If X → Y, Y ⊄ X, Y ↛ X, and Y → Z, then Z is called transitively functionally dependent on X.
Transitive FD - Intuitively: an FD X → Z that can be derived from two FDs X → Y and Y → Z
  SSN → ENAME is Non-Transitive Since there is no Set of Attributes X where SSN → X and X → ENAME
  For an FD X → Z that can be derived from two FDs X → Y and Y → Z, if Y is a Candidate Key – No Problem


Third Normal Form (3NF)

Formal 3NF Definition: R ∈ 3NF iff
  (i) R ∈ 2NF;
  (ii) No Non-Key Attribute of R is Transitively Dependent on any Candidate Key.
Alternative Definition: R ∈ 3NF iff for every Non-Trivial FD X → Y, either X is a Superkey, or Y is a Key (Prime) Attribute.
Reason: Transitive Functional Dependencies may cause Update Problems
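
The alternative definition translates directly into a check over the given FDs: for each non-trivial X → A, either X is a superkey or A is prime. The following is a sketch under the assumption that the candidate keys are supplied by the caller; the function name `third_nf_violations` is illustrative. It is applied to S_D(S#, DName, DHead) and to the relations obtained by splitting it.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attrs, fds, keys):
    """List (lhs, attribute) pairs that break the alternative 3NF definition."""
    attrs = set(attrs)
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # ignore the trivial part of the FD
            if closure(lhs, fds) >= attrs:
                continue  # X is a superkey
            if a in prime:
                continue  # A is a prime attribute
            violations.append((tuple(sorted(lhs)), a))
    return violations


FD2 = ({"S#"}, {"DName"})
FD3 = ({"DName"}, {"DHead"})

s_d = third_nf_violations({"S#", "DName", "DHead"}, [FD2, FD3], [{"S#"}])
s_d_split = third_nf_violations({"S#", "DName"}, [FD2], [{"S#"}])
dept = third_nf_violations({"DName", "DHead"}, [FD3], [{"DName"}])
```

S_D fails because DName is not a superkey and DHead is not prime; both relations produced by the split pass.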


One Example of 3NF

STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF
S_C(S#, CN, Grade) ∈ 2NF, and S_C ∈ 3NF
S_D(S#, DName, DHead) ∈ 2NF, but S_D ∉ 3NF
“S# → DHead” is a Transitive FD in S_D and “DHead” is a Non-Key Attribute, since S# (X) → DName (Y) and DName (Y) → DHead (Z)

FDs on STUDENT_DEPT:
  fd1: {S#, CN} → Grade
  fd2: S# → DName
  fd3: DName → DHead
  Transitive: S# → DHead


One Example of 3NF

S_C(S#, CN, Grade) ∈ 2NF
S_D(S#, DName, DHead) ∈ 2NF
  fd2: S# → DName
  fd3: DName → DHead
  Transitive fd: S# → DHead
Decompose to Eliminate the Transitivity Within S_D:
  S_D(S#, DName) ∈ 3NF, with fd2: S# → DName
  DEPT(DName, DHead) ∈ 3NF, with fd3: DName → DHead


Another Example of 3NF

EMP_DEPT is 2NF with Key SSN, but there are Two Transitive Dependencies (Undesirable)
  SSN → DNUMBER and DNUMBER → DNAME Means DNAME, Neither Key Nor Subset of Key, is Transitively Dependent on SSN
  SSN is the Only Candidate Key of EMP_DEPT!
  Note: Also Similar Problem with SSN and DMGRSSN via DNUMBER


Another Example of 3NF

To Attain 3NF, Decompose into ED1 and ED2
Intuitively - we are Separating Out Employees and Departments from One Another


Yet Another Example of 3NF

Recall 2NF Solution for Building Lots Problem
What is the 3NF Problem? Violates the Alternative Definition
  In LOTS1, FD4: AREA → PRICE
  AREA is not a Superkey
  PRICE is not a Prime Attribute of LOTS1


Yet Another Example of 3NF

Decompose to Introduce a Separate Key AREA
Result: 3NF for LOTS1A and LOTS1B


1NF and 2NF – Maintain FDs!


Transition to 3NF – Maintain FDs!


Summary of Progression – Maintain FDs!

1NF
  STUDENT_DEPT(S#, DName, DHead, CN, Grade)
    fd1: {S#, CN} → Grade
    fd2: S# → DName
    fd3: DName → DHead
2NF (eliminate partial FDs)
  S_C(S#, CN, Grade) with fd1
  S_D(S#, DName, DHead) with fd2 and fd3
3NF (eliminate transitive FDs)
  S_C(S#, CN, Grade) with fd1
  S_D(S#, DName) with fd2
  DEPT(DName, DHead) with fd3


Summary of 1NF, 2NF, 3NF Concepts

1NF
  Test: Relation should have no nonatomic attributes or nested relations.
  Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.
2NF
  Test: For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
  Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
3NF
  Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
  Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).


Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form Focuses on Searching for Remaining Anomalies that can Arise in FDs
Intuitively:
  A Relation Schema R is in Boyce-Codd Normal Form (BCNF) if Whenever an FD X → A Holds in R, then X is a Superkey of R
  R can be Decomposed into BCNF Relations via the Process of BCNF Normalization
There exist Relations that are in 3NF but not in BCNF
The Goal is to have each Relation in BCNF (or 3NF)


Boyce-Codd Normal Form (BCNF)

Formal BCNF Definition: R ∈ BCNF iff
  (i) R ∈ 1NF;
  (ii) for every FD X → Y, X is a Superkey, i.e., if X → Y and Y ⊄ X, then X Contains a Key.
Properties of BCNF: if R ∈ BCNF, then
  All Non-key Attributes are Fully Dependent on Every Key
  All Key Attributes are Fully Dependent on the Keys that they do not Belong to
  No Attribute is Fully Dependent on any Set of Non-key Attributes


Comparing the Normal Forms

[Figure: progression 1NF → 2NF → 3NF → BCNF, annotated with:]
  Eliminate the non-trivial functional dependencies of non-key attributes to key
  Eliminate partial FDs of non-key attributes to key
  Eliminate transitive FDs of non-key attributes to key
  Eliminate partial and transitive FDs of key attributes to key
  Poor Relational Schema Design
  Developed as Stepping Stone
Most 3NF Relations are in BCNF - BCNF Eliminates All Update Anomalies


One Example of BCNF

Recall 3NF Solution for Building Lots Problem
Suppose that AREA is Sizes in Acres with
  AREAs in Tolland County 0.5, 0.6, …, 1.0
  AREAs in Windham County 1.1, 1.2, …, 2.0
Adding FD5: “AREA → COUNTY_NAME”
What Does Data in LOTS1A Look like for a Given Set of Properties?


One Example of BCNF

LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

What is the Problem Here?
  What if you Delete W11? You have “Lost” the “Windham, 1.1” Combination
  Also - Redundancy since “County Name, Area” is Repeated in Multiple Tuples Throughout LOTS1A
Even Though LOTS1A is in 3NF - Still Problems
  Problems with FD5: “AREA → COUNTY_NAME”


Transition to BCNF – Maintain FDs!

Add new FD5


One Example of BCNF

FD5: “AREA → COUNTY_NAME”
  Satisfies 3NF: COUNTY_NAME is a Prime Attribute
  Violates BCNF: AREA is not a Superkey of LOTS1A

So Do One More Split
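
The difference between the 3NF and BCNF tests on LOTS1A can be sketched in a few lines. This example assumes that PROPERTY_ID# and {COUNTY_NAME, LOT#} are the candidate keys of LOTS1A (the key FDs are written out here as an assumption, since the slide figure is not reproduced) and adds FD5. The helper names are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def check(attrs, fds, keys):
    """Report the left-hand sides of FDs that violate 3NF and BCNF."""
    prime = set().union(*keys)
    report = {"3NF": [], "BCNF": []}
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # non-trivial part of the FD
        if not extra or closure(lhs, fds) >= set(attrs):
            continue  # trivial FD, or the left-hand side is a superkey
        report["BCNF"].append(sorted(lhs))
        if not extra <= prime:  # 3NF additionally tolerates prime right-hand sides
            report["3NF"].append(sorted(lhs))
    return report


LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
LOTS1A_FDS = [
    ({"PROPERTY_ID#"}, LOTS1A),          # assumed key FD
    ({"COUNTY_NAME", "LOT#"}, LOTS1A),   # assumed key FD
    ({"AREA"}, {"COUNTY_NAME"}),         # FD5
]
LOTS1A_KEYS = [{"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#"}]

before = check(LOTS1A, LOTS1A_FDS, LOTS1A_KEYS)
# After the split, AREA is the key of LOTS1AY(AREA, COUNTY_NAME).
lots1ay = check({"AREA", "COUNTY_NAME"}, [({"AREA"}, {"COUNTY_NAME"})], [{"AREA"}])
```

FD5 shows up only in the BCNF report for LOTS1A, which matches the slide: COUNTY_NAME is prime, so 3NF is satisfied, but AREA is not a superkey.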


One Example of BCNF

LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

LOTS1AX
PROPERTY_ID#  LOT#  AREA
T11           L1    0.5
T12           L2    0.8
W13           L6    1.5
W11           L1    1.1
W12           L4    1.6
T10           L3    0.9

LOTS1AY
AREA  COUNTY_NAME
0.5   Tolland
...   Tolland
1.0   Tolland
1.1   Windham
...   Windham
2.0   Windham


Another Example of BCNF

Consider the TEACH Relation: TEACH(STUDENT, COURSE, INSTRUCTOR)
  In 3NF but NOT BCNF, with
    FD1: {STUDENT, COURSE} → INSTRUCTOR
    FD2: INSTRUCTOR → COURSE
3 Possible Decompositions of TEACH:
  1. T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
  2. T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
  3. T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)
All Three “Lose” FD1!
3rd is Best Since After Join, it Recaptures FD1 and Doesn’t Generate any Spurious Tuples
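
Why only the third decomposition avoids spurious tuples can be checked with the standard test for binary decompositions: {R1, R2} is lossless when the common attributes functionally determine R1 − R2 or R2 − R1. The sketch below applies that test to the three TEACH decompositions; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1)."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


TEACH_FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # FD1
    ({"INSTRUCTOR"}, {"COURSE"}),             # FD2
]

decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]
results = [lossless_binary(t1, t2, TEACH_FDS) for t1, t2 in decompositions]
```

Only the third pair shares INSTRUCTOR, which determines COURSE through FD2, so only that join is nonadditive.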


What Does Table Look Like?

Note TEACH in 3NF but NOT BCNF


Reflections on Normalization

Normalization
  A Tool for Validating the Quality of the Schema, Rather than Merely a Method for Designing a Relational Schema
  Promotes Each Concept of the Application Domain Mapping to Exactly One Concept of the Schema
Normalization Process
  Actually a Process of Concept Separation
  Concept Separation is the Result of Applying a Top-down Methodology for Producing a Schema Via Subsequent Refinements and Decompositions


Relational DB Design Process

Normalization Process Focused on Decomposition Raises a Number of Questions
  How do we Decompose a Schema into a Desirable Normal Form?
  What Criteria Should the Decomposed Schemas Follow in order to Preserve the Semantics of the Original Schema?
  Can we Guarantee the Decomposition’s Quality?
  Can we Prevent the “Loss” of Information?
  Are Dependencies Maintained in Decomposition?


Recall Transitive FD/Update Anomalies

R = (U, F)
U = {S#, DName, DHead}
F = {S# → DName, DName → DHead}

S#  DName  DHead
S1  D1     John
S2  D1     John
S3  D2     Smith
S4  D3     Black

“S# → DHead” is a Transitive FD
  When S4 Graduates, Head Information of D3 is Lost
  Similarly, If D5 has No Students Yet, then the Head Information cannot be Stored in this Database
  Updating the Head of Any Department Requires an Update to Every Student Enrolled in the Dept.


What are Possible Decompositions?

R = (U, F)
U = {S#, DName, DHead}
F = {S# → DName, DName → DHead}

Decomposition 1 = {R1({S#}, ∅), R2({DName}, ∅), R3({DHead}, ∅)}

R1: S#     S1, S2, S3, S4
R2: DName  D1, D1, D2, D3
R3: DHead  John, John, Smith, Black

Information Based
Decomposition 1 is Neither Lossless nor FD-Preserving


What are Possible Decompositions?

R = (U, F)
U = {S#, DName, DHead}
F = {S# → DName, DName → DHead}

Decomposition 2 = {R1({S#, DName}, {S# → DName}), R2({S#, DHead}, {S# → DHead})}

R1             R2
S#  DName      S#  DHead
S1  D1         S1  John
S2  D1         S2  John
S3  D2         S3  Smith
S4  D3         S4  Black

• Lossless Decomposition but not Dependency-Preserving
• DName → DHead is lost in the decomposition

Decomposition 2 is Lossless but not FD-Preserving


What are Possible Decompositions?

R = (U, F)
U = {S#, DName, DHead}
F = {S# → DName, DName → DHead}

Decomposition 3 = {R1({S#, DName}, {S# → DName}), R3({DName, DHead}, {DName → DHead})}

R1             R3
S#  DName      DName  DHead
S1  D1         D1     John
S2  D1         D1     John
S3  D2         D2     Smith
S4  D3         D3     Black

Lossless & dependency-preserving decomposition

Decomposition 3 is both Lossless and FD-Preserving
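
The lossless property of the three decompositions can be observed directly by projecting the four-tuple instance and joining the projections back together. The sketch below models a relation as a set of tuples, each tuple a frozenset of (attribute, value) pairs; `relation`, `project`, and `natural_join` are illustrative helpers written for this example.

```python
def relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, attrs):
    """Projection with duplicate elimination."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join; with no shared attributes it degenerates to a Cartesian product."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


r = relation(
    ("S#", "DName", "DHead"),
    [("S1", "D1", "John"), ("S2", "D1", "John"),
     ("S3", "D2", "Smith"), ("S4", "D3", "Black")],
)

# Decomposition 1: three single-attribute relations.
join1 = natural_join(
    natural_join(project(r, {"S#"}), project(r, {"DName"})),
    project(r, {"DHead"}),
)
# Decomposition 2: (S#, DName) and (S#, DHead).
join2 = natural_join(project(r, {"S#", "DName"}), project(r, {"S#", "DHead"}))
# Decomposition 3: (S#, DName) and (DName, DHead).
join3 = natural_join(project(r, {"S#", "DName"}), project(r, {"DName", "DHead"}))
```

The first join contains every combination of the projected values, so it includes spurious tuples; the second and third joins return exactly the original instance. Dependency preservation is a separate property that this instance-level check does not cover.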


Summary of Normalization

1NF → 2NF: Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes
2NF → 3NF: Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes
3NF → BCNF: Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key
Up to 3NF: Lossless Decomposition and Dependency Preserving
BCNF: Lossless Decomposition but not Dependency Preserving


The Entire Normalization Picture

1NF → 2NF: Eliminate Partial FDs of Non-prime Attributes to Key
2NF → 3NF: Eliminate Transitive FDs of Non-prime Attributes to Key
3NF → BCNF: Eliminate Partial and Transitive FDs of Prime Attributes to Key
BCNF → 4NF: Eliminate Non-trivial and Non-functional Multi-Valued Dependencies
4NF → 5NF: Eliminate Join Dependencies that are Not Implied by Candidate Key


What are Multi-Valued Dependencies?

Focused on the Concept of Multi-Valued Dependencies
  An MVD X ↠ Y Indicates that a Value of X Corresponds to Multiple Values of Y
Consider EMP with MVDs:
  ENAME ↠ PNAME (E works on many P)
  ENAME ↠ DNAME (E has many Dependents)


What is Fourth Normal Form (4NF)?

A Relation Schema R is in Fourth Normal Form (4NF) w.r.t. Dependencies F (FD and MVD) if for every Non-Trivial MVD X ↠ Y in F+, X is a Superkey for R
Reconsider EMP with MVDs:
  ENAME ↠ PNAME (E works on many P)
  ENAME ↠ DNAME (E has many Dependents)
ENAME is Not a Superkey of R since we Need the Triple of ENAME, PNAME, and DNAME to Distinguish Tuples
We need to Decompose EMP!


Decomposition into 4NF

ENAME ↠ PNAME is a Trivial MVD: ENAME ∪ PNAME is Equal to EMP_PROJECTS (same for ENAME ↠ DNAME)
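
An MVD can be tested on a relation instance: X ↠ Y holds when, for any two tuples that agree on X, the tuple combining the first one's Y values with the second one's remaining values is also present. The sketch below uses a hypothetical EMP instance (the employee, project, and dependent values are made up for illustration); `mvd_holds` is an illustrative helper.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on an instance given as a list of dicts over attrs."""
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Expected tuple: X and Y values from t1, remaining values from t2.
                t3 = {a: t1[a] for a in list(x) + list(y)}
                t3.update({a: t2[a] for a in z})
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


ATTRS = ["ENAME", "PNAME", "DNAME"]
# Hypothetical instance: Smith works on projects X and Y and has dependents John and Anna.
emp = [
    {"ENAME": "Smith", "PNAME": p, "DNAME": d}
    for p in ("X", "Y")
    for d in ("John", "Anna")
]

full_instance = mvd_holds(emp, ATTRS, ["ENAME"], ["PNAME"])
# Dropping one combination breaks the independence of projects and dependents.
missing_one = mvd_holds(emp[:-1], ATTRS, ["ENAME"], ["PNAME"])
```

The full instance needs all four project/dependent combinations for one employee, which is the redundancy that the 4NF decomposition removes.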


What about the Supply Table?

In 4NF But Not in 5NF since: Supplier supplies Parts, Supplier supplies Projects, & Parts Used on Projects

Removes Join Dependencies – Many-many-many


Slides on Query Optimization


Query Optimization Objectives
  Improving Performance
  Arriving at a Query Plan of Execution
Analyzing the Relational Algebra Query
  Replace Costly Operations
  Do Selections and Projections Early
Optimization Heuristics for the Relational Algebra
  Performing Selection and Projection Before Join
  Combining Several Selections Over a Single Relation Into One Selection
  Find Common Subexpressions
Algebraic Rewriting/Transformation Rules
  General Transformation Rules for Relational Algebra (Equivalence-preserving Algebraic Rewriting Rules)

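Query Optimization: An Example

Why is it important?

SELECT ENAME
FROM E, W
WHERE E.ENO = W.ENO AND W.RESP = "Manager"

Strategy 1: $\pi_{\text{ENAME}}(\sigma_{\text{RESP}=\text{"Manager"} \wedge \text{E.ENO}=\text{W.ENO}}(E \times W))$

Strategy 2: $\pi_{\text{ENAME}}(E \bowtie_{\text{ENO}} (\sigma_{\text{RESP}=\text{"Manager"}}(W)))$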

Cost of Alternatives

Assume:
  card(E) = 4,000; card(W) = 10,000
  10% of tuples in W satisfy RESP="Manager" (selection generates 1,000 tuples)
  Execution time Proportional to the Sum of the Cardinalities of the Temporary Relations
  Searching is Done by Sequential Scanning

Strategy 1                        Strategy 2
Cartesian prod. = 40,000,000      Selection over W  =    10,000
Search over all = 40,000,000      Join (4000*1000)  = 4,000,000
Total           = 80,000,000      Total             = 4,010,000
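
The arithmetic behind the two totals can be reproduced in a few lines. This is only a sketch of the slide's simple cost model (sum of intermediate cardinalities, sequential scans); the variable names are illustrative.

```python
# Cardinalities and selectivity as stated on the slide.
card_e = 4_000
card_w = 10_000
managers = card_w // 10  # 10% of W satisfies RESP="Manager"

# Strategy 1: build E x W, then scan all of it for the selection.
cartesian = card_e * card_w
strategy_1 = cartesian + cartesian

# Strategy 2: scan W for the selection, then join E with the result
# (costed on the slide as card(E) * number of selected tuples).
strategy_2 = card_w + card_e * managers

print(strategy_1, strategy_2, round(strategy_1 / strategy_2, 2))
```

Under this model Strategy 2 is roughly twenty times cheaper, purely because the selection runs before the join.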

General Query Optimization Strategy
Perform Selections Early
  Yields Smaller Intermediate Results
  Direct Impact on Subsequent Join/Cartesian Prod.
Combine Selections with a Prior Cartesian Product into a Theta or Equi Join
  Join is a Cheaper Operation
Combine (Cascade) Selections and Projections

$\pi_{A}(\pi_{B}(R)) \equiv \pi_{A}(R)$ where $A \subseteq B$

$\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_1 \wedge p_2}(R)$

This Results in One Pass Instead of Two over Table
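
As a small illustration of combining selections, the sketch below applies two predicates to a hypothetical relation W(ENO, RESP, DUR) first as a cascade (two passes) and then as one conjunctive selection (one pass). The rows and predicates are made up for the example.

```python
# Hypothetical relation W(ENO, RESP, DUR) as a list of dicts.
W = [
    {"ENO": 1, "RESP": "Manager", "DUR": 12},
    {"ENO": 2, "RESP": "Analyst", "DUR": 24},
    {"ENO": 3, "RESP": "Manager", "DUR": 36},
]

def select(pred, rows):
    """sigma_pred(rows): one sequential pass over rows."""
    return [r for r in rows if pred(r)]

def p1(r):
    return r["RESP"] == "Manager"

def p2(r):
    return r["DUR"] <= 24

# sigma_p1(sigma_p2(W)): the inner result is scanned a second time.
two_pass = select(p1, select(p2, W))

# sigma_{p1 AND p2}(W): both predicates are tested in a single scan.
one_pass = select(lambda r: p1(r) and p2(r), W)
```

Both forms return the same rows; the combined form simply avoids materializing and rescanning the intermediate result.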

General Query Optimization Strategy
Identify Common Subexpressions
  Compute Once and Store
  Use Stored Version for Subsequent Times
  Often Useful When Views are Employed
Preprocess Data via Sorts and Indexes
  Speeds up Searches and Joins by Limiting Scope
Evaluate and Assess Different Options
  For Cartesian Product, Use Smaller Relation for Comparison
  Use System Catalog (Meta-data) to Effect Order in Query Execution Plan

Relational Algebra Transformations

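1. Cascade of Selection
   $\sigma_{p_1 \wedge p_2 \wedge \dots \wedge p_n}(R) \equiv \sigma_{p_1}(\sigma_{p_2}(\dots(\sigma_{p_n}(R))\dots))$

2. Commutativity of Selection
   $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_2}(\sigma_{p_1}(R))$
   $\sigma_{p_1 \vee p_2}(R) \equiv \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$

3. Cascade of Projection
   $\pi_{A_1}(\pi_{A_2}(\dots(\pi_{A_n}(R))\dots)) \equiv \pi_{A_1}(R)$ if $A_1 \subseteq A_2 \subseteq \dots \subseteq A_n$

4. Commuting Selection with Projection (p uses only attributes among the A's)
   $\pi_{A_1,A_2,\dots,A_n}(\sigma_p(R)) \equiv \sigma_p(\pi_{A_1,A_2,\dots,A_n}(R))$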

Relational Algebra Transformations

5. Commutativity of Theta Join and Cartesian Product
   $R \bowtie_A S \equiv S \bowtie_A R$
   $R \times S \equiv S \times R$

6. Commuting Selection with Theta Join (Cartesian)
   $\sigma_{p(A)}(R \times S) \equiv (\sigma_{p(A)}(R)) \times S$
     (A defined on R only)
   $\sigma_{p(A) \wedge p(B)}(R \times S) \equiv (\sigma_{p(A)}(R)) \times (\sigma_{p(B)}(S))$
     (A defined on R, B defined on S)
   Also Holds for Theta Join

7. Commuting Projection with Theta Join (Cartesian)
   $\pi_C(R \times S) \equiv \pi_A(R) \times \pi_B(S)$ where $A \cup B = C$
   A are Attributes in C for R and B are Attributes in C for S

Relational Algebra Transformations

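8. Commutativity of Set Operations
   $R \cup S \equiv S \cup R$
   $R \cap S \equiv S \cap R$

9. Associativity of Set Operations
   $(R \times S) \times T \equiv R \times (S \times T)$
   $(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T)$
   $(R \cup S) \cup T \equiv R \cup (S \cup T)$
   $(R \cap S) \cap T \equiv R \cap (S \cap T)$

10. Commuting Select with Set Operations
   $\sigma_{p(A_i)}(R \cup T) \equiv \sigma_{p(A_i)}(R) \cup \sigma_{p(A_i)}(T)$
   $\sigma_{p(A_i)}(R \cap T) \equiv \sigma_{p(A_i)}(R) \cap \sigma_{p(A_i)}(T)$
   where $A_i$ is defined on both R and T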

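Relational Algebra Transformations

11. Commuting Projection with Union
   $\pi_C(R \bowtie_{q(A_j,B_k)} S) \equiv \pi_{A'}(R) \bowtie_{q(A_j,B_k)} \pi_{B'}(S)$
   $\pi_C(R \cup S) \equiv \pi_{A'}(R) \cup \pi_{B'}(S)$
   where R[A] and S[B], $C = A' \cup B'$ where $A' \subseteq A$, $B' \subseteq B$

12. Converting Selection/Cartesian Into Theta Join
   $\sigma_C(R \times S) \equiv R \bowtie_C S$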
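
Two of these rules are easy to check on toy data. The sketch below uses hypothetical relations E(ENO, ENAME) and W(ENO, RESP) stored as tuples; the helper names are illustrative. It compares a selection over a Cartesian product with the equivalent theta join (Rule 12), and a late selection with an early one when the predicate mentions only one operand (Rule 6).

```python
from itertools import product

# Hypothetical toy relations E(ENO, ENAME) and W(ENO, RESP).
E = [(1, "Doe"), (2, "Lee"), (3, "Kim")]
W = [(1, "Manager"), (2, "Analyst"), (3, "Manager"), (3, "Analyst")]

def cartesian(r, s):
    """R x S as concatenated tuples."""
    return [a + b for a, b in product(r, s)]

def select(pred, rows):
    return [row for row in rows if pred(row)]

def theta_join(r, s, cond):
    """Nested-loop join that never materializes the full product."""
    return [a + b for a in r for b in s if cond(a, b)]

# Rule 12: selection over a Cartesian product equals a theta join.
via_product = select(lambda t: t[0] == t[2], cartesian(E, W))
via_join = theta_join(E, W, lambda e, w: e[0] == w[0])

# Rule 6: a selection on W's attributes alone commutes with the product.
def is_manager(w):
    return w[1] == "Manager"

late = select(lambda t: is_manager(t[2:]), cartesian(E, W))
early = cartesian(E, select(is_manager, W))
```

The results are identical in both comparisons; what changes is the size of the intermediate relation that has to be built and scanned.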

Using Heuristics in Query Optimization

Process for heuristics optimization
1. The parser of a high-level query generates an initial internal representation;
2. Apply heuristics rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.

The main heuristic is to apply first the operations that reduce the size of intermediate results.
  E.g., Apply SELECT and PROJECT operations before applying the JOIN or other operations.

Using Heuristics in Query Optimization (2)

Query tree:
  A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.
  An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.

Query graph:
  A graph data structure that corresponds to a relational calculus expression. It does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query.
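
One possible way to represent and execute a query tree is sketched below. The node classes (`Leaf`, `Select`, `Project`, `Join`) and the sample rows are assumptions made for illustration, not a structure prescribed by the slides. Execution evaluates the operands of a node first and then replaces the node by its result.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = Tuple  # a tuple of attribute values

@dataclass
class Leaf:
    rows: List[Row]          # an input relation

@dataclass
class Select:
    pred: Callable[[Row], bool]
    child: object

@dataclass
class Project:
    cols: List[int]          # positions of the attributes to keep
    child: object

@dataclass
class Join:
    cond: Callable[[Row, Row], bool]
    left: object
    right: object

def execute(node):
    """Evaluate operands first, then replace the node by its result."""
    if isinstance(node, Leaf):
        return node.rows
    if isinstance(node, Select):
        return [r for r in execute(node.child) if node.pred(r)]
    if isinstance(node, Project):
        seen, out = set(), []
        for r in execute(node.child):
            t = tuple(r[i] for i in node.cols)
            if t not in seen:      # projection removes duplicates
                seen.add(t)
                out.append(t)
        return out
    if isinstance(node, Join):
        left, right = execute(node.left), execute(node.right)
        return [a + b for a in left for b in right if node.cond(a, b)]
    raise TypeError(f"unknown node: {node!r}")

# A tree shaped like Strategy 2 of the earlier example, on hypothetical rows.
E = Leaf([(1, "Doe"), (2, "Lee")])                 # E(ENO, ENAME)
W = Leaf([(1, "Manager"), (2, "Analyst")])         # W(ENO, RESP)
tree = Project([1], Join(lambda e, w: e[0] == w[0],
                         E, Select(lambda w: w[1] == "Manager", W)))
result = execute(tree)
```

Because the tree fixes the order of operations, two trees for the same query can have very different intermediate result sizes, which is what the heuristic rules exploit.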

Using Heuristics in Query Optimization

Heuristic Optimization of Query Trees:
  The same query could correspond to many different relational algebra expressions, and hence many different query trees.
  The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.

Example:
Q: SELECT LNAME
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > '1957-12-31';

Heuristics Algebraic Optimization Concepts
Using Cascade of Selections Rule, Break up Any Selections With Conjunctive Conditions Into a Cascade of Selections
  Allows More Freedom in Moving Selections Down Different Branches of the Tree
Using Commutativity of Selections with Other Operations Rules, Move Each Selection Down the Query Tree as far as Possible
If Possible, Combine a Cartesian Product With a Selection Into a Join

Heuristics Algebraic Optimization Concepts
Using Associativity of Binary Operations, Rearrange the Leaf Nodes So That the Most Restrictive Selections Are Executed First
  The Fewer Tuples the Resulting Relation Contains, the More Restrictive the Selection
  Reducing the Size of Intermediate Results Improves Performance
Using Cascade of Projections and Commutativity of Projections with Other Operations, Move Projections Down the Query Tree as Far as Possible
Identify Subtrees that Represent Groups of Operations that can be Executed by a Single Algorithm

Heuristic Algebraic Optimization Algorithm
Use Rule 1 to Break up Selects with Conjunctions into a Cascade to Move them Down the Query Tree
Use Rules 2, 4, 6, and 10 to Commute Select with Project, Join, Cart. Prod., Union, and Intersection
Use Rule 5 (Commute) and 9 (Associative) to Rearrange the Leaf Nodes of Query Tree to:
  Most Restrictive Select Executed First
  Avoid Cartesian Product in Leaf Nodes
Use Rule 12 to Convert a Select/Cart Prod to Join
Use Rules 3, 4, 7, and 11 to Cascade and Commute Project - Pushing Down Tree as Far as Possible
Identify Subtrees that Can Execute as Independent Algorithms (Set of Operations)
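
The first two steps (break up conjunctions, then push each selection down) can be sketched on a tiny tree representation. Everything below is an illustrative assumption: trees are nested tuples, attributes are written with a relation prefix such as `W.DUR`, and only selections over Cartesian products are handled. It is not the full algorithm.

```python
def attrs(node):
    """Attributes available in the output of a node."""
    kind = node[0]
    if kind == "rel":                    # ("rel", name, attributes)
        return set(node[2])
    if kind == "select":                 # ("select", label, used, child)
        return attrs(node[3])
    return attrs(node[1]) | attrs(node[2])   # ("product", left, right)

def push_select(label, used, node):
    """Place the selection as low in the tree as its attributes allow."""
    if node[0] == "product":
        _, left, right = node
        if used <= attrs(left):
            return ("product", push_select(label, used, left), right)
        if used <= attrs(right):
            return ("product", left, push_select(label, used, right))
    elif node[0] == "select":
        # Selections commute (Rule 2), so slide past an existing one.
        _, l2, u2, child = node
        return ("select", l2, u2, push_select(label, used, child))
    return ("select", label, used, node)

def optimize(conjuncts, tree):
    """Rule 1: treat each conjunct as its own selection, then push it."""
    for label, used in conjuncts:
        tree = push_select(label, used, tree)
    return tree

E = ("rel", "E", frozenset({"E.ENAME", "E.ENO"}))
P = ("rel", "P", frozenset({"P.JNO", "P.JNAME"}))
W = ("rel", "W", frozenset({"W.ENO", "W.PNO", "W.DUR"}))
canonical = ("product", P, ("product", W, E))
conjuncts = [
    ("DUR=12 OR DUR=24", {"W.DUR"}),
    ("JNAME='CAD/CAM'", {"P.JNAME"}),
    ("ENAME='J. DOE'", {"E.ENAME"}),
    ("W.ENO=E.ENO", {"W.ENO", "E.ENO"}),
]
plan = optimize(conjuncts, canonical)
```

In the resulting plan each single-relation selection sits directly on its relation, while the condition `W.ENO=E.ENO` stops just above the product of W and E, which is the spot where Rule 12 would turn the pair into a join.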

Heuristic Optimization: Example

Relations: E(ENAME, ENO), P(JNO, JNAME), W(ENO, PNO, DUR)

[Figure sequence: one query tree per step; only the labels and captions are reproduced here]
1. Canonical query tree at the end of query preprocessing phase: projection on ENAME at the root, over the selection (DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME="J. DOE", over the join on JNO of P with the join on ENO of W and E.
2. Use cascading of selections rule to decompose selections: DUR=12 OR DUR=24; JNAME="CAD/CAM"; ENAME="J. DOE".
3. Push selection down using commutativity of selection over join: ENAME="J. Doe" is applied directly to E.
4. Push selection down using commutativity of selection over join: JNAME="CAD/CAM" is applied directly to P.
5. Push selection down: DUR=12 OR DUR=24 is applied directly to W.
6. Do early projection: projections labelled JNO; JNO, ENO; and JNO, ENAME are added below the joins.
7. Identify subtrees that can be implemented in one algorithm.

Heuristic Optimization: A Second Example

BOOKS(Title, Author, Pname, LC_No)
PUBLISHERS(Pname, Paddr, Pcity)
BORROWERS(Name, Addr, City, Card_No)
LOANS(Card_No, LC_No, Date)

Let XLOANS = $\pi_S(\sigma_F(\text{Loans} \times \text{Borrowers} \times \text{Books}))$ where:
  S = {Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date} and
  F = {Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No}

Heuristic Optimization: A Second Example

[Figure sequence: query trees over Books, Loans, and Borrower, one per step; only the labels and captions are reproduced here]
1. XLOANS tree: projection on Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date, over the selection Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No, over the Cartesian product of Books with the Cartesian product of Loans and Borrower.
2. Query = projection on Title of a selection on Date (compared with 1/1/88) applied to XLOANS; both operations are placed on top of the XLOANS tree.
3. Try to Cascade (the selection on Date).
4. Commute Select and Project.
5. Commute Select and Select.
6. Commute Select and Cartesian Product Two Levels Down (the selection on Date moves toward Loans).
7. Try to Cascade: split off Books.LC_No = Loans.LC_No.
8. Commute Select and Cartesian Product One Level Down: Borrower.Card_No = Loans.Card_No moves to the lower Cartesian product; Books.LC_No = Loans.LC_No stays at the upper one. What’s Next?
9. Combine Projections: only the projection on Title remains. What is Still a Problem? We are Not Projecting so All Attributes are Still Collected Until the Final Project!
10. Add Strategic Projections to Send Only the Minimum Up the Tree as Needed for Join/Result Set: Loans.LC_No, Loans.Card_No; Loans.LC_No; Borr.Card_No; Books.LC_No, Title.
11. What is the Final Step? Combine Select and Cartesian Product. Result: Equijoins!

Heuristics Query Optimization: Summary
First Apply Operations that Reduce the Size of Intermediate Results
Move Selections and Projections Down the Tree as far as Possible
  Early Selections Reduce the Number of Tuples
  Early Projections Reduce the Number of Attributes
The Most Restrictive Selection and Join Operations Should be Executed Before Other Similar Operations
  This is Accomplished by Reordering the Leaf Nodes of the Tree Among Themselves and Adjusting the Rest of the Tree Appropriately

Slides on Concurrency Control Algorithms

What is a Schedule?
Transaction schedule or history:
  When transactions are executing concurrently in an interleaved fashion, the order of execution of operations from the various transactions forms what is known as a transaction schedule
A schedule S of n transactions T1, T2, …, Tn is:
  Ordering of operations of transactions where, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti.
  Operations from other transactions Tj can be interleaved with the operations of Ti in S.

What is a Schedule?
A Schedule S is a Sequence of R/W Operations, Which End with Commit or Abort
  Different Transactions Executing Concurrently in an Interleaved Fashion with One Another
  Each Transaction a Sequence of R/W Operations
Two Schedules S1 and S2 are Equivalent, Denoted as $S_1 \equiv S_2$, If and Only If S1 and S2
  Execute the Same Set of Transactions
  Produce the Same Results (i.e., Both Take the DB to the Same Final State)

Transactions and a Schedule
Below are Transactions T1 and T2
Note that Their Interleaved Execution Shown Below is an Example of One Possible Schedule
There are Many Different Interleaves of T1 and T2

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;

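Transactions and a Schedule
What Happens if the Schedule Changes to:

[Figure: two side-by-side T1/T2 timelines. The first repeats the schedule from the previous slide: T1 runs Read(X); X := X - 2; Write(X), then T2 runs Read(X); X := X - 1; Write(X); commit, then T1 runs Read(Y); Y = Y + 20; Write(Y); commit. In the second, the two transactions' operations on X are interleaved: T1's Write(X) is separated from its Read(X), and T2's Read(X) is separated from its Write(X).]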

Equivalent Schedules
Are the Two Schedules below Equivalent?
S1 and S4 are Equivalent, since They have the Same Set of Transactions and Produce the Same Results

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S1 (T1 runs to completion, then T2):
S1: R1(X), W1(X), R1(Y), W1(Y), c1, R2(X), W2(X), c2;

Schedule S4 (T2 runs between T1's operations on X and on Y):
S4: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;

What are Different Types of Schedules?
Recoverable schedule:
  One where no committed transaction needs to be rolled back.
  No transaction T in S commits until all transactions T’ that write an item that T reads have committed.
Cascadeless schedule:
  One where every transaction reads only the items that are written by committed transactions.
Cascaded rollback:
  A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back (they read a value written by the failed transaction).
Strict Schedules:
  A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed.
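
The recoverable and cascadeless conditions can be checked mechanically. The sketch below is a simplified illustration: the tuple encoding of a schedule is an assumption, and aborts are not modelled (every transaction is taken to commit eventually).

```python
def classify(schedule):
    """schedule: list of (op, txn, item) with op in 'R', 'W', 'C'.

    Returns (recoverable, cascadeless) for a schedule in which every
    transaction eventually commits (aborts are not modelled here).
    """
    committed = set()
    last_writer = {}            # item -> transaction that wrote it last
    reads_from = {}             # txn -> set of txns it read from
    recoverable = cascadeless = True
    for op, txn, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
                if writer not in committed:
                    cascadeless = False     # read of uncommitted data
        elif op == "C":
            # T may commit only after every T' it read from has committed.
            if not reads_from.get(txn, set()) <= committed:
                recoverable = False
            committed.add(txn)
    return recoverable, cascadeless

# R1(X), W1(X), R1(Y), W1(Y), c1, R2(X), W2(X), c2
serial = [("R", 1, "X"), ("W", 1, "X"), ("R", 1, "Y"), ("W", 1, "Y"),
          ("C", 1, None), ("R", 2, "X"), ("W", 2, "X"), ("C", 2, None)]

# R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1
interleaved = [("R", 1, "X"), ("W", 1, "X"), ("R", 2, "X"), ("W", 2, "X"),
               ("C", 2, None), ("R", 1, "Y"), ("W", 1, "Y"), ("C", 1, None)]
```

In the interleaved schedule T2 reads X from the still-uncommitted T1 and commits first, so under these definitions it is neither cascadeless nor recoverable, even though its final database state matches the serial one.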

Serial and Serializable Schedules
Serial schedule:
  A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. Otherwise, the schedule is called a nonserial schedule.
Serializable schedule:
  A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.
Being serializable implies that the schedule is a correct schedule that:
  Leaves the database in a consistent state.
  The interleaving of operations results in a state as if the transactions were serially executed, while achieving efficiency due to concurrent execution.

Serializability of Schedules
A Serial Execution of Transactions Runs One Transaction at a Time (e.g., T1 and T2 or T2 and T1)
  All R/W Operations in Each Transaction Occur Consecutively in S, No Interleaving
  Consistency: a Serial Schedule takes a Consistent Initial DB State to a Consistent Final State
A Schedule S is Called Serializable If there Exists an Equivalent Serial Schedule
  A Serializable Schedule also takes a Consistent Initial DB State to Another Consistent DB State
  An Interleaved Execution of a Set of Transactions is Considered Correct if it Produces the Same Final Result as Some Serial Execution of the Same Set of Transactions
  We Call such an Execution Serializable

Example of Serializability
Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
  After S1 or S2 X = 7 and Y = 40
These are the two Possible Serial Schedules

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S1: T1 followed by T2
Schedule S2: T2 followed by T1

Example of Serializability
Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
  After S1 or S2 X = 7 and Y = 40
Is S3 a Serializable Schedule?

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S1: T1 followed by T2
Schedule S2: T2 followed by T1
Schedule S3: [Figure: interleaved T1/T2 timeline in which both transactions execute Read(X) and update their local X before either executes Write(X)]

Example of Serializability
Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
  After S1 or S2 X = 7 and Y = 40
Is S4 a Serializable Schedule?

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S1: T1 followed by T2
Schedule S2: T2 followed by T1
Schedule S4: T1 runs Read(X); X := X - 2; Write(X), then T2 runs Read(X); X := X - 1; Write(X); commit, then T1 runs Read(Y); Y = Y + 20; Write(Y); commit

Two Serial Schedules with Different Results
Consider S1 and S2 for Transactions T1 and T2
If X = 10 and Y = 20
  After S1 X = 7 and Y = 28
  After S2 X = 7 and Y = 27

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = X + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S1: T1 followed by T2
Schedule S2: T2 followed by T1

A Schedule is Serializable if it Matches Either S1 or S2, Even if S1 and S2 Produce Different Results!
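
The stated results can be reproduced with a small simulation of the two serial orders. The updates X := X - 2 (T1) and X := X - 1 (T2) are assumptions chosen so that the outcomes match the values given on the slides; the function names are illustrative.

```python
def t1(db, y_from_x=False):
    x = db["X"] - 2              # assumed update: X := X - 2
    db["X"] = x
    y = db["Y"]                  # Read(Y)
    # Either Y = Y + 20 (earlier slides) or Y = X + 20 (this slide).
    db["Y"] = (x if y_from_x else y) + 20

def t2(db):
    db["X"] = db["X"] - 1        # assumed update: X := X - 1

def run(order, y_from_x=False):
    """Run the transactions serially in the given order from X=10, Y=20."""
    db = {"X": 10, "Y": 20}
    for txn in order:
        if txn is t1:
            t1(db, y_from_x)
        else:
            t2(db)
    return db["X"], db["Y"]

s1_same = run([t1, t2])                    # Y = Y + 20, T1 then T2
s2_same = run([t2, t1])                    # Y = Y + 20, T2 then T1
s1_diff = run([t1, t2], y_from_x=True)     # Y = X + 20, T1 then T2
s2_diff = run([t2, t1], y_from_x=True)     # Y = X + 20, T2 then T1
```

With Y = Y + 20 both orders end in the same state; with Y = X + 20 the value of Y depends on whether T2 has already decremented X, yet both serial orders are still considered correct.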

Thoughts on Serializability
Serializability is hard to check
  Interleaving of operations occurs in an operating system through some scheduler
  Difficult to determine beforehand how the operations in a schedule will be interleaved
Need to Adopt a Practical Approach
  Come up with methods (protocols) to ensure serializability.
  However, it is not possible to determine when a schedule begins and when it ends.
  Hence, we reduce the problem of checking the whole schedule to checking only a committed projection of the schedule

How do we Check for Conflicts?
Testing for conflict serializability:
  Look at only read_Item (X) and write_Item (X) operations
  Construct a precedence graph (serialization graph) with directed edges
  An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj
  The schedule is serializable if and only if the precedence graph has no cycles.

The Serializability Theorem
A Dependency Exists Between Two Transactions If:
  They Access the Same Data Item Consecutively in the Schedule and One of the Accesses is a Write
Three Cases: T2 Depends on T1, Denoted by $T_1 \rightarrow T_2$
  T2 Executes a Read(x) after a Write(x) by T1
  T2 Executes a Write(x) after a Read(x) by T1
  T2 Executes a Write(x) after a Write(x) by T1
  Don’t Care about Read(x) after Read(x)
Transaction T1 Precedes Transaction T2 If:
  There is a Dependency Between T1 and T2, and
  The R/W Operation in T1 Precedes the Dependent T2 Operation in the Schedule

The Serializability Theorem
A Precedence Graph of a Schedule is a Graph G = <TN, DE>, where
  Each Node is a Single Transaction; i.e., $TN = \{T_1, \dots, T_n\}$ (n > 1)
  and Each Arc (Edge) Represents a Dependency Going from the Preceding Transaction to the Other; i.e., $DE = \{e_{ij} \mid e_{ij} = (T_i, T_j),\ T_i, T_j \in TN\}$
  Use Dependency Cases on Prior Slide
The Serializability Theorem
  A Schedule is Serializable if and only if its Precedence Graph is Acyclic
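
A compact sketch of the test: build the precedence graph from the read/write operations, then look for a cycle with a depth-first search. The string format accepted by `parse` follows the R1(X), W1(X) notation used on these slides; the function names are illustrative, and an edge is added for every conflicting pair rather than only for consecutive accesses.

```python
def parse(text):
    """Turn 'R1(X), W1(X), c1;' into [('R', 1, 'X'), ('W', 1, 'X')]."""
    ops = []
    for tok in text.replace(" ", "").rstrip(";").split(","):
        if tok and tok[0] in "RW":          # commits (c1, c2) are skipped
            open_paren = tok.index("(")
            ops.append((tok[0], int(tok[1:open_paren]), tok[open_paren + 1:-1]))
    return ops

def precedence_graph(schedule):
    """Edges (Ti, Tj): an operation of Ti precedes a conflicting one of Tj."""
    edges = set()
    for i, (op_i, t_i, x_i) in enumerate(schedule):
        for op_j, t_j, x_j in schedule[i + 1:]:
            if x_i == x_j and t_i != t_j and "W" in (op_i, op_j):
                edges.add((t_i, t_j))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}                               # node -> "active" | "done"

    def visit(node):
        if state.get(node) == "active":
            return True                      # back edge: cycle found
        if state.get(node) == "done":
            return False
        state[node] = "active"
        found = any(visit(n) for n in graph[node])
        state[node] = "done"
        return found

    return any(visit(n) for n in graph)

s4 = parse("R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;")
# A schedule in which both reads of X come before both writes of X.
lost_update = parse("R1(X), R2(X), W1(X), W2(X)")
```

For `s4` the graph has the single edge T1 to T2 and is acyclic; for `lost_update` there are edges in both directions, so no equivalent serial order exists.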

Serializability Theorem Example
Consider S1 and S2 for Transactions T1 and T2
Consider the Two Precedence Graphs for S1 and S2
  No Cycles in Either Graph!

T1: Read(X); X := X - 2; Write(X); Read(Y); Y = Y + 20; Write(Y); commit;
T2: Read(X); X := X - 1; Write(X); commit;

Schedule S1: T1 followed by T2
Schedule S2: T2 followed by T1

[Figure: precedence graph for Schedule S1, one edge $T_1 \rightarrow T_2$ labelled X]
[Figure: precedence graph for Schedule S2, one edge $T_2 \rightarrow T_1$ labelled X]

What are Precedence Graphs for S3 and S4?
For S3
  $T_1 \rightarrow T_2$ (T2 Write(X) After T1 Write(X))
  $T_2 \rightarrow T_1$ (T1 Write(X) After T2 Read(X))
For S4
  $T_1 \rightarrow T_2$ (T2 Read/Write(X) After T1 Write(X))

[Figure: Schedule S3 timeline, in which both transactions execute Read(X) before either executes Write(X)]
[Figure: Schedule S4 timeline, in which T2 runs between T1's operations on X and T1's operations on Y]
[Figure: precedence graph for Schedule S3, edges $T_1 \rightarrow T_2$ and $T_2 \rightarrow T_1$, both labelled X]
[Figure: precedence graph for Schedule S4, one edge $T_1 \rightarrow T_2$ labelled X]

Four Schedules and their Precedence Graphs

[Figures: four schedules and their precedence graphs]

Serializability Facts
Serializability Emphasizes Throughput
Serializable Executions Allow us to Enjoy the Benefits of Concurrency without Giving up Any Correctness
  However, we May NOT GET the Same Result
Testing for Serializability Difficult in Practice:
  Finding a Serializable Schedule for an Arbitrary Set of Transactions is NP-hard
  Interleaving of Operations From Concurrent Transactions is Determined Dynamically at Run-time
  Practically Almost Impossible to Determine Ordering of Operations Beforehand to Ensure Serializability

Database Concurrency Control
Purpose of Concurrency Control
  To enforce Isolation (through mutual exclusion) among conflicting transactions.
  To preserve database consistency through consistency preserving execution of transactions.
  To resolve read-write and write-write conflicts.
Example:
  In a concurrent execution environment, if T1 conflicts with T2 over a data item A, then the existing concurrency control decides whether T1 or T2 should get A and whether the other transaction is rolled back or waits.

Concurrency Control
Different Locking-Based Algorithms
  Binary Locks (Lock and Unlock)
  Shared Read Locks and Exclusive Write Locks
  Write Lock Does Not Imply Read
2 Phase Protocol
  All Locks Must Precede All Unlocks in Trans.
  True for All Transactions - Schedule Serializable
Concurrency Control Implementation Techniques
Optimistic Concurrency Control
  Time-Based Access to Information
  Consider “When” Information Read/Written to Identify Potential or Prior Conflicts
We’ll Deviate from Textbook Notation

Summary of CC Techniques
Two-Phase Locking
  Most Important in Practice
  Used by a Majority of DBMSs
  Serializes in the Middle of Transactions
  Low Overhead
  Relatively Low Concurrency
Timestamp-Based
  Based on Multiple Versions of Data Items
  Serializes at the Beginning of Transactions
  Mostly Used in Distributed DBMSs
Optimistic Concurrency Control Methods
  Serializes at the End of Transactions
  Relatively High Concurrency

Recalling Important Concepts
Transaction: Sequence of Database Commands that Must be Executed as a Single Unit (Program)
Recall SQL Update Query
  Equivalent to Multiple Operations
  Read from DB, Modify (Local Copy), Write to DB
  Modify Sometimes Delete and Insert
Granularity: Size of Data that is Locked for an Executing DB Transaction - Wide Range
  Database
  Relation (Tuple vs. Entire Table)
  Attribute (Column)
  Meta-Data (System Catalog)
Locking: Provides Means for Synchronization

Transaction Example
Two Possible Outcomes for T1 and T2 – Let A = 5
  If T1 First, then A = 150
  If T2 First, then A = 60
Is this a Problem?

T1              T2
LOCK A          LOCK A
READ A          READ A
A = A + 10      A = A * 10
WRITE A         WRITE A
UNLOCK A        UNLOCK A
commit;         commit;

Transaction Example
The Two Different Orderings of T1 and T2 Represent Alternate Serial Schedules (Non-Interleaved)
Key Concept: Concurrent (Interleaved) Execution of Several DB Transactions is Correct if and only if its Effect is the Same as that Obtained by Running the Same Transactions in a Serial Order
If Result is Either 150 or 60 – it is OK!
This is the Concept of Serializability!

T1: LOCK A; READ A; A = A + 10; WRITE A; UNLOCK A; commit;
T2: LOCK A; READ A; A = A * 10; WRITE A; UNLOCK A; commit;
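
The two serial outcomes quoted on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names (`t1`, `t2`, `serial_results`, `matches_some_serial_order`) are made up for illustration, and comparing final values is a simplification of schedule equivalence that works for this single-item example.

```python
from itertools import permutations

def t1(a):
    return a + 10   # T1: A = A + 10

def t2(a):
    return a * 10   # T2: A = A * 10

def serial_results(transactions, initial):
    """Map each serial order (tuple of names) to the final value of A."""
    results = {}
    for order in permutations(transactions):
        value = initial
        for _, update in order:
            value = update(value)
        results[tuple(name for name, _ in order)] = value
    return results

results = serial_results([("T1", t1), ("T2", t2)], initial=5)

def matches_some_serial_order(final_value):
    """True if final_value equals the outcome of at least one serial order."""
    return final_value in results.values()

print(results)                          # {('T1', 'T2'): 150, ('T2', 'T1'): 60}
print(matches_some_serial_order(15))    # False
```

A final value such as 15 (T2's update lost) matches neither serial order, so an interleaving that produced it would not be acceptable.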

Recalling Key Definitions
A Schedule for a Set of Transactions is the Order in Which the Elementary Steps (Read, Lock, Assign, Commit, etc.) are Performed
A Schedule is Serial if All Steps of Each Transaction Occur Consecutively
A Schedule is Serializable if it is Equivalent to “Some” Serial Schedule
If T1, T2 and T3 are Transactions - What are the Possible Serial Schedules?
  T1 T2 T3    T1 T3 T2    T2 T1 T3
  T2 T3 T1    T3 T1 T2    T3 T2 T1
Different Serial Schedules for 4 Transactions?
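
Serial schedules are just orderings of whole transactions, so they can be enumerated as permutations. A small sketch (the helper name `serial_schedules` is hypothetical):

```python
from itertools import permutations
from math import factorial

def serial_schedules(names):
    """List every serial schedule of the named transactions."""
    return [" ".join(order) for order in permutations(names)]

three = serial_schedules(["T1", "T2", "T3"])
four = serial_schedules(["T1", "T2", "T3", "T4"])

print(three)       # the six orders listed on the slide
print(len(four))   # 24, i.e. 4!
```

In general there are n! serial schedules for n transactions, which is one reason exhaustive comparison against all of them does not scale.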

Another Example of Serializability
Two Serial Schedules (S1 and S2) – Let A = 15, B = 25, C = 5
What are Values of A, B, and C after Each?

T1: Read(A); A := A - 10; Write(A); Read(B); B = B + 10; Write(B); commit;
T2: Read(B); B := B - 20; Write(B); Read(C); C = C + 20; Write(C); commit;

S1 and S2 (the Two Serial Orders of T1 and T2) Both Yield: A = 5, B = 15, C = 25

Another Example of Serializability
Is S3 or S4 Serializable? – Let A = 15, B = 25, C = 5
Serial Values: A = 5, B = 15, C = 25

[Two interleaved schedules, S3 and S4, of the operations of T1 and T2:
 T1: Read(A); A := A - 10; Write(A); Read(B); B = B + 10; Write(B); commit;
 T2: Read(B); B := B - 20; Write(B); Read(C); C = C + 20; Write(C); commit;]

Resulting Values: A = 5, B = 35, C = 25 for One Schedule and A = 5, B = 15, C = 25 for the Other
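
The values on these two slides can be reproduced with a tiny simulator. This is a sketch under stated assumptions: the `run` and `txn` helpers are invented for illustration, commits are omitted, and the interleaving shown is a hypothetical one chosen so that both transactions read B before either writes it (it is not claimed to be the slide's S3 or S4).

```python
def run(schedule, db):
    """Run (txn, op, item, fn) steps; each transaction has a private workspace."""
    db = dict(db)
    workspace = {}
    for name, op, item, fn in schedule:
        local = workspace.setdefault(name, {})
        if op == "read":
            local[item] = db[item]
        elif op == "assign":
            local[item] = fn(local[item])
        elif op == "write":
            db[item] = local[item]
    return db

def txn(name, *updates):
    """Build Read / assign / Write steps for each (item, fn) update."""
    steps = []
    for item, fn in updates:
        steps += [(name, "read", item, None),
                  (name, "assign", item, fn),
                  (name, "write", item, None)]
    return steps

T1 = txn("T1", ("A", lambda v: v - 10), ("B", lambda v: v + 10))
T2 = txn("T2", ("B", lambda v: v - 20), ("C", lambda v: v + 20))
start = {"A": 15, "B": 25, "C": 5}

s1 = run(T1 + T2, start)   # serial: T1 then T2
s2 = run(T2 + T1, start)   # serial: T2 then T1
# Hypothetical interleaving: T1 reads B, T2 updates B, then T1 overwrites B.
interleaved = run(T1[:4] + T2[:3] + T1[4:] + T2[3:], start)

print(s1)           # {'A': 5, 'B': 15, 'C': 25}
print(s2)           # {'A': 5, 'B': 15, 'C': 25}
print(interleaved)  # {'A': 5, 'B': 35, 'C': 25}
```

The interleaved run loses T2's update of B, so its result (B = 35) matches neither serial schedule.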

Locks
Lock: Variable Associated with a Data Item in DB, Describing the Status of that Item w.r.t. Possible Ops.
  A Means of Synchronizing the Access by Concurrent Transactions to the Database Item
  Managed by Lock Manager
Binary Locks: Lock(x) and Unlock(x)
  A Transaction T Must Issue the Lock(x) before any Read(x) or Write(x)
  A Transaction T Must use the Unlock(x) After all Read(x)/Write(x) Operations are Completed in T
  System Catalog Maintains a Lock Table for All Locked Items
  Lock(x) (or Unlock(x)) will not be Granted if there Already Exists a Lock(x) (or Unlock(x))
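
A binary lock table can be sketched as a dictionary from item to holder. The class below is an illustrative assumption, not how any particular DBMS implements its lock manager; a refused request simply returns False where a real system would queue the transaction or roll it back.

```python
class BinaryLockTable:
    """Minimal binary lock table: at most one holder per item."""

    def __init__(self):
        self.holder = {}  # item -> transaction currently holding the lock

    def lock(self, txn, item):
        """Grant the lock only if nobody holds it; return True if granted."""
        if item in self.holder:
            return False
        self.holder[item] = txn
        return True

    def unlock(self, txn, item):
        """Release the lock; only the current holder may unlock."""
        if self.holder.get(item) != txn:
            raise RuntimeError(f"{txn} does not hold a lock on {item}")
        del self.holder[item]

table = BinaryLockTable()
granted_t1 = table.lock("T1", "x")        # True: x was free
granted_t2 = table.lock("T2", "x")        # False: T1 holds x
table.unlock("T1", "x")
granted_t2_retry = table.lock("T2", "x")  # True: x is free again
```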

A Basic Lock/Unlock Model
Database Transaction is a Sequence of Lock/Unlocks
  Item Locked must Eventually be Unlocked
  A Transaction Holds a Lock between Lock and Unlock Statements
  Lock/Unlock Assumes that the Value of the Item Changes (Always Assumes a Write)
    At Lock A the Value is a0; at Unlock A the Value is f(a0)
For a Number of Transactions that Lock/Unlock A, we’d have: f1(f2(f3( … fn(a0))))

Example - Assessing Schedule
Consider Three Transactions Below:
  T1 has f1(a) and f2(b)
  T2 has f3(b) and f4(c) and f5(a)
  T3 has f6(a) and f7(c)
Functions Represent actions that Modify Instances a, b, and c of Data Items A, B, and C, Respectively

T1: Lock A; Lock B; Unlock A; Unlock B
T2: Lock B; Lock C; Unlock B; Lock A; Unlock C; Unlock A
T3: Lock A; Lock C; Unlock C; Unlock A

Example - Assessing Schedule
Consider the Schedule with Changes to a, b, and c
Is this Schedule Serializable?

Step           A               B            C
T1 Lock A      a               b            c
T2 Lock B      a               b            c
T2 Lock C      a               b            c
T2 Unlock B    a               f3(b)        c
T1 Lock B      a               f3(b)        c
T1 Unlock A    f1(a)           f3(b)        c
T2 Lock A      f1(a)           f3(b)        c
T2 Unlock C    f1(a)           f3(b)        f4(c)
T2 Unlock A    f5(f1(a))       f3(b)        f4(c)
T3 Lock A      f5(f1(a))       f3(b)        f4(c)
T3 Lock C      f5(f1(a))       f3(b)        f4(c)
T1 Unlock B    f5(f1(a))       f2(f3(b))    f4(c)
T3 Unlock C    f5(f1(a))       f2(f3(b))    f7(f4(c))
T3 Unlock A    f6(f5(f1(a)))   f2(f3(b))    f7(f4(c))

Is this Schedule Serializable?
Focus on the Final Line - It indicates the Effective Order of Execution of Each Transaction for a, b, and c
  T1 has f1(a) and f2(b)
  T2 has f3(b) and f4(c) and f5(a)
  T3 has f6(a) and f7(c)
Final Line: T3 Unlock A    A = f6(f5(f1(a)))    B = f2(f3(b))    C = f7(f4(c))
For A - Order of Transactions is T1 → T2 → T3
For B - T2 Must Precede T1
For C - T2 Must Precede T3
Can All Three Conditions be True w.r.t. Order?

Determining Serializability in this Model
Examine Schedule Based on Order in Which Various Transactions Obtain Locks
Order must be Equivalent to Some Hypothetical Serial Schedule of Transactions
If Orders for Different Data Items Force Two Transactions to Appear in a Different Order (T2 Must Precede T1 and T1 Must Precede T2), There is a Paradox!
This is Equivalent to Searching for Cycles in a Directed Graph

Recall Topological Sort
Graph is Acyclic
Find a Node of Graph with ONLY Arrows Leaving (no Entering)
Delete Node and Arrows, then Repeat on the Remaining Graph
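
The delete-a-source procedure can be sketched as follows. The function name and the choice to break ties alphabetically are assumptions made for the example; returning None when no source node exists doubles as a cycle test.

```python
def topological_sort(nodes, arcs):
    """Return an ordering of nodes, or None if the graph contains a cycle."""
    remaining = set(nodes)
    arcs = set(arcs)
    order = []
    while remaining:
        # Candidate nodes: no arc enters them from the remaining graph.
        free = sorted(n for n in remaining if all(dst != n for _, dst in arcs))
        if not free:
            return None  # every remaining node has an entering arc: cycle
        node = free[0]
        order.append(node)
        remaining.remove(node)
        # Delete the node's leaving arcs along with the node.
        arcs = {(src, dst) for src, dst in arcs if src != node}
    return order

print(topological_sort(["T1", "T2", "T3"], [("T2", "T3"), ("T1", "T2")]))
print(topological_sort(["T1", "T2"], [("T1", "T2"), ("T2", "T1")]))
```

The first call prints an order in which every arc points forward; the second prints None because T1 and T2 each precede the other.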

Algorithm 1: Binary Lock Model
Input: Schedule S for Transactions T1, T2, …, Tk
Output: Determination if S is Serializable, and If so, an Equivalent Serial Schedule
Method: Create a Directed Precedence Graph G:
  Let S = a1; a2; …; an where each ai is Tj: Lock Am or Tj: Unlock Am
  For each ai = Tj: Unlock Am, find next ap = Ts: Lock Am (i < p ≤ n) (Ts is next Trans. to lock Am), and if so, draw Arc in G from Tj to Ts
  Repeat Until All Unlock/Lock are Checked
Review the Resulting Precedence Graph
  If G has Cycles - Non-Serializable
  If G is Acyclic - Topological Sort to Find an Equivalent Serial Schedule
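
Algorithm 1 can be sketched in Python and applied to the lock/unlock schedule from the “Example - Assessing Schedule” slide. The helper names (`parse`, `precedence_arcs`, `serial_order`) and the text format of the schedule are assumptions made for illustration.

```python
def parse(text):
    """Turn 'T1 lock A; T2 unlock B' into [('T1', 'lock', 'A'), ...]."""
    return [tuple(step.split()) for step in text.split(";")]

def precedence_arcs(schedule):
    """For each unlock, draw an arc to the next transaction locking that item."""
    arcs = set()
    for i, (tj, action, item) in enumerate(schedule):
        if action != "unlock":
            continue
        for ts, later_action, later_item in schedule[i + 1:]:
            if later_action == "lock" and later_item == item:
                if ts != tj:
                    arcs.add((tj, ts))
                break
    return arcs

def serial_order(nodes, arcs):
    """Topological sort; None means the graph has a cycle (non-serializable)."""
    remaining, arcs, order = set(nodes), set(arcs), []
    while remaining:
        free = sorted(n for n in remaining if all(d != n for _, d in arcs))
        if not free:
            return None
        order.append(free[0])
        remaining.remove(free[0])
        arcs = {(s, d) for s, d in arcs if s != free[0]}
    return order

example = parse(
    "T1 lock A; T2 lock B; T2 lock C; T2 unlock B; T1 lock B; T1 unlock A; "
    "T2 lock A; T2 unlock C; T2 unlock A; T3 lock A; T3 lock C; "
    "T1 unlock B; T3 unlock C; T3 unlock A"
)
example_arcs = precedence_arcs(example)
print(sorted(example_arcs))   # [('T1', 'T2'), ('T2', 'T1'), ('T2', 'T3')]
print(serial_order(["T1", "T2", "T3"], example_arcs))   # None

other = parse("T2 lock A; T2 unlock A; T3 lock A; T3 unlock A; "
              "T1 lock B; T1 unlock B; T2 lock B; T2 unlock B")
other_arcs = precedence_arcs(other)
print(serial_order(["T1", "T2", "T3"], other_arcs))     # ['T1', 'T2', 'T3']
```

The first schedule has the cycle T1 → T2 → T1, so no serial order exists; the second is acyclic and sorts to T1, T2, T3.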

Precedence Graph for Prior Example
Schedule:
  T1 Lock A
  T2 Lock B
  T2 Lock C
  T2 Unlock B
  T1 Lock B
  T1 Unlock A
  T2 Lock A
  T2 Unlock C
  T2 Unlock A
  T3 Lock A
  T3 Lock C
  T1 Unlock B
  T3 Unlock C
  T3 Unlock A
Look for Unlock Lock Combos on the Same Data Item
  T2 Unlock B and T1 Lock B
  T1 Unlock A and T2 Lock A
  T2 Unlock C and T3 Lock C
  T2 Unlock A and T3 Lock A
Precedence Graph Arcs: T2 → T1 (B), T1 → T2 (A), T2 → T3 (A, C)
IS IT SERIALIZABLE?

Another Example
Schedule:
  T2 Lock A
  T2 Unlock A
  T3 Lock A
  T3 Unlock A
  T1 Lock B
  T1 Unlock B
  T2 Lock B
  T2 Unlock B
Look for Unlock Lock Combos on the Same Data Item
  T2 Unlock A and T3 Lock A
  T1 Unlock B and T2 Lock B
Precedence Graph Arcs: T1 → T2 (B), T2 → T3 (A)
IS IT SERIALIZABLE? IF SO WHAT IS THE SCHEDULE?

Two-Phase Protocol
Two-Phase Protocol - All Locks Must Precede All Unlocks in the Schedule for a Transaction
Which of the Transactions Below are Two-Phase? Why or Why Not?

T1: Lock A; Lock B; Unlock A; Unlock B
T2: Lock B; Lock C; Unlock B; Lock A; Unlock C; Unlock A
T3: Lock A; Lock C; Unlock C; Unlock A
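
The two-phase condition is easy to test mechanically: once a transaction has issued any unlock, it may not issue another lock. A minimal sketch (the function name and tuple encoding are assumptions):

```python
def is_two_phase(actions):
    """actions: list of (action, item); True if no lock follows an unlock."""
    seen_unlock = False
    for action, _ in actions:
        if action == "unlock":
            seen_unlock = True
        elif action == "lock" and seen_unlock:
            return False
    return True

T1 = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
T2 = [("lock", "B"), ("lock", "C"), ("unlock", "B"),
      ("lock", "A"), ("unlock", "C"), ("unlock", "A")]
T3 = [("lock", "A"), ("lock", "C"), ("unlock", "C"), ("unlock", "A")]

print(is_two_phase(T1), is_two_phase(T2), is_two_phase(T3))  # True False True
```

T2 fails because it locks A after it has already unlocked B.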

Theorems Regarding Serializability
Theorem 1: Algorithm 1 Correctly Determines if a Schedule S is Serializable (omit the proof).
Theorem 2: If S is any Schedule of 2 Phase Transactions (i.e., all of its Transactions are 2-Phase), then S is Serializable.
Proof by Contradiction.
  Suppose Not - then by Theorem 1, S has a Precedence Graph G with a Cycle
    T1 → T2 → T3 → … → Tp → T1
    (Each Arc Pairs an Unlock by its Source with a Later Lock by its Target)
  In T1 → T2, T1 is Unlock, so all Remaining Actions of T1 must also be Unlock, since S is 2 Phase
  However, in Tp → T1, T1 is Lock, which is a Contradiction to Fact that S is 2 Phase

Problems of Binary Locks
Only One Transaction Can Hold a Lock on a Given Item
No Shared Reading is Allowed - Too Restrictive
For Example
  T1 is Read Only on X - Yet Needs Full Lock
  T2 is Read Only on X and Y - Needs Full Locks
[Timeline (t1 to t5) for T1 and T2 with operations: Read(X); Read(Y); commit; and Read(X); Read(Y); Y = Y + 20; Write(Y); commit;]

A Read/Write Lock Model
Refines the Granularity of Locking to Differentiate Between Read and Write Locks
Improves Concurrent Access
Rlock (Shared): If T has an Rlock A, then Any Other Transaction can Also Rlock A, but All Transactions are Forbidden from Wlock A until All Transactions with Rlock A issue Ulock A (Multiple Reads)
Wlock (Exclusive): If T has Wlock A, then All Other Transactions are Forbidden to Rlock or Wlock A Until T Ulocks A (Write Implies Reading, Single Write)
Two Schedules are Equivalent if:
  They Produce the Same Value for Each Data Item
  Each Rlock on an Item Occurs in Both Schedules at a Time When Locked Item has the Same Value

Algorithm 2: Read/Write Lock Model
Input: Schedule S for Transactions T1, T2, …, Tk
Output: Is S Serializable? If so, Serial Schedule
Method: Create a Directed Precedence Graph G:
  Suppose in S, Ti: Rlock A.
    If Tj: Wlock A is the Next Transaction to Wlock A (if it exists) then place an Arc from Ti to Tj.
    Repeat for all Ti’s, all Rlocks before Wlock on A!
  Suppose in S, Ti: Wlock A.
    If Tj: Wlock A is the Next Transaction to Wlock A (if it exists) then place an Arc from Ti to Tj.
    If Also exists Tm: Rlock A after Ti: Wlock A but before Tj: Wlock A, then Draw an Arc from Ti to Tm.
Review the Resulting Precedence Graph
  If G has Cycles - Non-Serializable
  If G is Acyclic - Topological Sort for Serial Schedule
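
The arc rules of Algorithm 2 can be sketched as below. This is an illustration under assumptions: the schedules are small hypothetical ones (not the 16-step schedule on the following slides), “next transaction to Wlock” is taken to mean the next Wlock by a different transaction, and readers that follow a Wlock with no later Wlock are also given an arc from the writer.

```python
def rw_precedence_arcs(schedule):
    """schedule: list of (txn, action, item), action in rlock/wlock/unlock."""
    arcs = set()
    for i, (ti, action, item) in enumerate(schedule):
        if action not in ("rlock", "wlock"):
            continue
        for tm, later, later_item in schedule[i + 1:]:
            if later_item != item or tm == ti:
                continue
            if later == "wlock":
                arcs.add((ti, tm))   # tm is the next transaction to Wlock the item
                break
            if later == "rlock" and action == "wlock":
                arcs.add((ti, tm))   # tm reads what ti wrote, before the next Wlock
    return arcs

acyclic = [("T1", "wlock", "A"), ("T1", "unlock", "A"),
           ("T2", "rlock", "A"), ("T3", "rlock", "A"),
           ("T2", "unlock", "A"), ("T3", "unlock", "A"),
           ("T4", "wlock", "A"), ("T4", "unlock", "A")]
cyclic = [("T1", "rlock", "A"), ("T2", "rlock", "B"),
          ("T1", "unlock", "A"), ("T2", "unlock", "B"),
          ("T2", "wlock", "A"), ("T1", "wlock", "B")]

acyclic_arcs = rw_precedence_arcs(acyclic)
cyclic_arcs = rw_precedence_arcs(cyclic)
print(sorted(acyclic_arcs))
print(sorted(cyclic_arcs))   # [('T1', 'T2'), ('T2', 'T1')]
```

In the first schedule the two readers T2 and T3 are unordered with respect to each other but both sit between the writers T1 and T4; the second schedule yields a two-node cycle and is therefore not serializable.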

Consider the Following Schedule
What are the Dependencies Among Transactions?
Schedule for T1 T2 T3 T4:
(1) Wlock A
(2) Rlock B
(3) Unlock A
(4) Rlock A
(5) Unlock B
(6) Wlock B
(7) Rlock A
(8) Unlock B
(9) Wlock B
(10) Unlock A
(11) Unlock A
(12) Wlock A
(13) Unlock B
(14) Rlock B
(15) Unlock A
(16) Unlock B

Consider the Following Schedule
What is the Precedence Graph G?
(Same Schedule for T1 T2 T3 T4 as Above, Steps (1) to (16))

Precedence Graph
What is the Resulting Precedence Graph?
Is the Schedule Serializable? Why or Why Not?
[Precedence graph over T1, T2, T3, T4 with arcs labeled A:RW, A:RW, B:RW, A:WW, B:WW, A:WR]

A Read-Only/Write-Only Lock Model
Revision of the Read/Write Model for Algorithm 2
Refining Our Assumptions
  Assume that a Wlock on an Item Does not Mean that the Transaction First Reads the Item
  Contrary to First Two Models
Example: Read A; Read B; C=A+B; A=A-1; Write A; Write C
  Reads A, B and Writes A, C (No Read on C)
Reformulate Notion of Equivalent Schedules

How Does This Model Differ from Alg. 2?
Consider the Schedule Segment:
  T1: Wlock A
  T1: Ulock A
  T2: Wlock A
  T2: Ulock A
In Algorithm 2 - T2: Wlock A Assumes that T2 Reads the Value Written by T1
However, This Need Not be True in the New Model
If Between T1 and T2, No Transaction Rlocks A, then Value Written by T1 is Lost, and T1 Does not Have to Precede T2 in a Schedule w.r.t. A

Redefine Serializability
Conditions on Serializability Must be Redefined in Support of the Write-Does-Not-Assume Read Model
If in Schedule S, T2 Reads “A” Written by T1, then T1 Must Precede T2 in any Serial Schedule Equivalent to S
Further, if there is a T3 that Writes “A”, then in any Serial Schedule Equivalent to S, T3 may either Precede T1 or Follow T2, but may not Appear Between T1 and T2
Graphically, we have: T1 → T2 (A:WR), with T3 Placed Either Before T1 or After T2
Candidate Serial Orders of T1, T2, T3:
  T1 T2 T3    T1 T3 T2    T2 T1 T3
  T2 T3 T1    T3 T1 T2    T3 T2 T1

Augmentation of Precedence Graph
In Support of the Write Does Not Imply Read Model, we must Augment the Precedence Graph:
  Add an Initial Transaction T0 that Writes Every Item, and a Final Transaction Tf that Reads Every Item
  When a Transaction T’s Output is Invisible in Tf (i.e., the Value is Lost), Then T is Referred to as a Useless Transaction
  Useless Transactions have no Paths from Transaction to Tf
Note: Maintain Same set of Locks (Rlock, Wlock, Ulock) with Different Interpretation on Wlock

Intuitive View of Algorithm 3
If T2 Reads Value of “A” Written by T1, then T1 Must Precede T2 in any Serial Schedule
  For WR Combo - Draw an Arc from T1 to T2
Now Consider a T3 that also Writes “A”
  T3 Must be either Before T1 or After T2
  Add in a Pair of Arcs T3 to T1 and T2 to T3, of Which one Must be Chosen in the Final Precedence Graph
Serializability Occurs if After Choices Made for each “T3” Pair, the Resulting Graph is Acyclic
G is Referred to as a “Polygraph” with Nodes, Arcs, and Alternate Arcs

Algorithm 3 Example
Schedule for T1 T2 T3 T4:
(1) Rlock A
(2) Rlock A
(3) Wlock C
(4) Unlock C
(5) Rlock C
(6) Wlock B
(7) Unlock B
(8) Rlock B
(9) Unlock A
(10) Unlock A
(11) Wlock A
(12) Rlock C
(13) Wlock D
(14) Unlock B
(15) Unlock C
(16) Rlock B
(17) Unlock A
(18) Wlock A
(19) Unlock B
(20) Wlock B
(21) Unlock B
(22) Unlock D
(23) Unlock C
(24) Unlock A

Algorithm 3 – Steps 1 to 4
Input: Schedule S for Transactions T1, T2, …, Tk
Output: Is S Serializable? If so, Serial Schedule
Method: Create a Directed Polygraph P:
1. Augment S with Dummy T0 (Write Every Item) and Dummy Tf (Read Every Item)
2. Create Initial Polygraph P by Adding Nodes for T0, Tf, and Each Transaction Ti in S
3. Place an Arc from Ti to Tj Whenever Tj Reads A in Augmented S (with Dummy States) that was Last Written by Ti. Repeat this Step for all Arcs. Don’t Forget to Consider Dummy States!
4. Discover Useless Transactions - T is Useless if there is no Path from T to Tf
This is the “Initialization” Phase of Algorithm 3
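
Steps 1 to 4 can be sketched on a small hypothetical schedule. The assumptions here are loud ones: the schedule is given directly as read/write events (rather than Rlock/Wlock/Unlock steps), the names `reads_from_arcs` and `useless` are invented, and every item used in the schedule must appear in `items`.

```python
def reads_from_arcs(schedule, items):
    """Steps 1-3: augment with T0/Tf, return arcs (writer, reader, item)."""
    augmented = ([("T0", "write", x) for x in items] + list(schedule)
                 + [("Tf", "read", x) for x in items])
    last_writer, arcs = {}, set()
    for txn, action, item in augmented:
        if action == "write":
            last_writer[item] = txn
        elif last_writer[item] != txn:       # reading one's own write adds no arc
            arcs.add((last_writer[item], txn, item))
    return arcs

def useless(transactions, arcs):
    """Step 4: transactions with no path to Tf."""
    reaches = {"Tf"}
    changed = True
    while changed:
        changed = False
        for writer, reader, _ in arcs:
            if reader in reaches and writer not in reaches:
                reaches.add(writer)
                changed = True
    return {t for t in transactions if t not in reaches}

# Hypothetical schedule: T2's write of A is overwritten by T3 before anyone reads it.
schedule = [("T1", "read", "A"), ("T2", "write", "A"),
            ("T3", "write", "A"), ("T1", "write", "B")]
arcs = reads_from_arcs(schedule, items=["A", "B"])
dead = useless(["T1", "T2", "T3"], arcs)
kept = {arc for arc in arcs if arc[1] not in dead}   # drop arcs into useless txns

print(sorted(arcs))  # [('T0', 'T1', 'A'), ('T1', 'Tf', 'B'), ('T3', 'Tf', 'A')]
print(dead)          # {'T2'}
```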

Resulting Polygraph - Steps 1 to 2
1. Add T0 and Tf to S
2. Add T0, Tf, T1, T2, T3, T4 to Polygraph P
[Polygraph with nodes T0, T1, T2, T3, T4, Tf and no arcs yet]

Alg 3 Step 3 - Init=T0 & Fin=Tf
Augmented Schedule for T1 T2 T3 T4:
  T0: Write A, Write B, Write C, Write D
  Steps (1) to (24) as in the Algorithm 3 Example
  Tf: Read A, Read B, Read C, Read D
Who Reads A after T0 Writes A?
Who Reads A after T4 Writes A?
Who Reads B after T1 Writes B?
Who Reads B after T4 Writes B?
Who Reads C after T1 Writes C?
Who Reads D after T2 Writes D?

Step 3 - Write to Reads on A

Step 3 - Write to Reads on B

Step 3 - Write to Reads on C

Step 3 - Write to Reads on D

Resulting Polygraph - Steps 1 to 3
1. Add T0 and Tf to S
2. Add T0, Tf, T1, T2, T3, T4 to Polygraph P
3. Look for Ti Write X to Tj Read X for all Items X
4. Look for Useless Transactions - No Paths from T to Tf
[Polygraph over T0, T1, T2, T3, T4, Tf with arcs labeled A:WR (three), B:WR (three), C:WR (three), D:WR (one)]

Resulting Polygraph - Steps 1-4
1. Add T0 and Tf to S
2. Add T0, Tf, T1, T2, T3, T4 to Polygraph P
3. Look for Ti Write X to Tj Read X for all Items X
4. For Useless Transaction T3, Remove Arcs Into T3 – This Completes Step 4
[Polygraph over T0, T1, T2, T3, T4, Tf with arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one)]

Algorithm 3 – Steps 5 to 7
Method: Reassess the Initial Polygraph P:
5. For Each Remaining Arc Ti W to Tj R (meaning that Tj Reads Item A Written by Ti), Consider all T ≠ T0 and T ≠ Tf that also Write A:
   I. If Ti = T0 and Tj = Tf then Add No Arcs
   II. If Ti = T0 and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ T0 and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ T0 and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj to T
6. Determine if P is Acyclic by “Choosing” One Transaction Arc for Each Pair - Make Choices Carefully
7. If Acyclic - Serializable - Perform Topological Sort without T0, Tf for Equivalent Serial Schedule. Else - Not Serializable
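
Steps 5 to 7 can be sketched as follows. The input data is hypothetical (loosely modeled on the item B discussion: T2 reads B from T1, and T4 also writes B), the function names are invented, and step 6 is done by brute force over every choice of alternate arcs, which is exponential in the number of pairs and only meant for small examples.

```python
from itertools import product

def polygraph(wr_arcs, writers):
    """Step 5: return (fixed arcs, alternate arc pairs) from (Ti, Tj, item) arcs."""
    fixed = {(ti, tj) for ti, tj, _ in wr_arcs}
    pairs = []
    for ti, tj, item in wr_arcs:
        for t in writers.get(item, ()):
            if t in (ti, tj) or (ti == "T0" and tj == "Tf"):
                continue                          # case I: no arcs
            if ti == "T0":
                fixed.add((tj, t))                # case II: T after Tj
            elif tj == "Tf":
                fixed.add((t, ti))                # case III: T before Ti
            else:
                pairs.append(((t, ti), (tj, t)))  # case IV: choose one
    return fixed, pairs

def acyclic_order(nodes, arcs):
    """Topological sort; None if the arcs contain a cycle."""
    remaining, arcs, order = set(nodes), set(arcs), []
    while remaining:
        free = sorted(n for n in remaining if all(d != n for _, d in arcs))
        if not free:
            return None
        order.append(free[0])
        remaining.remove(free[0])
        arcs = {(s, d) for s, d in arcs if s != free[0]}
    return order

def serial_schedule(nodes, fixed, pairs):
    """Steps 6-7: try each choice of alternate arcs until one is acyclic."""
    for choice in product(*pairs):
        order = acyclic_order(nodes, fixed | set(choice))
        if order is not None:
            return [t for t in order if t not in ("T0", "Tf")]
    return None

wr_arcs = {("T0", "T1", "A"), ("T1", "T2", "B"), ("T4", "Tf", "B")}
writers = {"A": set(), "B": {"T1", "T4"}}          # T0 excluded by construction
fixed, pairs = polygraph(wr_arcs, writers)
result = serial_schedule(["T0", "T1", "T2", "T4", "Tf"], fixed, pairs)
print(result)   # ['T1', 'T2', 'T4']
```

Here case III forces T1 before T4, so the alternate arc T4 → T1 would create a cycle and the other alternative, T2 → T4, is the one that works.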

What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc Ti W to Tj R, Consider all T ≠ T0 and T ≠ Tf that also Write A:
   I. If Ti = T0 and Tj = Tf then Add No Arcs
   II. If Ti = T0 and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ T0 and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ T0 and Tj ≠ Tf then Add Arc Pair from T to Ti and Tj to T
General Case: Ti → Tj (X:WR)
Case I: T0 → Tf (X:WR) - No New Arc
Case II: T0 → Tj (X:WR) - Add Arc from Tj to T (Labeled II X:RW) - T is After Tj

What are Four Cases of Step 5 Conceptually?
(Step 5 Rules I to IV as Above)
General Case: Ti → Tj (X:WR)
Case III: Ti → Tf (X:WR) - Add Arc from T to Ti (Labeled III X:RW) – T is Before Ti

What are Four Cases of Step 5 Conceptually?
(Step 5 Rules I to IV as Above)
General Case: Ti → Tj (X:WR)
Case IV: Ti → Tj (X:WR) - Add in Two Alternate Arcs (Each Labeled IV X:RW) - T is After Tj or Before Ti

Alg 3 Ex - Step 5 - Who Else Writes A?
Augmented Schedule for T1 T2 T3 T4:
  T0: Write A, Write B, Write C, Write D
  Steps (1) to (24) as in the Algorithm 3 Example
  Tf: Read A, Read B, Read C, Read D
For T0 to T1 Arc Who Else Writes A?
For T0 to T2 Arc Who Else Writes A?
For T4 to Tf Arc Who Else Writes A?

Resulting Polygraph - Step 5 - A:WR
[Polygraph over T0, T1, T2, T3, T4, Tf: arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one), plus new arcs for item A labeled II A:RW and III A:RW]

Resulting Polygraph - Step 5 - A:WR
5. For Each Arc Ti to Tj Consider All T’s that Write X
   I. If Ti = T0 and Tj = Tf then Add No Arcs
   II. If Ti = T0 and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ T0 and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ T0 and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Check Item A (see new arcs/labels - case II and III)
[Polygraph over T0, T1, T2, T3, T4, Tf: arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one), plus new arcs labeled II A:RW (four) and III A:RW (one)]

Alg 3 Ex - Step 5 - Who Else Writes C/D?
Augmented Schedule for T1 T2 T3 T4:
  T0 (Init): Write A, Write B, Write C, Write D
  Steps (1) to (24) as in the Algorithm 3 Example
  Tf (Fin): Read A, Read B, Read C, Read D
For three T1 Arcs Does Anyone Else Write C?
For One T2 Arc Does Anyone Else Write D?

Resulting Polygraph - Step 5 - C:WR & D:WR
5. For Each Arc Ti to Tj Consider All T’s that Write X
   I. If Ti = T0 and Tj = Tf then Add No Arcs
   II. If Ti = T0 and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ T0 and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ T0 and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Do any Other Transactions Write C or Write D for the arrows labeled C:WR and D:WR Respectively?
[Polygraph over T0, T1, T2, T3, T4, Tf: arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one), II A:RW (four), III A:RW (one)]

Alg 3 Ex - Step 5 - Who Else Writes B?
Augmented Schedule for T1 T2 T3 T4:
  T0 (Init): Write A, Write B, Write C, Write D
  Steps (1) to (24) as in the Algorithm 3 Example
  Tf (Fin): Read A, Read B, Read C, Read D
For T4 to Tf Arc Who Else Writes B? T1 but already Arc from T1 to T4
For T1 to T4 Arc Who Else Writes B? Just T4 so no arc
For T1 to T2 Arc Who Else Writes B? This is Case IV
  T4 Writes B
  Two Arcs: T4 after T2 and T4 before T1

Two Added Arcs for Case IV and B
T4 Follows T2 and T4 Before T1
[Polygraph over T0, T1, T2, T3, T4, Tf: arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one), II A:RW (four), III A:RW (one), plus two alternate arcs labeled IV B:RW]

Resulting Polygraph - Step 5 and 6
5. For Each Arc Ti to Tj Consider All T’s that Write X
   I. If Ti = T0 and Tj = Tf then Add No Arcs
   II. If Ti = T0 and Tj ≠ Tf then Add Arc from Tj to T
   III. If Ti ≠ T0 and Tj = Tf then Add Arc from T to Ti
   IV. If Ti ≠ T0 and Tj ≠ Tf then Add Pair from T to Ti and Tj to T
Check Item B (see new arcs - including alternates - dashed)
  For T1 to T2, T4 writes - so add T2 to T4 and T4 to T1 – Case IV
  Either T4 After T2 or Before T1 - no new arcs for other WRs.
[Polygraph over T0, T1, T2, T3, T4, Tf: arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one), II A:RW (four), III A:RW (one), plus two alternate arcs labeled IV B:RW]

Resulting Polygraph - Step 5 and 6
6. Which Option of Pair of Arcs Should be Chosen? Why?
[Polygraph over T0, T1, T2, T3, T4, Tf: arcs labeled A:WR (three), B:WR (three), C:WR (two), D:WR (one), II A:RW (four), III A:RW (one), plus two alternate arcs labeled IV B:RW]


Final Polygraph - Step 7

Final Graph with Arc Removed; Dummy States Deleted Below

Topological Sort Yields Order: T1, T2, T3, T4
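Choosing between the alternative arcs and then sorting topologically can be illustrated with Python's standard `graphlib` module. The arcs below are hypothetical placeholders, chosen only so that the result matches the order stated on the slide; they are not read off the figure. The sketch suggests one way to answer the Step 6 question: keep the alternative that leaves the graph acyclic.

```python
from graphlib import CycleError, TopologicalSorter


def serial_order(arcs):
    """Return a serial order consistent with the arcs, or None if they form a cycle."""
    preds = {}
    for src, dst in arcs:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)   # dst must come after src
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


# Hypothetical arcs (an assumption for illustration, not taken from the slides).
base = [("T1", "T2"), ("T2", "T3"), ("T3", "T4")]

after_t2 = serial_order(base + [("T2", "T4")])    # alternative "T4 after T2"
before_t1 = serial_order(base + [("T4", "T1")])   # alternative "T4 before T1"
print(after_t2)    # ['T1', 'T2', 'T3', 'T4']
print(before_t1)   # None, because T4 -> T1 closes a cycle
```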

[Figure: final polygraph with one of the two IV B:RW arcs removed (top), and the same graph with the dummy transactions T0 and Tf deleted, leaving T1, T2, T3, T4 (bottom)]


Why Optimistic Concurrency Control?

Motivated by Disadvantages of Locking Techniques
  Lock Maintenance
  Deadlock-Free Locking Protocols Limit Concurrency
  Secondary Memory Access Causes Locks to be Held for a Long Duration
  Locks Typically Held Until Transaction Completes, Which Reduces Concurrency
  Often Needed in “Worst” Case Only
  Overhead - Locking + Deadlock Detection

Key Concept
  Write Collisions in Large Databases for “Many” Applications are Rare
  OCC: “Don’t Worry be Happy” Approach


Basic Ideas of OCC

Interference Between Transactions is Rare and Locking Incurs too Much Overhead
Instead, Allow Each Transaction to Execute Freely, and Check Serializability at the end of the Transaction
Win (Allow to Commit) If No Interference Occurs or There have been No Conflicts

[Figure: phases of execution]
  Pessimistic execution: Validate, Read (and Compute), Write
  Optimistic execution: Read (and Compute), Validate, Write


How Does OCC Work?

Execute Transactions Ad-Hoc - Let them Go Uncontrolled
Maintain Information of “Relevant” Actions Against DB (Often in Conjunction with Recovery/Journal)
When Transactions Finish - Check to see if Everything Proceeded Satisfactorily
Assumes that Probability of Transaction Interference is Quite Small
Two Questions re. OCC:
  How Do We know Everything Went OK?
  How do we Recover if it Didn’t?


What is a Timestamp?

Timestamp
  A monotonically increasing variable (integer) indicating the age of an operation or a transaction.
  A larger timestamp value indicates a more recent event or operation.
  A timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions.


OCC Utilizes Timestamps

Timestamps are Clock Ticks used to Record the Major Milestones in the Execution of a Transaction
Examples Include:
  Start Time of Transaction
  Read/Write Times for DB Items
  Finish Time of Transaction
  Commit Time of Transaction
Two Important Definitions are:
  Read Time of an Item: Highest Time Stamp Possessed by Any Transaction that Reads the Item
  Write Time of an Item: Highest Time Stamp Possessed by Any Transaction that Wrote the Item
A Transaction has a Fixed Time when it Started that is Constant Throughout its Execution


How are Timestamps Used?

Focus on “When” Reads and Writes Occur
Transaction Cannot Read an Item if its Value was Not Written Until After the Transaction Finished its Execution
  Transaction T with Timestamp t1 Cannot Read an Item with a Write Time of t2 if t2 > t1
  If this is the Case, T Must Abort and be Restarted
  Can’t Read Item if it hasn’t been Written
Transaction Cannot Write an Item if that Item has its Old Value Read at a Later Time
  Transaction T with Timestamp t1 Cannot Write an Item with a Read Time of t2 if t2 > t1
  If this is the Case, T Must Abort and be Restarted
  Can’t Write Item Being Read at a Later Time


Algorithm 4: Optimistic CC

Let T be a Transaction with Timestamp t Attempting to Perform Operation X on a Data Item I with Readtime tR and Writetime tW
If (X = Read and t ≥ tW) or (X = Write and t ≥ tR and t ≥ tW) then Perform Operation
  If t > tR then set tR = t for Data Item I (read after write)
  If t > tW then set tW = t for Data Item I (write after read)
If (X = Write and tR ≤ t < tW) then Do Nothing since Later Write will Cancel out the Write of T
If (X = Read and t < tW) or (X = Write and t < tR) then Abort the Operation
  1st - T trying to Read Item Before it was Written
  2nd - T trying to Write an Item Before it was Read
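The three outcomes of Algorithm 4 (perform, do nothing, abort) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the names `Item`, `attempt`, and the outcome strings are invented here for illustration, and only the timestamp bookkeeping is modeled, not the data values or the restart of an aborted transaction. The replayed schedule uses the timestamps T1 = 200, T2 = 150, T3 = 175 from the example that follows; which transaction issues each step is inferred from the read and write times shown there.

```python
from dataclasses import dataclass


@dataclass
class Item:
    read_time: int = 0    # tR: highest timestamp of any reader so far
    write_time: int = 0   # tW: highest timestamp of any writer so far


def attempt(op: str, t: int, item: Item) -> str:
    """Apply the timestamp rules for a transaction with timestamp t."""
    if op == "read":
        if t < item.write_time:
            return "abort"                      # item was written by a later transaction
        item.read_time = max(item.read_time, t)
        return "perform"
    if op == "write":
        if t < item.read_time:
            return "abort"                      # a later transaction already read the old value
        if t < item.write_time:
            return "ignore"                     # tR <= t < tW: a later write supersedes this one
        item.write_time = t
        return "perform"
    raise ValueError(f"unknown operation: {op}")


# Replay of the example schedule (transaction assignment is an inference).
timestamps = {"T1": 200, "T2": 150, "T3": 175}
items = {name: Item() for name in "ABC"}
schedule = [
    ("T1", "read", "B"), ("T2", "read", "A"), ("T3", "read", "C"),
    ("T1", "write", "B"), ("T1", "write", "A"),
    ("T2", "write", "C"), ("T3", "write", "A"),
]
results = [attempt(op, timestamps[txn], items[name]) for txn, op, name in schedule]
for (txn, op, name), outcome in zip(schedule, results):
    print(txn, op, name, "->", outcome)
```

In this replay the first five steps are performed, the write of C by T2 aborts, and the write of A by T3 is ignored.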


Example of OCC

What Happens at Each Step w.r.t. RT/WT?

Step  T1 (200)  T2 (150)  T3 (175)  A                B                C
Init                                RT=0, WT=0       RT=0, WT=0       RT=0, WT=0
(1)   Read B                        RT=0, WT=0       RT=200, WT=0     RT=0, WT=0
(2)             Read A              RT=150, WT=0     RT=200, WT=0     RT=0, WT=0
(3)                       Read C    RT=150, WT=0     RT=200, WT=0     RT=175, WT=0
(4)   Write B                       RT=150, WT=0     RT=200, WT=200   RT=175, WT=0
(5)   Write A                       RT=150, WT=200   RT=200, WT=200   RT=175, WT=0


Example of OCC

Step  T1 (200)  T2 (150)  T3 (175)  A                B                C
Init                                RT=0, WT=0       RT=0, WT=0       RT=0, WT=0
(1)   Read B                        RT=0, WT=0       RT=200, WT=0     RT=0, WT=0
(2)             Read A              RT=150, WT=0     RT=200, WT=0     RT=0, WT=0
(3)                       Read C    RT=150, WT=0     RT=200, WT=0     RT=175, WT=0
(4)   Write B                       RT=150, WT=0     RT=200, WT=200   RT=175, WT=0
(5)   Write A                       RT=150, WT=200   RT=200, WT=200   RT=175, WT=0
(6)             Write C             RT=150, WT=200   RT=200, WT=200   RT=175, WT=0

What Happens at Step 6? T2's Timestamp 150 < RT(C) = 175: T2 is Trying to Write C after it was Read by a Later Transaction. Consequence: Abort T2


Example of OCC

Step  T1 (200)  T2 (150)  T3 (175)  A                B                C
Init                                RT=0, WT=0       RT=0, WT=0       RT=0, WT=0
(1)   Read B                        RT=0, WT=0       RT=200, WT=0     RT=0, WT=0
(2)             Read A              RT=150, WT=0     RT=200, WT=0     RT=0, WT=0
(3)                       Read C    RT=150, WT=0     RT=200, WT=0     RT=175, WT=0
(4)   Write B                       RT=150, WT=0     RT=200, WT=200   RT=175, WT=0
(5)   Write A                       RT=150, WT=200   RT=200, WT=200   RT=175, WT=0
(6)             Write C             RT=150, WT=200   RT=200, WT=200   RT=175, WT=0
(7)                       Write A   RT=150, WT=200   RT=200, WT=200   RT=175, WT=0

Step (7): T3 can Finish, but No Effect Since 175 < 200 - Discard


Summary of Example

Step  T1 (200)  T2 (150)  T3 (175)  A                B                C
Init                                RT=0, WT=0       RT=0, WT=0       RT=0, WT=0
(1)   Read B                        RT=0, WT=0       RT=200, WT=0     RT=0, WT=0
(2)             Read A              RT=150, WT=0     RT=200, WT=0     RT=0, WT=0
(3)                       Read C    RT=150, WT=0     RT=200, WT=0     RT=175, WT=0
(4)   Write B                       RT=150, WT=0     RT=200, WT=200   RT=175, WT=0
(5)   Write A                       RT=150, WT=200   RT=200, WT=200   RT=175, WT=0
(6)             Write C             RT=150, WT=200   RT=200, WT=200   RT=175, WT=0
(7)                       Write A   RT=150, WT=200   RT=200, WT=200   RT=175, WT=0

T1 Completes Successfully; T2 Aborts; T3 Completes but Doesn’t Write A


Recovery Consideration

Actual Write Operations of Previous Example are Phase 1 of Two-Phase Commit (Write to Journal)
Commit - Phase 2 - Writes to DB
Between Write to Log and Write to DB, No Other Transaction is Allowed to Read Items being Written
OCC Reduces Work as Follows:
  One Step for Read, Two for Writes (write/commit)
  In Locking, we had Four Steps for R or W: Lock, Read or Write, Unlock, Commit


Viewing OCC vs. Phases of Execution

Read Phase:
  Database Information Read from Secondary Storage into Primary Memory
  All Writes are to Local Workspace
Validate Phase:
  Check to see if Integrity of Data has not been Violated
Write Phase:
  Update the DB (Secondary Storage) from Local Copies

[Figure: Optimistic execution: Read (and Compute), Validate, Write]
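The read, validate, and write phases can be sketched as a small in-memory model. The slides do not say how validation is carried out, so this sketch assumes a per-item version counter: a transaction validates if every item it read still has the version it saw. The class names `Database` and `Transaction` and all method names are hypothetical.

```python
class Database:
    """Stand-in for secondary storage, with a version counter per item (an assumption)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in self.data}


class Transaction:
    def __init__(self, db):
        self.db = db
        self.seen = {}        # item -> version observed during the read phase
        self.workspace = {}   # local copies; all writes go here first

    def read(self, key):
        if key in self.workspace:               # read our own pending write
            return self.workspace[key]
        self.seen.setdefault(key, self.db.version[key])
        return self.db.data[key]

    def write(self, key, value):
        self.workspace[key] = value             # read phase: no DB update yet

    def commit(self):
        # Validate phase: every item read must be unchanged since it was read.
        for key, version in self.seen.items():
            if self.db.version[key] != version:
                return False                    # conflict: caller should restart
        # Write phase: install the local copies into the database.
        for key, value in self.workspace.items():
            self.db.data[key] = value
            self.db.version[key] = self.db.version.get(key, 0) + 1
        return True


db = Database({"x": 1})
t1, t2 = Transaction(db), Transaction(db)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 10)
first = t1.commit()    # validates: nobody changed x since t1 read it
second = t2.commit()   # fails validation: x changed after t2 read it
print(first, second, db.data["x"])  # True False 2
```

A failed `commit` leaves the database untouched, which is why an optimistic transaction can simply be backed up and rerun.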


Contrasting PCC and OCC

Transaction Control
  PCC: Control by Having Transactions Wait
  OCC: Control by Having Transactions Backed up
Serializability
  PCC: Ordering of Data Items
  OCC: Ordering of Transactions
Biggest Potential Problem
  PCC: Deadlock, rather Preventing it
  OCC: Starvation
Different Applications Suited to Different Approaches
  Some DBMS Support Both
  DBA Can Configure on Application-by-Application Basis


Concluding Remarks

Background
  OS Concepts of Sharing and Synchronization
  Deadlock Detection, Prevention, Avoidance
Chapter 19
  Transaction Processing Concepts
  Different Problems re. Concurrency Control
    Deadlock, Livelock, Starvation
    Lost Update, Dirty Read, etc.
  Serial Schedule and Serializability
Chapter 20
  Deviated from Textbook Notation
  3 Pessimistic Locking Based CC Algorithms
  1 Optimistic Timestamp Based CC Algorithm
  Role of Recovery in CC