
C20.0046: Database Management Systems Lecture #8


M.P. Johnson, DBMS, Stern/NYU, Sp2004


C20.0046: Database Management Systems, Lecture #8

Matthew P. Johnson

Stern School of Business, NYU

Spring, 2004


Agenda

Last time: Normalization

This time:

1. 4NF

2. Relational Algebra

Pep talk

OHs today, drop-ins (80809)


Normalization Review

Q: What’s required for BCNF?

Q: What are the two types of violations?

Q: What’s the loophole for 3NF?

Q: How do we fix a non-BCNF relation?


Normalization Review

Q: If As → Bs violates BCNF, what do we do?

Q: In this case, could the decomposition be lossy?

Q: How do we combine two relations?

Q: Can BCNF decomp. lose FDs?

Q: Can 3NF decomp. lose FDs?


New topic: MVDs (3.7)

Consider this relation: People ~ their jobs ~ their residences

Person-address/city: many-many
Person-job: many-many
Address/city-job: independent

Name     SSN  Jobs        Streets               Citys
Michael  123  Mayor       111 East 60th Street  New York
Michael  123  Mayor       222 Brompton Road     London
Michael  123  CEO         111 East 60th Street  New York
Michael  123  CEO         222 Brompton Road     London
Hilary   456  Senator     333 Some Street       Chappaqua
Hilary   789  Senator     444 Embassy Row       Washington
Hilary   456  First Lady  333 Some Street       Chappaqua
Hilary   456  First Lady  444 Embassy Row       Washington
Hilary   789  Lawyer      333 Some Street       Chappaqua
Hilary   456  Lawyer      444 Embassy Row       Washington


Redundancy in BCNF

Lots of redundancy!

Key? All fields. None determined by others!

Non-trivial FDs? None!

In BCNF? Yes!

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer

Hilary 444 Embassy Row Washington Lawyer

Now what? New concept, leading to another normal form: Multivalued dependencies


MVD definition

As →→ Bs if, when As are held fixed, values in Bs are independent of values in rest

More precisely: if t1 and t3 agree on As, we then can find t2 such that

t1, t2, t3 agree on As

t2, t1 agree on Bs

t2, t3 agree on Cs

      As   Bs   Cs
t1    a    b1   c1
t2    a    b1   c3
t3    a    b3   c3
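The definition above can be checked mechanically on a small instance. The following is a minimal Python sketch, not part of the original slides: the function name `mvd_holds` and the dict-per-row representation are assumptions made for illustration, and the rows are the Name/Streets/Citys/Jobs table from the running example.

```python
def mvd_holds(rows, lhs, rhs):
    """Check As ->> Bs on a relation given as a list of dicts (attribute -> value)."""
    if not rows:
        return True
    rest = [a for a in rows[0] if a not in lhs and a not in rhs]
    present = {frozenset(r.items()) for r in rows}
    for t1 in rows:
        for t3 in rows:
            if any(t1[a] != t3[a] for a in lhs):
                continue  # t1 and t3 disagree on As: nothing to check
            # Candidate t2: As and Bs copied from t1, everything else from t3.
            t2 = {a: t1[a] for a in list(lhs) + list(rhs)}
            t2.update({a: t3[a] for a in rest})
            if frozenset(t2.items()) not in present:
                return False
    return True


ROWS = [
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
]
people = [dict(zip(("name", "street", "city", "job"), r)) for r in ROWS]

print(mvd_holds(people, ["name"], ["street", "city"]))  # True
print(mvd_holds(people, ["name"], ["street"]))          # False
```

The second call fails because pairing a street with the city of a different tuple produces a row (for example 333 Some Street in Washington) that is not in the relation.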


MVD example

Claim: name →→ streets,cities

If true: can pick arbitrary t1, t3 and find a t2

We pick: first and last of Hilary’s tuples:

      Name    Streets          Citys       Jobs
t1    Hilary  333 Some Street  Chappaqua   Senator
t3    Hilary  444 Embassy Row  Washington  Lawyer

Now: if true, can find another Hilary row with street/city of t1 and job of t3

t2    Hilary  333 Some Street  Chappaqua   Lawyer


MVD example

Now: if true, can find another Hilary row with street/city of t1 and job of t3

Sure enough:

t2    Hilary  333 Some Street  Chappaqua  Lawyer

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer (t2)

Hilary 444 Embassy Row Washington Lawyer


MVD rules

No splitting rule:

In the example, name →→ streets,cities

Do we have name →→ streets?

No: 444 Embassy Row doesn’t go with Chappaqua

NB: City doesn’t determine street – could have >1 house

But city, street aren’t independent

      Name    Streets          Citys       Jobs
t1    Hilary  333 Some Street  Chappaqua   Senator
t3    Hilary  444 Embassy Row  Washington  Lawyer


MVD rules

Trivial dependencies: As →→ Bs iff As →→ (Bs with any of the As added or removed)

Transitive rule: As →→ Bs, Bs →→ Cs ⇒ As →→ Cs

Complementation rule: As →→ Bs ⇒ As →→ rest

Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO


MVDs and FDs

MVD is a generalization of FD

Every FD is an MVD

Pf: Suppose As → Bs

Pick t1, t3 that agree on As.

Must find a t2. Let t2 be t3.

Then:

1) t2 agrees on As with both

2) t2 agrees on Bs with t1 (why?)

3) t2 agrees on rest with t3 (why?)

QED


Fourth Normal Form

4NF: like BCNF, but with MVDs not FDs

An MVD As →→ Bs is nontrivial if

No Bs are As

Some attributes left over (why?)

4NF: for every nontrivial MVD As →→ Bs, As is a superkey

In example name →→ streets,cities, but name isn’t a superkey

Name Streets Citys Jobs

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Lawyer


Decomposition to 4NF

Again, analogous to BCNF

If we can find As →→ Bs for R where As isn’t a superkey, replace R with R1(As,Bs) and R2(As,rest)

Running example: name →→ streets,cities

People(name,streets,cities,jobs) becomes Residences(name,street,city) and Employment(name,job)
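As a rough illustration of this decomposition, the sketch below projects the running People relation onto the two new schemas and checks that joining them on name gives back the original rows. The helper `project` and the tuple layout are assumptions for this example, not something defined in the slides.

```python
ATTRS = ("name", "street", "city", "job")
PEOPLE = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}


def project(rows, attrs, keep):
    """Keep only the named columns; building a set drops duplicate rows."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}


residences = project(PEOPLE, ATTRS, ("name", "street", "city"))
employment = project(PEOPLE, ATTRS, ("name", "job"))

# Join the two pieces back together on the shared attribute, name.
rejoined = {
    (n, s, c, j)
    for (n, s, c) in residences
    for (m, j) in employment
    if n == m
}

print(len(PEOPLE), len(residences), len(employment))  # 10 4 5
print(rejoined == PEOPLE)                             # True
```

The ten original rows shrink to four residence rows plus five employment rows, and nothing is lost on the way back.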


4NF: another construal

In nontrivial As →→ Bs, As must be superkey

After the definition of 4NF, text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123).

We know: FDs are* MVDs but not vice versa

So: Why does this follow? Is it true?

Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD

Two kinds of MVDs: FDs and “true” MVDs

4NF eliminates exactly the true ones

* The typo swapping these was fixed.


Summary of normal forms

Guaranteed to               3NF     BCNF  4NF
Eliminate FD redundancy     Mostly  Yes   Yes
Eliminate MVD redundancy    No      No    Yes
Preserve FDs                Yes     No    No
Preserve MVDs               No      No    No


Combined isa/weak example

Exercise 3.3.1

Convert from E/R to R, by E/R, OO and nulls

[E/R diagram; labels: courses, Lab-courses, Depts, Computer-allocation, room, number, givenBy, name, chair, isa]


Next topic: relational algebra (5.1-2)

Set operations: union, intersection, difference

Projection, selection

Cartesian Product

Joins: natural joins, theta joins

Combining operations to form queries

Dependent and independent operations


What is relational algebra?

An algebra for relations

“High-school” algebra is an algebra for numbers

Formalism for constructing expressions

Operations

Operands: Variables, Constants, expressions

Expressions: Vars & constants; Operators applied to expressions

Algebra      Vars/consts                   Operators
High-school  Numbers                       + * - / etc.
Relational   Relations (= sets of tuples)  union, intersection, join, etc.


Why do we care about relational algebra?

Why construct expressions on relations?

The exprs are the form that questions about the data take

The relations these exprs cash out to are the answers to our questions

First proof of RDBMS/RA concept: System R (1979)

Modern implementation of RA: SQL


Relation operators

Five basic operators:

Union: ∪
Difference: –
Selection: σ
Projection: π
Cartesian Product: ×

Derived/auxiliary operators:

Intersection (∩), complement
Joins (natural, equijoin, theta join, semijoin)
Renaming: ρ


Operators

Relations are sets ⇒ have set-theoretic ops

Venn diagrams

Union: R1 ∪ R2

Example: ActiveEmployees ∪ RetiredEmployees

Difference: R1 – R2

Example: AllEmployees – RetiredEmployees = ActiveEmployees


Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R ∪ S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
Ford    345 Palm   M       7/7/77


Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R – S:
Name    Address    Gender  Birthdate
Hamill  456 Oak    M       8/8/88


Operators

Intersection: R1 ∩ R2

Example: UnionizedEmployees ∩ RetiredEmployees

Intersection can be derived from ∪ and –:

R1 ∩ R2 = R1 – (R1 – R2)

R1 ∩ R2 = –(–R1 ∪ –R2) (allowed?)


Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R ∩ S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
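Because relations are sets of tuples, the three set operations in these examples map directly onto Python's built-in set type. The sketch below reuses the R and S rows from the slides and also checks the identity R ∩ S = R – (R – S); representing each row as a plain tuple is a choice made for this illustration.

```python
R = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Hamill", "456 Oak", "M", "8/8/88"),
}
S = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Ford", "345 Palm", "M", "7/7/77"),
}

union = R | S                  # Fisher, Hamill, Ford
difference = R - S             # Hamill only
derived_intersection = R - (R - S)  # intersection built from difference alone

print(len(union))                    # 3
print(difference)                    # {('Hamill', '456 Oak', 'M', '8/8/88')}
print(derived_intersection == R & S) # True
```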


Operators

Selection

Selects all tuples satisfying a condition

Notation: σ_{c}(R)

Examples: σ_{salary > 100000}(Employee), σ_{name = “Smith”}(Employee)

The condition c can have

comparison ops: =, <, ≤, >, ≥, <>

boolean ops: and, or
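Selection amounts to filtering rows with a predicate. Here is a small sketch of the two example conditions; the `select` helper and the employee rows are hypothetical, made up only to have something to filter.

```python
def select(rows, cond):
    """sigma_cond(rows): keep the rows for which the condition is true."""
    return [r for r in rows if cond(r)]


# Hypothetical sample data, not taken from the slides.
employees = [
    {"name": "Smith", "salary": 120000},
    {"name": "Jones", "salary": 90000},
    {"name": "Smith", "salary": 80000},
]

high_paid = select(employees, lambda r: r["salary"] > 100000)
smiths = select(employees, lambda r: r["name"] == "Smith")
both = select(employees, lambda r: r["name"] == "Smith" and r["salary"] > 100000)

print(high_paid)    # [{'name': 'Smith', 'salary': 120000}]
print(len(smiths))  # 2
```

Selection never changes the schema: every output row has exactly the attributes of the input rows.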


Selection example

Select the movies at Angelica: σ_{Theater=“Angelica”}(Showings)

Showings:
Theater     N’hood   Title
Angelica    Village  City of God
Angelica    Village  Fog of War
Film Forum  Village  City of God

σ_{Theater=“Angelica”}(Showings):
Theater     N’hood   Title
Angelica    Village  City of God
Angelica    Village  Fog of War


Operators

Projection: op we used for decomposition

Eliminates columns, then removes duplicates

Notation: π_{A1,…,An}(R)
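A minimal sketch of projection, including the duplicate-removal step. The `project` helper and the attribute names are assumptions for illustration; the rows are the Showings data from the selection example.

```python
def project(rows, attrs):
    """pi_attrs(rows): keep only the named columns, then drop duplicate rows."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:  # relations are sets, so a repeated row is kept once
            out.append(t)
    return out


showings = [
    {"theater": "Angelica", "nhood": "Village", "title": "City of God"},
    {"theater": "Angelica", "nhood": "Village", "title": "Fog of War"},
    {"theater": "Film Forum", "nhood": "Village", "title": "City of God"},
]

print(project(showings, ["theater"]))
# [{'theater': 'Angelica'}, {'theater': 'Film Forum'}]
print(project(showings, ["nhood"]))
# [{'nhood': 'Village'}]
```

Three input rows collapse to two theaters and a single neighborhood once the other columns are gone.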


Operators

Cartesian Product

Cross product

Each tuple in R1 combines w/each tuple in R2

Notation: R1 × R2

If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A

Fairly rare in practice; used to express joins

Q: Where does the name come from?

Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?


Cartesian product example

Hillary-addresses:
Street           City
333 Some Street  Chappaqua
444 Embassy Row  Washington

Hillary-jobs:
Job
Senator
First Lady
Lawyer

Hillary-addresses × Hillary-jobs:
Street           City        Job
333 Some Street  Chappaqua   Senator
444 Embassy Row  Washington  Senator
333 Some Street  Chappaqua   First Lady
444 Embassy Row  Washington  First Lady
333 Some Street  Chappaqua   Lawyer
444 Embassy Row  Washington  Lawyer
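The same product can be reproduced with `itertools.product` from the Python standard library. This is only a sketch; the variable names are chosen for this example.

```python
from itertools import product

addresses = [
    ("333 Some Street", "Chappaqua"),
    ("444 Embassy Row", "Washington"),
]
jobs = [("Senator",), ("First Lady",), ("Lawyer",)]

# Every address tuple is concatenated with every job tuple.
cross = [a + j for a, j in product(addresses, jobs)]

print(len(cross))  # 6, i.e. 2 rows x 3 rows
print(cross[0])    # ('333 Some Street', 'Chappaqua', 'Senator')
```

The row count of the result is the product of the two input row counts.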


Operators

Natural join: our join up to now

But always merging shared attributes

Notation: R1 ⋈ R2

Meaning: R1 ⋈ R2 = π_{every att once}(σ_{shared atts =}(R1 × R2))

I.e., first compute the cross product R1 × R2

Next, select the rows in which shared fields agree

Finally, project onto the union of R1 and R2’s fields (remove duplicates)


Natural join example

Addresses:
Name    Street           City
Hilary  333 Some Street  Chappaqua
Hilary  444 Embassy Row  Washington

Jobs:
Name    Job
Hilary  Senator
Hilary  First Lady
Hilary  Lawyer

Addresses ⋈ Jobs:

Name Street City Job

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer

Hilary 444 Embassy Row Washington Lawyer
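The three steps of the natural join (cross product, select rows that agree on shared fields, keep each attribute once) can be sketched as follows. The `natural_join` function and the dict-per-row representation are assumptions for this illustration, with the shared attributes found by comparing keys.

```python
def natural_join(r, s):
    """Join two lists of dict rows on all attribute names they share."""
    out = []
    for a in r:
        for b in s:                        # cross product
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):  # shared fields agree
                merged = {**a, **b}        # each attribute appears once
                if merged not in out:      # remove duplicates
                    out.append(merged)
    return out


addresses = [
    {"name": "Hilary", "street": "333 Some Street", "city": "Chappaqua"},
    {"name": "Hilary", "street": "444 Embassy Row", "city": "Washington"},
]
jobs = [
    {"name": "Hilary", "job": "Senator"},
    {"name": "Hilary", "job": "First Lady"},
    {"name": "Hilary", "job": "Lawyer"},
]

joined = natural_join(addresses, jobs)
print(len(joined))        # 6
print(sorted(joined[0]))  # ['city', 'job', 'name', 'street']
```

A row whose name matched nothing on the other side would simply not appear in the output.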


Natural Join

R:
A  B
X  Y
X  Z
Y  Z
Z  V

S:
B  C
Z  U
V  W
Z  V

R ⋈ S = ?

Unpaired tuples called dangling


Natural Join

Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?

Given R(A, B, C), S(D, E), what is R ⋈ S?

Given R(A, B), S(A, B), what is R ⋈ S?


Theta Join

Like natural join, but

includes only rows that satisfy an arbitrary condition θ

Does not project away shared attributes

R1 ⋈_{θ} R2 = σ_{θ}(R1 × R2)

Here θ can be any condition

If the condition is always satisfied, then the theta join becomes the Cartesian product


Theta-join example

U:
A  B  C
1  2  3
6  7  8
9  7  8

V:
B  C  D
2  3  4
2  3  5
7  8  10

U ⋈_{A<D} V:
A  U.B  U.C  V.B  V.C  D
1  2    3    2    3    4
1  2    3    2    3    5
1  2    3    7    8    10
6  7    8    7    8    10
9  7    8    7    8    10
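The theta join is a selection over the cross product, which the sketch below spells out for the U and V tables with the condition A < D. The `theta_join` helper is a name invented for this example; tuple positions stand in for attribute names.

```python
U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]    # columns A, B, C
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]   # columns B, C, D


def theta_join(r, s, theta):
    """Concatenate every pair of rows that satisfies theta; no columns are merged."""
    return [a + b for a in r for b in s if theta(a, b)]


# Condition A < D: A is position 0 of a U row, D is position 2 of a V row.
result = theta_join(U, V, lambda u, v: u[0] < v[2])

for row in result:
    print(row)
print(len(result))  # 5
```

With a condition that is always true, the same function returns all nine pairs, which is the Cartesian product.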


Equijoin

A theta join where θ is an equality

R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)

σ = lower-case sigma

Example: Employee ⋈_{SSN=SSN} Dependents

Most useful join in practice


Semijoin

R ⋉ S = π_{atts of R}(R ⋈ S)

Q: What does this mean?

Natural join of R and S; then project onto R’s atts

A: The rows of R for which at least one row in S agrees on the shared atts
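A short sketch of the semijoin as "keep the rows of R that have a match in S". The `semijoin` helper and the employee and dependent rows are hypothetical, invented for illustration only.

```python
def semijoin(r, s):
    """Rows of r that agree with at least one row of s on all shared attributes."""
    out = []
    for a in r:
        if any(all(a[k] == b[k] for k in a.keys() & b.keys()) for b in s):
            out.append(a)  # a is kept once, however many rows of s match it
    return out


# Hypothetical sample data.
employees = [
    {"ssn": 1, "name": "Ann"},
    {"ssn": 2, "name": "Bob"},
    {"ssn": 3, "name": "Cy"},
]
dependents = [
    {"dssn": 10, "dname": "Dee", "ssn": 1},
    {"dssn": 11, "dname": "Eli", "ssn": 1},
    {"dssn": 12, "dname": "Fay", "ssn": 3},
]

print(semijoin(employees, dependents))
# [{'ssn': 1, 'name': 'Ann'}, {'ssn': 3, 'name': 'Cy'}]
```

Ann has two dependents but appears once, and the output carries only the attributes of the left-hand relation.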


Semijoin example

Employee:
SSN    Name
. . .  . . .

Dependents:
DSSN   Dname  SSN
. . .  . . .

[Diagram: Employee and Dependents connected over a network]

Employee ⋉ Dependents = {employees who have dependents}


Renaming

Changes the schema, not the instance

Notation: ρ_{B1,…,Bn}(R)

ρ is spelled “rho”, pronounced “row”

Example: Employee(ssn,name)

ρ_{…(social, name)}(Employee)

Or just: ρ_{…}(Employee)


Complex RA Expressions

Q: How long was Star Wars (1977)?

Strategy: find the row with Star Wars; then project the length field

Title Year Length inColor Studio Prdcr#

Star Wars 1977 124 True Fox 12345

M.Ducks 1991 104 True Disney 67890

W.World 1992 95 True Paramount 99999


Combining operations

Schema: Movies(Title, year, length, filmType, studioName)

Query: select titles and years of movies by Fox that are at least 100 minutes long.

Title Year Length Filmtype Studio

Star wars 1977 124 Color Fox

Mighty ducks 1991 104 Color Disney

Wayne’s world 1992 85 Color Paramount
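One way to read this query is as a selection followed by a projection, roughly π_{title,year}(σ_{studioName=Fox and length≥100}(Movies)). The sketch below applies those two steps to the three rows shown; the tuple layout follows the schema on the slide, and the variable names are choices made for this example.

```python
# Rows follow the schema Movies(title, year, length, filmType, studioName).
movies = [
    ("Star wars", 1977, 124, "Color", "Fox"),
    ("Mighty ducks", 1991, 104, "Color", "Disney"),
    ("Wayne's world", 1992, 85, "Color", "Paramount"),
]

# Selection: Fox movies that run at least 100 minutes.
selected = [m for m in movies if m[4] == "Fox" and m[2] >= 100]

# Projection onto (title, year); a set removes any duplicates.
answer = {(title, year) for (title, year, _length, _film_type, _studio) in selected}

print(answer)  # {('Star wars', 1977)}
```

Splitting the condition into two nested selections, or intersecting two separate selections, would give the same answer.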


Complex RA Expressions

Reps(ssn, name, etc.)

Clients(ssn, name, rssn)

Q: Find George’s client names

π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))

Or: π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))

Or: π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
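The three formulations (nested selections, one selection with a combined condition, and an intersection of two selections) should return the same names. The sketch below checks this on a tiny instance; the rows and helper names are hypothetical, and Reps is cut down to (ssn, name) for brevity.

```python
from itertools import product

# Hypothetical sample data.
reps = [(1, "George"), (2, "Martha")]                    # (ssn, name)
clients = [(10, "Ann", 1), (11, "Bob", 2), (12, "Cy", 1)]  # (ssn, name, rssn)

pairs = list(product(reps, clients))  # Reps x Clients


def is_george(p):
    return p[0][1] == "George"   # Reps.name = George


def rep_matches(p):
    return p[0][0] == p[1][2]    # Reps.ssn = rssn


def client_names(ps):
    return {p[1][1] for p in ps}  # project onto Clients.name


nested = client_names(filter(is_george, filter(rep_matches, pairs)))
combined = client_names(p for p in pairs if is_george(p) and rep_matches(p))
intersected = client_names(set(filter(is_george, pairs)) & set(filter(rep_matches, pairs)))

print(sorted(nested))                   # ['Ann', 'Cy']
print(nested == combined == intersected)  # True
```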


For next time

Finish chapter 5

Come to office hours!