M.P. Johnson, DBMS, Stern/NYU, Sp2004
C20.0046: Database Management Systems, Lecture #8
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004

Agenda
Last time: Normalization
This time:
1. 4NF
2. Relational Algebra
Pep talk
OHs today, drop-ins (80809)

Normalization Review
Q: What’s required for BCNF?
Q: What are the two types of violations?
Q: What’s the loophole for 3NF?
Q: How do we fix a non-BCNF relation?

Normalization Review
Q: If As → Bs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: How do we combine two relations?
Q: Can BCNF decomp. lose FDs?
Q: Can 3NF decomp. lose FDs?

New topic: MVDs (3.7)
Consider this relation:
People ~ their jobs ~ their residences
Person-address/city: many-many
Person-job: many-many
Address/city-job: independent

Name     SSN  Streets               Citys       Jobs
Michael  123  111 East 60th Street  New York    Mayor
Michael  123  222 Brompton Road     London      Mayor
Michael  123  111 East 60th Street  New York    CEO
Michael  123  222 Brompton Road     London      CEO
Hilary   456  333 Some Street       Chappaqua   Senator
Hilary   789  444 Embassy Row       Washington  Senator
Hilary   456  333 Some Street       Chappaqua   First Lady
Hilary   456  444 Embassy Row       Washington  First Lady
Hilary   789  333 Some Street       Chappaqua   Lawyer
Hilary   456  444 Embassy Row       Washington  Lawyer

Redundancy in BCNF
Lots of redundancy!
Key? All fields: none determined by others!
Non-trivial FDs? None!
In BCNF? Yes!

Name Streets Citys Jobs
Michael 111 East 60th Street New York Mayor
Michael 222 Brompton Road London Mayor
Michael 111 East 60th Street New York CEO
Michael 222 Brompton Road London CEO
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Senator
Hilary 333 Some Street Chappaqua First Lady
Hilary 444 Embassy Row Washington First Lady
Hilary 333 Some Street Chappaqua Lawyer
Hilary 444 Embassy Row Washington Lawyer
Now what? New concept, leading to another normal form: multivalued dependencies (MVDs)

MVD definition
As ↠ Bs if, when the As are held fixed, the values in Bs are independent of the values in the rest (Cs)
More precisely: if t1 and t3 agree on As, then we can find t2 such that
t1, t2, t3 agree on As
t2, t1 agree on Bs
t2, t3 agree on Cs

    As  Bs  Cs
t1  a   b1  c1
t2  a   b1  c3
t3  a   b3  c3

MVD example
Claim: name ↠ streets, cities
If true: can pick arbitrary t1, t3 and find a t2
We pick the first and last of Hilary’s tuples
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3

    Name    Streets          Citys       Jobs
t1  Hilary  333 Some Street  Chappaqua   Senator
t3  Hilary  444 Embassy Row  Washington  Lawyer
t2  Hilary  333 Some Street  Chappaqua   Lawyer

MVD example
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3
Sure enough:

Name     Streets               Citys       Jobs
Michael  111 East 60th Street  New York    Mayor
Michael  222 Brompton Road     London      Mayor
Michael  111 East 60th Street  New York    CEO
Michael  222 Brompton Road     London      CEO
Hilary   333 Some Street       Chappaqua   Senator
Hilary   444 Embassy Row       Washington  Senator
Hilary   333 Some Street       Chappaqua   First Lady
Hilary   444 Embassy Row       Washington  First Lady
Hilary   333 Some Street       Chappaqua   Lawyer      ← t2
Hilary   444 Embassy Row       Washington  Lawyer

MVD rules
No splitting rule:
In the example, name ↠ streets, cities
Do we have name ↠ streets?
No: 444 Embassy Row doesn’t go with Chappaqua
NB: City doesn’t determine street (could have >1 house), but city and street aren’t independent

    Name    Streets          Citys       Jobs
t1  Hilary  333 Some Street  Chappaqua   Senator
t3  Hilary  444 Embassy Row  Washington  Lawyer
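
The MVD definition can be sketched as a brute-force check. This is only an illustration, assuming the People relation is stored as a Python set of tuples; ATTRS, PEOPLE, values_on, and mvd_holds are hypothetical names, not from the lecture or the text.

```python
from itertools import product

# Assumed encoding: one tuple per row, in the column order given by ATTRS.
ATTRS = ("name", "street", "city", "job")
PEOPLE = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

def values_on(t, attrs):
    """Values of tuple t on the named attributes."""
    return tuple(t[ATTRS.index(a)] for a in attrs)

def mvd_holds(rel, lhs, rhs):
    """True if lhs ->> rhs holds in rel, checked over every pair (t1, t3)."""
    rest = [a for a in ATTRS if a not in lhs and a not in rhs]
    for t1, t3 in product(rel, repeat=2):
        if values_on(t1, lhs) != values_on(t3, lhs):
            continue  # the definition only constrains pairs that agree on As
        # Look for t2: agrees with t1 on As and Bs, and with t3 on the rest.
        if not any(
            values_on(t2, lhs) == values_on(t1, lhs)
            and values_on(t2, rhs) == values_on(t1, rhs)
            and values_on(t2, rest) == values_on(t3, rest)
            for t2 in rel
        ):
            return False
    return True

holds_street_city = mvd_holds(PEOPLE, ["name"], ["street", "city"])
holds_street_only = mvd_holds(PEOPLE, ["name"], ["street"])
print(holds_street_city, holds_street_only)
```

On this data the first check succeeds and the second fails, which matches the no-splitting example: there is no Hilary row pairing 333 Some Street with Washington.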

MVD rules
Trivial dependencies: As ↠ Bs iff As ↠ Bs ∪ {Ai} (attributes of As may be added to or removed from the right side)
Transitive rule: As ↠ Bs, Bs ↠ Cs ⇒ As ↠ Cs
Complementation rule: As ↠ Bs ⇒ As ↠ rest
Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs

Name Streets Citys Jobs
Michael 111 East 60th Street New York Mayor
Michael 222 Brompton Road London Mayor
Michael 111 East 60th Street New York CEO
Michael 222 Brompton Road London CEO

MVDs and FDs
MVD is a generalization of FD
Every FD is an MVD
Pf: Suppose As → Bs
Pick t1, t3 that agree on As.
Must find a t2. Let t2 be t3.
Then
1) t2 agrees on As with both
2) t2 agrees on Bs with t1 (why?)
3) t2 agrees on rest with t3 (why?)
QED

Fourth Normal Form
4NF: like BCNF, but with MVDs, not FDs
An MVD As ↠ Bs is nontrivial if
No Bs are As
Some attributes are left over (why?)
4NF: for every nontrivial MVD As ↠ Bs, As is a superkey
In the example, name ↠ streets, cities, but name isn’t a superkey

Name Streets Citys Jobs
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Lawyer

Decomposition to 4NF
Again, analogous to BCNF
If we can find As ↠ Bs for R where As isn’t a superkey, replace R with R1(As, Bs) and R2(As, rest)
Running example: name ↠ streets, cities
People(name, streets, cities, jobs) becomes Residences(name, street, city) and Employment(name, job)
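
A small sketch of the running example, assuming People is held as a Python set of (name, street, city, job) tuples. The variable names are hypothetical. It projects People onto the two new schemas and checks that joining them on name gives back the original rows.

```python
# Assumed encoding of People: (name, street, city, job) tuples.
PEOPLE = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

# R1(As, Bs): project onto name, street, city (set semantics removes duplicates).
residences = {(n, s, c) for (n, s, c, j) in PEOPLE}
# R2(As, rest): project onto name, job.
employment = {(n, j) for (n, s, c, j) in PEOPLE}

# Rejoin on the shared attribute (name) to check nothing was lost or invented.
rejoined = {
    (n, s, c, j)
    for (n, s, c) in residences
    for (n2, j) in employment
    if n == n2
}

print(len(PEOPLE), len(residences), len(employment), rejoined == PEOPLE)
```

The ten original rows shrink to four residence rows plus five employment rows, and the rejoin reproduces People exactly.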

4NF: another construal
In nontrivial As ↠ Bs, As must be a superkey
After the definition of 4NF, the text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123).
We know: FDs are* MVDs, but not vice versa
So: Why does this follow? Is it true?
Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD
Two kinds of MVDs: FDs and “true” MVDs
4NF eliminates exactly the true ones
* The typo swapping these was fixed.

Summary of normal forms

Guaranteed to             3NF     BCNF  4NF
Eliminate FD redundancy   Mostly  Yes   Yes
Eliminate MVD redundancy  No      No    Yes
Preserve FDs              Yes     No    No
Preserve MVDs             No      No    No

Combined isa/weak example
Exercise 3.3.1
Convert from E/R to relations, by the E/R, OO, and nulls approaches
[E/R diagram; labels: Courses, Lab-courses, Depts, computer-allocation, room, number, givenBy, name, chair, isa]

Next topic: relational algebra (5.1-2)
Set operations: union, intersection, difference
Projection, selection
Cartesian product
Joins: natural joins, theta joins
Combining operations to form queries
Dependent and independent operations

What is relational algebra?
An algebra for relations
“High-school” algebra is an algebra for numbers
Formalism for constructing expressions
Operations
Operands: variables, constants, expressions
Expressions:
Vars & constants
Operators applied to expressions

Algebra      Vars/consts                  Operators
High-school  Numbers                      + * - / etc.
Relational   Relations (=sets of tuples)  union, intersection, join, etc.

Why do we care about relational algebra?
Why construct expressions on relations?
The exprs are the form that questions about the data take
The relations these exprs cash out to are the answers to our questions
First proof of RDBMS/RA concept: System R (1979)
Modern implementation of RA: SQL

Relation operators
Five basic operators:
Union: ∪
Difference: -
Selection: σ
Projection: π
Cartesian Product: ×
Derived/auxiliary operators:
Intersection: ∩, complement
Joins (natural, equijoin, theta join, semijoin): ⋈
Renaming: ρ

Operators
Relations are sets ⇒ have set-theoretic ops
Venn diagrams
Union: R1 ∪ R2
Example: ActiveEmployees ∪ RetiredEmployees
Difference: R1 – R2
Example: AllEmployees – RetiredEmployees = ActiveEmployees

Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R ∪ S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88
Ford    345 Palm   M       7/7/77

Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R - S:
Name    Address    Gender  Birthdate
Hamill  456 Oak    M       8/8/88

Operators
Intersection: R1 ∩ R2
Example: UnionizedEmployees ∩ RetiredEmployees
Intersection can be derived from ∪ and –:
R1 ∩ R2 = R1 – (R1 – R2)
R1 ∩ R2 = -(-R1 ∪ -R2) (allowed?)

Set operations - example

R:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Hamill  456 Oak    M       8/8/88

S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
Ford    345 Palm   M       7/7/77

R ∩ S:
Name    Address    Gender  Birthdate
Fisher  123 Maple  F       9/9/99
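
The three set operations can be tried directly on the example, assuming each relation is modeled as a Python set of tuples. This is an encoding chosen here for illustration. The last line checks the derivation of intersection from difference alone.

```python
# R and S from the example, one tuple per row: (name, address, gender, birthdate).
R = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Hamill", "456 Oak", "M", "8/8/88"),
}
S = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Ford", "345 Palm", "M", "7/7/77"),
}

union = R | S          # R ∪ S: rows in either relation, duplicates merged
difference = R - S     # R - S: rows of R that are not in S
intersection = R & S   # R ∩ S: rows in both

# Intersection derived from difference only: R ∩ S = R - (R - S)
derived = R - (R - S)

print(len(union), difference, intersection == derived)
```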

Operators
Selection
Selects all tuples satisfying a condition
Notation: σ_c(R)
Examples:
σ_{salary > 100000}(Employee)
σ_{name = “Smith”}(Employee)
The condition c can have
comparison ops: =, <, ≤, >, ≥, <>
boolean ops: and, or

Selection example
Select the movies at Angelica: σ_{Theater=“Angelica”}(Showings)

Showings:
Theater     Title        N’hood
Angelica    City of God  Village
Angelica    Fog of War   Village
Film Forum  City of God  Village

σ_{Theater=“Angelica”}(Showings):
Theater   Title        N’hood
Angelica  City of God  Village
Angelica  Fog of War   Village
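
A minimal sketch of selection on the Showings example, assuming rows are named tuples in a Python set. Showing, SHOWINGS, and select are hypothetical names. The condition is passed as an ordinary Python predicate.

```python
from collections import namedtuple

# Assumed row type for the Showings relation.
Showing = namedtuple("Showing", ["theater", "title", "nhood"])

SHOWINGS = {
    Showing("Angelica", "City of God", "Village"),
    Showing("Angelica", "Fog of War", "Village"),
    Showing("Film Forum", "City of God", "Village"),
}

def select(rel, cond):
    """σ_cond(rel): keep exactly the tuples that satisfy the condition."""
    return {t for t in rel if cond(t)}

at_angelica = select(SHOWINGS, lambda t: t.theater == "Angelica")
print(sorted(t.title for t in at_angelica))
```

Selection never changes the schema: the result has the same three columns as Showings, just fewer rows.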

Operators
Projection: the op we used for decomposition
Eliminates columns, then removes duplicates
Notation: π_{A1,…,An}(R)

Operators
Cartesian Product
Cross product
Each tuple in R1 combines w/each tuple in R2
Notation: R1 × R2
If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
Fairly rare in practice; mainly used to express joins
Q: Where does the name come from?
Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?

Cartesian product example

Hillary-addresses:
Street           City
333 Some Street  Chappaqua
444 Embassy Row  Washington

Hillary-jobs:
Job
Senator
First Lady
Lawyer

Hillary-addresses × Hillary-jobs:
Street           City        Job
333 Some Street  Chappaqua   Senator
444 Embassy Row  Washington  Senator
333 Some Street  Chappaqua   First Lady
444 Embassy Row  Washington  First Lady
333 Some Street  Chappaqua   Lawyer
444 Embassy Row  Washington  Lawyer
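
The example can be reproduced with a short sketch, assuming both relations are Python sets of tuples (variable names are chosen here for illustration). Each address tuple is concatenated with each job tuple.

```python
from itertools import product

addresses = {
    ("333 Some Street", "Chappaqua"),
    ("444 Embassy Row", "Washington"),
}
jobs = {("Senator",), ("First Lady",), ("Lawyer",)}

# R1 × R2: every tuple of R1 paired with every tuple of R2.
cross = {a + j for a, j in product(addresses, jobs)}

print(len(addresses), len(jobs), len(cross))
```

With n1 = 2 and n2 = 3 the product has n1 * n2 = 6 rows, each with the columns of both inputs.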

Operators
Natural join: our join up to now
But always merging shared attributes
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = π_{every att once}(σ_{shared atts =}(R1 × R2))
I.e., first compute the cross product R1 × R2
Next, select the rows in which shared fields agree
Finally, project onto the union of R1 and R2’s fields (remove duplicates)

Natural join example

Addresses:
Name    Street           City
Hilary  333 Some Street  Chappaqua
Hilary  444 Embassy Row  Washington

Jobs:
Name    Job
Hilary  Senator
Hilary  First Lady
Hilary  Lawyer

Addresses ⋈ Jobs:
Name    Street           City        Job
Hilary  333 Some Street  Chappaqua   Senator
Hilary  444 Embassy Row  Washington  Senator
Hilary  333 Some Street  Chappaqua   First Lady
Hilary  444 Embassy Row  Washington  First Lady
Hilary  333 Some Street  Chappaqua   Lawyer
Hilary  444 Embassy Row  Washington  Lawyer
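
The three steps of the natural join (cross product, select on shared attributes, project each attribute once) can be sketched as below. The representation is an assumption made here: a relation is a tuple of attribute names plus a set of rows, and natural_join is a hypothetical helper.

```python
def natural_join(attrs1, rel1, attrs2, rel2):
    """Natural join of two relations given as (attribute names, set of tuples)."""
    shared = [a for a in attrs1 if a in attrs2]
    out_attrs = list(attrs1) + [a for a in attrs2 if a not in shared]
    result = set()
    for t1 in rel1:
        for t2 in rel2:  # step 1: every pair from the cross product
            # step 2: keep the pair only if all shared attributes agree
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in shared):
                # step 3: keep each shared attribute once (drop it from t2)
                extra = tuple(v for a, v in zip(attrs2, t2) if a not in shared)
                result.add(t1 + extra)
    return out_attrs, result

ADDR_ATTRS = ("name", "street", "city")
ADDRESSES = {
    ("Hilary", "333 Some Street", "Chappaqua"),
    ("Hilary", "444 Embassy Row", "Washington"),
}
JOB_ATTRS = ("name", "job")
JOBS = {("Hilary", "Senator"), ("Hilary", "First Lady"), ("Hilary", "Lawyer")}

out_attrs, joined = natural_join(ADDR_ATTRS, ADDRESSES, JOB_ATTRS, JOBS)
print(out_attrs, len(joined))
```

Because every row here shares the same name, each address pairs with each job, giving the six rows of the example.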

Natural Join

R:
A  B
X  Y
X  Z
Y  Z
Z  V

S:
B  C
Z  U
V  W
Z  V

R ⋈ S = ?
Unpaired tuples are called dangling

Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?

Theta Join
Like natural join, but
includes only rows that satisfy an arbitrary condition
does not project away shared attributes
R1 ⋈_θ R2 = σ_θ(R1 × R2)
Here θ can be any condition
If the condition is always satisfied, then the theta join becomes the Cartesian product

Theta-join example

U:
A  B  C
1  2  3
6  7  8
9  7  8

V:
B  C  D
2  3  4
2  3  5
7  8  10

U ⋈_{A<D} V:
A  U.B  U.C  V.B  V.C  D
1  2    3    2    3    4
1  2    3    2    3    5
1  2    3    7    8    10
6  7    8    7    8    10
9  7    8    7    8    10
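
The example can be checked with a short sketch, assuming U and V are Python sets of tuples and the condition is a two-argument predicate. theta_join is a hypothetical helper written for this illustration.

```python
# U(A, B, C) and V(B, C, D) from the example.
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}

def theta_join(r1, r2, theta):
    """Select from the cross product the pairs satisfying theta; keep all columns."""
    return {t1 + t2 for t1 in r1 for t2 in r2 if theta(t1, t2)}

# Condition A < D: A is column 0 of U, D is column 2 of V.
a_less_than_d = theta_join(U, V, lambda u, v: u[0] < v[2])

# A condition that is always true keeps every pair of the cross product.
everything = theta_join(U, V, lambda u, v: True)

print(len(a_less_than_d), len(everything))
```

Unlike the natural join, both copies of B and C survive in the result, so each output row has six columns.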

Equijoin
A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
σ = lower-case sigma
Example: Employee ⋈_{SSN=SSN} Dependents
Most useful join in practice

Semijoin
R ⋉ S = π_{atts of R}(R ⋈ S)
Q: What does this mean?
Natural join of R and S; then project onto R’s atts
A: The rows of R for which ≥1 row in S agrees on shared atts

Semijoin example

Employee:
SSN    Name
. . .  . . .

Dependents:
DSSN   Dname  SSN
. . .  . . .  . . .

Employee ⋉ Dependents = {employees who have dependents}
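
A sketch of the semijoin on made-up data: the slide elides the actual rows, so the employees and dependents below are invented purely for illustration, as is the helper name semijoin_on_ssn.

```python
# Hypothetical rows: Employee(ssn, name), Dependents(dssn, dname, ssn).
employees = {(111, "Ann"), (222, "Bob"), (333, "Cy")}
dependents = {(901, "Dan", 111), (902, "Eve", 111), (903, "Flo", 333)}

def semijoin_on_ssn(emps, deps):
    """Employee ⋉ Dependents: employees matched by at least one dependent row."""
    ssns_with_dependents = {d[2] for d in deps}
    return {e for e in emps if e[0] in ssns_with_dependents}

with_dependents = semijoin_on_ssn(employees, dependents)
print(sorted(with_dependents))
```

Ann has two dependents but appears once: the result keeps only Employee’s columns, and as a set it holds no duplicates.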

Renaming
Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
ρ is spelled “rho”, pronounced “row”
Example: Employee(ssn, name)
ρ_{social, name}(Employee)
Or just: ρ_{social}(Employee)

Complex RA Expressions
Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field

Title Year Length inColor Studio Prdcr#
Star Wars 1977 124 True Fox 12345
M.Ducks 1991 104 True Disney 67890
W.World 1992 95 True Paramount 99999

Combining operations
Schema: Movies(Title, year, length, filmType, studioName)
Query: select titles and years of movies by Fox that are at least 100 minutes long.

Title Year Length Filmtype Studio
Star wars 1977 124 Color Fox
Mighty ducks 1991 104 Color Disney
Wayne’s world 1992 85 Color Paramount
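
One way to express the query is a selection followed by a projection, π_{title, year}(σ_{studio = Fox and length ≥ 100}(Movies)). The sketch below assumes rows are named tuples in a Python set. Movie, MOVIES, select, and project are hypothetical names.

```python
from collections import namedtuple

Movie = namedtuple("Movie", ["title", "year", "length", "film_type", "studio"])

MOVIES = {
    Movie("Star wars", 1977, 124, "Color", "Fox"),
    Movie("Mighty ducks", 1991, 104, "Color", "Disney"),
    Movie("Wayne's world", 1992, 85, "Color", "Paramount"),
}

def select(rel, cond):
    """σ: keep the tuples that satisfy the condition."""
    return {t for t in rel if cond(t)}

def project(rel, attrs):
    """π: keep only the named columns; the set removes duplicate rows."""
    return {tuple(getattr(t, a) for a in attrs) for t in rel}

fox_long = select(MOVIES, lambda m: m.studio == "Fox" and m.length >= 100)
answer = project(fox_long, ["title", "year"])
print(answer)
```

The selection must come first here: projecting onto title and year first would discard the studio and length columns that the condition needs.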

Complex RA Expressions
Reps(ssn, name, etc.)
Clients(ssn, name, rssn)
Q: Find George’s client names
π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))
Or: π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))
Or: π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))

For next time
Finish chapter 5
Come to office hours!