The Relational Model
Lecture 18
Today's Lecture
1. The Relational Model & Relational Algebra
2. Relational Algebra Pt. II
1. The Relational Model & Relational Algebra
What you will learn about in this section
1. The Relational Model
2. Relational Algebra: Basic Operators
3. Execution
4. ACTIVITY: From SQL to RA & Back
Motivation
The relational model is precise, implementable, and we can operate on it (query, update, etc.).
Internally, the database maps queries into this procedural language (relational algebra).
A Little History
• The relational model is due to Edgar "Ted" Codd, a mathematician at IBM, in 1970
• "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387
• IBM didn't want to use the relational model (it would take money from their Information Management System)
• Codd won the Turing Award in 1981
The Relational Model: Schemata
• Relational schema:
Students(sid: string, name: string, gpa: float)
• Students is the relation name
• sid, name, gpa are the attributes
• string, float, int, etc. are the domains of the attributes
The Relational Model: Data

Student
sid   name    gpa
001   Bob     3.2
002   Joe     2.8
003   Mary    3.8
004   Alice   3.5

An attribute (or column) is a typed data entry present in each tuple in the relation.
The number of attributes is the arity of the relation.
A tuple or row (or record) is a single entry in the table having the attributes specified by the schema.
The number of tuples is the cardinality of the relation.
A relational instance is a set of tuples all conforming to the same schema.
Recall: in practice, DBMSs relax the set requirement and use multisets.
To Reiterate
• A relational schema describes the data that is contained in a relational instance
Let R(f1: Dom1, …, fm: Domm) be a relational schema; then an instance of R is a subset of Dom1 × Dom2 × … × Domm.
In this way, a relational schema R is a total function from attribute names to types.
One More Time
• A relational schema describes the data that is contained in a relational instance
A relation R of arity t is a function R: Dom1 × … × Domt → {0, 1}
I.e., it returns whether or not a tuple of matching types is a member of it.
Then, the schema is simply the signature of the function.
Note here that order matters, attribute name doesn't… We'll (mostly) work with the other model (last slide), in which attribute name matters, order doesn't!
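The two views of a relation described above can be compared side by side in a short sketch. The tiny domains, the data, and the names `R`, `recovered`, and `named` are assumptions made for illustration only.

```python
from itertools import product

# Tiny finite domains, chosen only so the cross product can be enumerated.
dom_sid = ["001", "002"]
dom_name = ["Bob", "Joe"]

instance = {("001", "Bob"), ("002", "Joe")}


def R(sid, name):
    """Characteristic function: 1 if the tuple is in the relation, else 0."""
    return 1 if (sid, name) in instance else 0


# The instance is exactly the subset of Dom1 x Dom2 on which R returns 1.
recovered = {t for t in product(dom_sid, dom_name) if R(*t) == 1}

# Named perspective: each tuple maps attribute names to values,
# so the order in which attributes are written is irrelevant.
named = {frozenset({"sid": s, "name": n}.items()) for s, n in instance}
```

In the function view the argument order is fixed by the signature; in the named view a tuple written as `{"name": "Bob", "sid": "001"}` denotes the same tuple as `{"sid": "001", "name": "Bob"}`.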
A relational database
• A relational database schema is a set of relational schemata, one for each relation
• A relational database instance is a set of relational instances, one for each relation
Two conventions:
1. We call relational database instances simply databases
2. We assume all instances are valid, i.e., satisfy the domain constraints
Remember the CMS
• Relational DB schema
• Students(sid: string, name: string, gpa: float)
• Courses(cid: string, cname: string, credits: int)
• Enrolled(sid: string, cid: string, grade: string)
Relation instances

Students
sid   name   gpa
101   Bob    3.2
123   Mary   3.8

Courses
cid   cname   credits
564   564-2   4
308   417     2

Enrolled
sid   cid   grade
123   564   A

Note that the schemas impose effective domain/type constraints, i.e., gpa can't be "Apple".
2nd Part of the Model: Querying
"Find names of all students with GPA > 3.5"
SELECT S.name
FROM Students S
WHERE S.gpa > 3.5;
We don't tell the system how or where to get the data, just what we want, i.e., querying is declarative.
To make this happen, we need to translate the declarative query into a series of operators… we'll see this next!
Actually, I showed how to do this translation for a much richer language!
Virtues of the model
• Physical independence (logical too), declarative
• Simple, elegant, clean: everything is a relation
• Why did it take multiple years?
• Doubted it could be done efficiently.
Relational Algebra
RDBMS Architecture
How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
• SQL Query: declarative query (from user)
• Relational Algebra (RA) Plan: translate to a relational algebra expression
• Optimized RA Plan: find a logically equivalent, but more efficient, RA expression
• Execution: execute each operator of the optimized plan!
Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!
Relational Algebra (RA)
• Five basic operators:
1. Selection: σ
2. Projection: Π
3. Cartesian Product: ×
4. Union: ∪
5. Difference: -
• Derived or auxiliary operators:
• Intersection, complement
• Joins (natural, equi-join, theta join, semi-join)
• Renaming: ρ
• Division
We'll look at the basic operators first, and also at one example of a derived operator (natural join) and a special operator (renaming).
Keep in mind: RA operates on sets!
• RDBMSs use multisets; however, in the relational algebra formalism we will consider sets!
• Also: we will consider the named perspective, where every attribute must have a unique name
• → attribute order does not matter…
Now on to the basic RA operators…
1. Selection (σ)
• Returns all tuples which satisfy a condition
• Notation: σ_c(R)
• Examples
• σ_{Salary > 40000}(Employee)
• σ_{name = "Smith"}(Employee)
• The condition c can use =, <, ≤, >, ≥, <>
Students(sid, sname, gpa)
SQL:
SELECT *
FROM Students
WHERE gpa > 3.5;
RA: σ_{gpa > 3.5}(Students)
Another example:
σ_{Salary > 400000}(Employee)

Employee
SSN       Name    Salary
1234545   John    200000
5423341   Smith   600000
4352342   Fred    500000

Result
SSN       Name    Salary
5423341   Smith   600000
4352342   Fred    500000
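Selection can be sketched as a filter over a list of rows. The function name `select` and the dict-per-row encoding are assumptions for illustration. The threshold 400000 is chosen here because it reproduces the two-row result (Smith, Fred) for the salaries in the Employee table.

```python
def select(relation, predicate):
    """sigma_c(R): keep exactly the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


employee = [
    {"SSN": "1234545", "Name": "John", "Salary": 200000},
    {"SSN": "5423341", "Name": "Smith", "Salary": 600000},
    {"SSN": "4352342", "Name": "Fred", "Salary": 500000},
]

# sigma_{Salary > 400000}(Employee)
high_paid = select(employee, lambda row: row["Salary"] > 400000)
names = [row["Name"] for row in high_paid]

# sigma_{Name = "Smith"}(Employee)
smiths = select(employee, lambda row: row["Name"] == "Smith")
```

Selection never changes the schema: every output row has the same attributes as the input rows.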
2. Projection (Π)
• Eliminates columns, then removes duplicates
• Notation: Π_{A1,…,An}(R)
• Example: project social-security number and names:
• Π_{SSN, Name}(Employee)
• Output schema: Answer(SSN, Name)
Students(sid, sname, gpa)
SQL:
SELECT DISTINCT sname, gpa
FROM Students;
RA: Π_{sname, gpa}(Students)
Another example:
Π_{Name, Salary}(Employee)

Employee
SSN       Name   Salary
1234545   John   200000
5423341   John   600000
4352342   John   200000

Result
Name   Salary
John   200000
John   600000
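The two steps of projection (drop columns, then remove duplicates) can be sketched as follows. The function name `project` and the row encoding are assumptions for illustration; the data is the Employee table from the example.

```python
def project(relation, attrs):
    """Pi_attrs(R): keep only attrs, then drop duplicate rows (set semantics)."""
    seen = set()
    out = []
    for row in relation:
        key = tuple(row[a] for a in attrs)  # column elimination
        if key not in seen:                 # duplicate elimination
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


employee = [
    {"SSN": "1234545", "Name": "John", "Salary": 200000},
    {"SSN": "5423341", "Name": "John", "Salary": 600000},
    {"SSN": "4352342", "Name": "John", "Salary": 200000},
]

result = project(employee, ["Name", "Salary"])
```

The first and third employees differ only in SSN, so once SSN is projected away they become the same tuple and only one copy survives.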
Note that RA Operators are Compositional!
Students(sid, sname, gpa)
SELECT DISTINCT sname, gpa
FROM Students
WHERE gpa > 3.5;
How do we represent this query in RA?
Π_{sname, gpa}(σ_{gpa > 3.5}(Students))
σ_{gpa > 3.5}(Π_{sname, gpa}(Students))
Are these logically equivalent?
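Because every operator takes relations in and returns a relation, the two plans can be written as nested calls and compared on sample data. This is a sketch on hypothetical rows (the Students data below is invented), so agreement here illustrates the equivalence rather than proving it. The plans agree because the selection only reads gpa, which the projection keeps.

```python
def select(relation, predicate):
    """sigma: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Pi: keep attrs and remove duplicates (set semantics)."""
    distinct = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]


def is_high(row):
    return row["gpa"] > 3.5


# Hypothetical instance of Students(sid, sname, gpa).
students = [
    {"sid": "001", "sname": "Bob", "gpa": 3.2},
    {"sid": "003", "sname": "Mary", "gpa": 3.8},
    {"sid": "005", "sname": "Mary", "gpa": 3.8},  # same sname and gpa, new sid
]

plan_a = project(select(students, is_high), ["sname", "gpa"])  # Pi(sigma(R))
plan_b = select(project(students, ["sname", "gpa"]), is_high)  # sigma(Pi(R))
```

If the projection dropped gpa, the second plan could not even be evaluated, since the selection would refer to an attribute that no longer exists.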
3. Cross-Product (×)
• Each tuple in R1 with each tuple in R2
• Notation: R1 × R2
• Example:
• Employee × Dependents
• Rare in practice; mainly used to express joins
Students(sid, sname, gpa)
People(ssn, pname, address)
SQL:
SELECT *
FROM Students, People;
RA: Students × People
Another example:

People
ssn       pname   address
1234545   John    216 Rosse
5423341   Bob     217 Rosse

Students
sid   sname   gpa
001   John    3.4
002   Bob     1.3

Students × People
ssn       pname   address     sid   sname   gpa
1234545   John    216 Rosse   001   John    3.4
5423341   Bob     217 Rosse   001   John    3.4
1234545   John    216 Rosse   002   Bob     1.3
5423341   Bob     217 Rosse   002   Bob     1.3
Renaming (ρ)
• Changes the schema, not the instance
• A 'special' operator: neither basic nor derived
• Notation: ρ_{B1,…,Bn}(R)
• Note: this is shorthand for the proper form (since names, not order, matter!):
• ρ_{A1→B1,…,An→Bn}(R)
Students(sid, sname, gpa)
SQL:
SELECT sid AS studId, sname AS name, gpa AS gradePtAvg
FROM Students;
RA: ρ_{studId, name, gradePtAvg}(Students)
We care about this operator because we are working in a named perspective.
Another example:

Students
sid   sname   gpa
001   John    3.4
002   Bob     1.3

ρ_{studId, name, gradePtAvg}(Students)

Students
studId   name   gradePtAvg
001      John   3.4
002      Bob    1.3
Natural Join (⋈)
• Notation: R1 ⋈ R2
• Joins R1 and R2 on equality of all shared attributes
• If R1 has attribute set A, and R2 has attribute set B, and they share attributes A ∩ B = C, can also be written: R1 ⋈_C R2
• Our first example of a derived RA operator:
• Meaning: R1 ⋈ R2 = Π_{A ∪ B}(σ_{C = D}(ρ_{C→D}(R1) × R2))
• Where:
• The rename ρ_{C→D} renames the shared attributes in one of the relations
• The selection σ_{C = D} checks equality of the shared attributes
• The projection Π_{A ∪ B} eliminates the duplicate common attributes
Students(sid, name, gpa)
People(ssn, name, address)
SQL:
SELECT DISTINCT sid, S.name, gpa, ssn, address
FROM Students S, People P
WHERE S.name = P.name;
RA: Students ⋈ People
Another example:

People P
ssn       P.name   address
1234545   John     216 Rosse
5423341   Bob      217 Rosse

Students S
sid   S.name   gpa
001   John     3.4
002   Bob      1.3

Students ⋈ People
sid   S.name   gpa   ssn       address
001   John     3.4   1234545   216 Rosse
002   Bob      1.3   5423341   217 Rosse
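The derived definition of natural join (rename, cross product, selection, projection) can be followed step by step in a sketch. The function name `natural_join`, the `"_1"` suffix used for renamed attributes, and the dict-per-row encoding are all assumptions for illustration; a real engine would not build the full cross product.

```python
def natural_join(r1, r2):
    """R1 join R2 built from rename, cross product, selection, projection."""
    if not r1 or not r2:
        return []
    shared = [a for a in r1[0] if a in r2[0]]
    renamed_names = {a + "_1" for a in shared}
    out = []
    for t1 in r1:
        # Rename: each shared attribute C of R1 becomes "C_1" (hypothetical).
        renamed = {(a + "_1" if a in shared else a): v for a, v in t1.items()}
        for t2 in r2:
            combined = {**renamed, **t2}  # one tuple of the cross product
            # Selection: the renamed copy must equal the original attribute.
            if all(combined[a + "_1"] == combined[a] for a in shared):
                # Projection: drop the renamed duplicates, then deduplicate.
                row = {a: v for a, v in combined.items()
                       if a not in renamed_names}
                if row not in out:
                    out.append(row)
    return out


students = [
    {"sid": "001", "name": "John", "gpa": 3.4},
    {"sid": "002", "name": "Bob", "gpa": 1.3},
]
people = [
    {"ssn": "1234545", "name": "John", "address": "216 Rosse"},
    {"ssn": "5423341", "name": "Bob", "address": "217 Rosse"},
]

joined = natural_join(students, people)
```

With no shared attributes the selection is vacuously true and the function returns the full cross product; with identical schemas it returns the tuples common to both inputs.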
Natural Join
• Given schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
• Given R(A, B, C), S(D, E), what is R ⋈ S?
• Given R(A, B), S(A, B), what is R ⋈ S?
Example: Converting SFW Query -> RA
Students(sid, sname, gpa)
People(ssn, sname, address)
SELECT DISTINCT gpa, address
FROM Students S, People P
WHERE gpa > 3.5 AND S.sname = P.sname;
How do we represent this query in RA?
Π_{gpa, address}(σ_{gpa > 3.5}(S ⋈ P))
Logical Equivalence of RA Plans
• Given relations R(A, B) and S(B, C):
• Here, projection & selection commute:
• σ_{A = 5}(Π_A(R)) = Π_A(σ_{A = 5}(R))
• What about here?
• σ_{A = 5}(Π_B(R)) ?= Π_B(σ_{A = 5}(R))
We'll look at this in more depth later in the lecture…
RDBMS Architecture
How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
• We saw how we can transform declarative SQL queries into precise, compositional RA plans
• We'll look at how to then optimize these plans later in this class
How is the RA "plan" executed?
• We already know how to execute all the basic operators!
RA Plan Execution
• Natural Join / Join:
• We saw how to use memory & IO cost considerations to pick the correct algorithm to execute a join with (BNLJ, SMJ, HJ…)!
• Selection:
• We saw how to use indexes to aid selection
• Can always fall back on scan / binary search as well
• Projection:
• The main operation here is finding distinct values of the projected tuples; we briefly discussed how to do this with e.g. hashing or sorting
Activity-16-1.ipynb
2. Adv. Relational Algebra
What you will learn about in this section
1. Set Operations in RA
2. Fancier RA
3. Extensions & Limitations
Relational Algebra (RA)
• Five basic operators:
1. Selection: σ
2. Projection: Π
3. Cartesian Product: ×
4. Union: ∪
5. Difference: -
• Derived or auxiliary operators:
• Intersection, complement
• Joins (natural, equi-join, theta join, semi-join)
• Renaming: ρ
• Division
We'll look at union and difference, and also at some of these derived operators.
1. Union (∪) and 2. Difference (-)
• R1 ∪ R2
• Example:
• ActiveEmployees ∪ RetiredEmployees
• R1 - R2
• Example:
• AllEmployees - RetiredEmployees
What about Intersection (∩)?
• It is a derived operator
• R1 ∩ R2 = R1 - (R1 - R2)
• Also expressed as a join!
• Example
• UnionizedEmployees ∩ RetiredEmployees
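Both derivations of intersection can be checked on Python sets, which already follow set semantics. The employee tuples below are hypothetical.

```python
# Hypothetical instances with the same schema (ssn, name).
unionized = {("111", "Ann"), ("222", "Bo"), ("333", "Cy")}
retired = {("222", "Bo"), ("333", "Cy"), ("444", "Di")}

# R1 intersect R2 = R1 - (R1 - R2)
via_difference = unionized - (unionized - retired)

# With identical schemas, a natural join matches on every attribute,
# so a tuple joins only with an equal tuple.
via_join = {t for t in unionized for u in retired if t == u}
```

The inner difference removes what R1 and R2 have in common, so subtracting it from R1 leaves exactly the common tuples.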
Fancier RA
Theta Join (⋈_θ)
• A join that involves a predicate
• R1 ⋈_θ R2 = σ_θ(R1 × R2)
• Here θ can be any condition
Students(sid, sname, gpa)
People(ssn, pname, address)
SQL:
SELECT *
FROM Students, People
WHERE θ;
RA: Students ⋈_θ People
Note that natural join is a theta join + a projection.
Equi-join (⋈_{A=B})
• A theta join where θ is an equality
• R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
• Example:
• Employee ⋈_{SSN=SSN} Dependents
Students(sid, sname, gpa)
People(ssn, pname, address)
SQL:
SELECT *
FROM Students S, People P
WHERE sname = pname;
RA: S ⋈_{sname=pname} P
Most common join in practice!
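A theta join is a selection over the cross product, and an equi-join is the special case where the predicate is an equality. The sketch below assumes the two relations have disjoint attribute names (as Students and People do here); the function name `theta_join` and the sample rows are illustrative.

```python
def theta_join(r1, r2, theta):
    """R1 join_theta R2 = sigma_theta(R1 x R2); attribute names must be disjoint."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if theta(t1, t2)]


students = [
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Bob", "gpa": 1.3},
]
people = [
    {"ssn": "1234545", "pname": "John", "address": "216 Rosse"},
    {"ssn": "5423341", "pname": "Bob", "address": "217 Rosse"},
]

# Equi-join: theta is the equality sname = pname.
equi = theta_join(students, people, lambda s, p: s["sname"] == p["pname"])

# A general theta join: any condition over the combined tuple is allowed.
mismatched = theta_join(students, people, lambda s, p: s["sname"] != p["pname"])
```

Unlike natural join, the result keeps both sname and pname, since no projection is applied.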
Semijoin (⋉)
• R ⋉ S = Π_{A1,…,An}(R ⋈ S)
• Where A1, …, An are the attributes in R
• Example:
• Employee ⋉ Dependents
Students(sid, sname, gpa)
People(ssn, pname, address)
SQL:
SELECT DISTINCT sid, sname, gpa
FROM Students, People
WHERE sname = pname;
RA: Students ⋉ People
Semijoins in Distributed Databases
• Semijoins are often used to compute natural joins in distributed databases
Employee(SSN, Name, …) and Dependents(SSN, Dname, Age, …) are stored at different sites, connected by a network.
Goal: Employee ⋈_{ssn=ssn} (σ_{age>71}(Dependents))
T = Π_{SSN}(σ_{age>71}(Dependents))
R = Employee ⋉ T
Answer = R ⋈ σ_{age>71}(Dependents)
Send less data to reduce network bandwidth!
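The semijoin strategy can be traced on a small example. The rows below are hypothetical, and the "sites" are just two Python lists; the point of the sketch is that only the SSN set T and the reduced relation R would cross the network, and that the final answer matches the direct join against the filtered Dependents.

```python
# Hypothetical data held at two different sites.
employee = [
    {"SSN": 1, "Name": "Ann"},
    {"SSN": 2, "Name": "Bo"},
    {"SSN": 3, "Name": "Cy"},
]
dependents = [
    {"SSN": 1, "Dname": "Al", "Age": 80},
    {"SSN": 1, "Dname": "Kim", "Age": 12},
    {"SSN": 2, "Dname": "Lee", "Age": 30},
]

# At the Dependents site: filter, then project onto SSN.
old = [d for d in dependents if d["Age"] > 71]
T = {d["SSN"] for d in old}  # small set shipped to the Employee site

# At the Employee site: R = Employee semijoin T, shipped back.
R = [e for e in employee if e["SSN"] in T]

# Final join of R with the filtered dependents.
answer = [{**e, **d} for e in R for d in old if e["SSN"] == d["SSN"]]

# Reference: the join computed directly, shipping all of Employee.
direct = [{**e, **d} for e in employee for d in old if e["SSN"] == d["SSN"]]
```

Here one employee row is shipped instead of three; the saving grows with the fraction of employees that have no matching dependent.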
RA Expressions Can Get Complex!
[Figure: RA plan tree over Person, Purchase, Person, Product, with selections σ_{name=fred} and σ_{name=gizmo}, projections Π_pid, Π_ssn and Π_name, and joins on seller-ssn=ssn, pid=pid and buyer-ssn=ssn]
Multisets
Recall that SQL uses Multisets
Equivalent representations of a multiset X:

As a list of tuples:
(1, a), (1, a), (1, b), (2, c), (2, c), (2, c), (1, d), (1, d)

As a table of counts:
Tuple    λ(X)
(1, a)   2
(1, b)   1
(2, c)   3
(1, d)   2

λ(X) = "count of tuple in X" (items not listed have implicit count 0)
Note: in a set, all counts are in {0, 1}.
Generalizing Set Operations to Multiset Operations
X ∩ Y = Z, where λ(Z) = min(λ(X), λ(Y))

Tuple    λ(X)   λ(Y)   λ(Z)
(1, a)   2      5      2
(1, b)   0      1      0
(2, c)   3      2      2
(1, d)   0      2      0

For sets, this is intersection.
Generalizing Set Operations to Multiset Operations
X ∪ Y = Z, where λ(Z) = λ(X) + λ(Y)

Tuple    λ(X)   λ(Y)   λ(Z)
(1, a)   2      5      7
(1, b)   0      1      1
(2, c)   3      2      5
(1, d)   0      2      2

For sets, this is union.
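The count-based definitions map directly onto Python's `collections.Counter`, which stores exactly the λ function (missing keys count as 0). The sketch below reuses the X and Y tables; the variable names are illustrative.

```python
from collections import Counter

# lambda(X) and lambda(Y) from the tables; zero counts are simply omitted.
X = Counter({(1, "a"): 2, (2, "c"): 3})
Y = Counter({(1, "a"): 5, (1, "b"): 1, (2, "c"): 2, (1, "d"): 2})

# Multiset intersection: lambda(Z) = min(lambda(X), lambda(Y)).
intersection = X & Y

# Multiset union as defined here: lambda(Z) = lambda(X) + lambda(Y).
union = X + Y
```

Note that `Counter`'s `|` operator takes the maximum of the counts, which is a different operation from the additive union used in these slides (the additive form corresponds to SQL's UNION ALL).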
Operations on Multisets
All RA operations need to be defined carefully on bags
• σ_C(R): preserve the number of occurrences
• Π_A(R): no duplicate elimination
• Cross-product, join: no duplicate elimination
This is important: relational engines work on multisets, not sets!
RA has Limitations!
• Cannot compute "transitive closure"

Name1   Name2   Relationship
Fred    Mary    Father
Mary    Joe     Cousin
Mary    Bill    Spouse
Nancy   Lou     Sister

• Find all direct and indirect relatives of Fred
• Cannot express in RA!!!
• Need to write a C program, use a graph engine, or modern SQL…
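The reason a program is needed is that the number of self-joins required depends on the data, while an RA expression has a fixed number of operators. A minimal sketch of the iterative approach is shown below: it repeats one join step until no new pairs appear. The function name `transitive_closure` is hypothetical, and the sketch follows the pairs only in the Name1 → Name2 direction, which is an assumption about how "relative" should be read.

```python
def transitive_closure(edges):
    """Repeat a self-join until a fixpoint: no new (a, d) pairs are produced."""
    closure = set(edges)
    while True:
        # One join step: (a, b) and (b, d) yield (a, d).
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


# (Name1, Name2) pairs from the table, ignoring the Relationship column.
relatives = {
    ("Fred", "Mary"),
    ("Mary", "Joe"),
    ("Mary", "Bill"),
    ("Nancy", "Lou"),
}

closure = transitive_closure(relatives)
freds_relatives = {b for (a, b) in closure if a == "Fred"}
```

The loop is what RA lacks: each iteration is an ordinary join, but how many iterations are needed is only known once the data is seen.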