L22: The Relational Model (continued)
CS3200 Database design (sp18 s2) https://course.ccs.neu.edu/cs3200sp18s2/ 4/5/2018
Announcements!
• Please pick up your exam if you have not yet
• HW6 will include Redis and MongoDB exercises with minimal install overhead (still in preparation)
• Final class calendar
• Outline today
  - Relational algebra
  - Optimization
  - NoSQL (start, continuing next time)
RDBMS Architecture
• How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
- SQL Query: declarative query (from user)
- RA Plan: translate to a relational algebra expression
- Optimized RA Plan: find a logically equivalent, but more efficient, RA expression
- Execution: execute each operator of the optimized plan!
RDBMS Architecture
• How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!
Relational Algebra (RA)
• Five basic operators:
  1. Selection: $\sigma$
  2. Projection: $\Pi$
  3. Cartesian Product: $\times$
  4. Union: $\cup$
  5. Difference: $-$
• Derived or auxiliary operators:
  - Intersection, complement
  - Joins (natural, equi-join, theta join, semi-join)
  - Renaming: $\rho$
  - Division
We’ll look at these first!
And also at one example of a derived operator (natural join) and a special operator (renaming)
Keep in mind: RA operates on sets!
• RDBMSs use multisets; however, in the relational algebra formalism we will consider sets!
• Also: we will consider the named perspective, where every attribute must have a unique name
  → attribute order does not matter…
Now on to the basic RA operators…
1. Selection (𝜎)
• Returns all tuples which satisfy a condition
• Notation: $\sigma_c(R)$
• Examples
  - $\sigma_{Salary > 40000}(Employee)$
  - $\sigma_{name = \text{"Smith"}}(Employee)$
• The condition c can be =, <, ≤, >, ≥, <>
Students(sid, sname, gpa)
SQL:
SELECT *
FROM Students
WHERE gpa > 3.5;
RA: $\sigma_{gpa > 3.5}(Students)$
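The set-semantics definition of selection fits in a few lines of Python. This is a minimal sketch, not how an RDBMS stores data: it assumes a relation is a Python `set` of tuples, and each tuple is a `frozenset` of (attribute, value) pairs so that attributes are looked up by name (the named perspective). The helper names `rel` and `select` and the third student row are hypothetical.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(cond, r):
    """sigma_cond(R): keep exactly the tuples whose attribute values satisfy cond."""
    return {t for t in r if cond(dict(t))}


# Hypothetical instance of Students(sid, sname, gpa)
students = rel(
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Bob", "gpa": 1.3},
    {"sid": "003", "sname": "Ann", "gpa": 3.9},
)

# SQL: SELECT * FROM Students WHERE gpa > 3.5;
honors = select(lambda t: t["gpa"] > 3.5, students)
print(sorted(dict(t)["sname"] for t in honors))  # ['Ann']
```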
Another example:
$\sigma_{Salary > 400000}(Employee)$
Employee:
SSN      Name   Salary
1234545  John   200000
5423341  Smith  600000
4352342  Fred   500000
Result:
SSN      Name   Salary
5423341  Smith  600000
4352342  Fred   500000
2. Projection (Π)
• Eliminates columns, then removes duplicates
• Notation: $\Pi_{A_1,\ldots,A_n}(R)$
• Example: project social-security numbers and names:
  - $\Pi_{SSN, Name}(Employee)$
  - Output schema: Answer(SSN, Name)
Students(sid, sname, gpa)
SQL:
SELECT DISTINCT sname, gpa
FROM Students;
RA: $\Pi_{sname, gpa}(Students)$
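Projection under set semantics can be sketched in Python with a hypothetical `rel` helper (a relation is a `set` of `frozenset`s of (attribute, value) pairs). Because the result is built as a set, duplicate elimination comes for free, mirroring the DISTINCT in the SQL. `project` is a hypothetical helper name; the rows are the Employee instance used in this lecture's projection examples.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def project(attrs, r):
    """Pi_attrs(R): keep only the listed attributes; building a set removes duplicates."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


employee = rel(
    {"SSN": "1234545", "Name": "John", "Salary": 200000},
    {"SSN": "5423341", "Name": "John", "Salary": 600000},
    {"SSN": "4352342", "Name": "John", "Salary": 200000},
)

answer = project({"Name", "Salary"}, employee)
print(len(answer))  # 2: the two (John, 200000) tuples collapse into one
```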
Another example:
$\Pi_{SSN}(Employee)$
Employee:
SSN      Name  Salary
1234545  John  200000
5423341  John  600000
4352342  John  200000
Result:
SSN
1234545
5423341
4352342
Another example:
$\Pi_{Name, Salary}(Employee)$
Employee:
SSN      Name  Salary
1234545  John  200000
5423341  John  600000
4352342  John  200000
Result:
Name  Salary
John  200000
John  600000
Note that RA Operators are Compositional!
Students(sid, sname, gpa)
SELECT DISTINCT sname, gpa
FROM Students
WHERE gpa > 3.5;
How do we represent this query in RA?
$\Pi_{sname, gpa}(\sigma_{gpa > 3.5}(Students))$
$\sigma_{gpa > 3.5}(\Pi_{sname, gpa}(Students))$
Are these logically equivalent?
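One way to build intuition is to evaluate both expressions on a concrete instance. This minimal Python sketch uses hypothetical helpers `rel`, `select`, and `project` and made-up rows. Agreeing on one instance does not prove equivalence; here the two orders agree because the selection reads only gpa, which the projection keeps.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(cond, r):
    """sigma_cond(R) under set semantics."""
    return {t for t in r if cond(dict(t))}


def project(attrs, r):
    """Pi_attrs(R) with duplicate elimination (set semantics)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


students = rel(
    {"sid": "001", "sname": "John", "gpa": 3.4},
    {"sid": "002", "sname": "Ann", "gpa": 3.9},
    {"sid": "003", "sname": "Ann", "gpa": 3.9},  # same name and gpa, different sid
)


def gpa_ok(t):
    return t["gpa"] > 3.5


plan1 = project({"sname", "gpa"}, select(gpa_ok, students))  # Pi(sigma(Students))
plan2 = select(gpa_ok, project({"sname", "gpa"}, students))  # sigma(Pi(Students))
print(plan1 == plan2)  # True on this instance
```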
3. Cross-Product (×)
• Each tuple in R1 with each tuple in R2
• Notation: $R_1 \times R_2$
• Example:
  - $Employee \times Dependents$
• Rare in practice; mainly used to express joins
Students(sid, sname, gpa)
People(ssn, pname, address)
SQL:
SELECT *
FROM Students, People;
RA: $Students \times People$
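A minimal Python sketch of the cross-product, using the Students and People rows from the example that follows. It assumes the two schemas share no attribute names (true here: sname vs. pname), so merging two frozensets of (attribute, value) pairs yields one wider tuple. `rel` and `cross` are hypothetical helper names.

```python
from itertools import product


def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def cross(r1, r2):
    """R1 x R2: pair every tuple of R1 with every tuple of R2.
    Assumes disjoint attribute names, so the union of pairs is a well-formed tuple."""
    return {t1 | t2 for t1, t2 in product(r1, r2)}


students = rel({"sid": "001", "sname": "John", "gpa": 3.4},
               {"sid": "002", "sname": "Bob", "gpa": 1.3})
people = rel({"ssn": "1234545", "pname": "John", "address": "216 Rosse"},
             {"ssn": "5423341", "pname": "Bob", "address": "217 Rosse"})

result = cross(students, people)
print(len(result))  # 4 = |Students| * |People|
```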
Another example: $Students \times People$
People:
ssn      pname  address
1234545  John   216 Rosse
5423341  Bob    217 Rosse
Students:
sid  sname  gpa
001  John   3.4
002  Bob    1.3
Result:
ssn      pname  address    sid  sname  gpa
1234545  John   216 Rosse  001  John   3.4
5423341  Bob    217 Rosse  001  John   3.4
1234545  John   216 Rosse  002  Bob    1.3
5423341  Bob    217 Rosse  002  Bob    1.3
4. Renaming (𝜌)
• Changes the schema, not the instance
• A ‘special’ operator: neither basic nor derived
• Notation: $\rho_{B_1,\ldots,B_n}(R)$
• Note: this is shorthand for the proper form (since names, not order, matter!):
  - $\rho_{A_1 \to B_1,\ldots,A_n \to B_n}(R)$
Students(sid, sname, gpa)
SQL:
SELECT sid AS studId, sname AS name, gpa AS gradePtAvg
FROM Students;
RA: $\rho_{studId, name, gradePtAvg}(Students)$
We care about this operator because we are working in a named perspective
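Renaming touches only attribute names, never values. This minimal Python sketch (hypothetical helpers `rel` and `rename`) takes the explicit $A_i \to B_i$ mapping of the proper form, so it does not depend on attribute order; attributes missing from the mapping keep their names.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def rename(mapping, r):
    """rho_{A1->B1,...}(R): change attribute names; tuple values are untouched."""
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in r}


students = rel({"sid": "001", "sname": "John", "gpa": 3.4},
               {"sid": "002", "sname": "Bob", "gpa": 1.3})

renamed = rename({"sid": "studId", "sname": "name", "gpa": "gradePtAvg"}, students)
print(sorted(a for a, _ in next(iter(renamed))))  # ['gradePtAvg', 'name', 'studId']
```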
Another example:
$\rho_{studId, name, gradePtAvg}(Students)$
Students:
sid  sname  gpa
001  John   3.4
002  Bob    1.3
Students (after renaming):
studId  name  gradePtAvg
001     John  3.4
002     Bob   1.3
5. Natural Join (⋈)
• Notation: $R_1 \bowtie R_2$
• Joins R1 and R2 on equality of all shared attributes
  - If R1 has attribute set A, and R2 has attribute set B, and they share attributes $A \cap B = C$, this can also be written: $R_1 \bowtie_C R_2$
• Our first example of a derived RA operator:
  - Meaning: $R_1 \bowtie R_2 = \Pi_{A \cup B}(\sigma_{C=D}(\rho_{C \to D}(R_1) \times R_2))$
  - Where:
    • The rename $\rho_{C \to D}$ renames the shared attributes in one of the relations
    • The selection $\sigma_{C=D}$ checks equality of the shared attributes
    • The projection $\Pi_{A \cup B}$ eliminates the duplicate common attributes
Students(sid, name, gpa)
People(ssn, name, address)
SQL:
SELECT DISTINCT sid, S.name, gpa, ssn, address
FROM Students S, People P
WHERE S.name = P.name;
RA: $Students \bowtie People$
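The derived definition can be executed literally: rename, cross, select, project. This is a minimal Python sketch with hypothetical helpers (`rel`, `schema`, `natural_join`) and a hypothetical fresh-name scheme (`"r1." + attribute`) standing in for the renamed attributes D. It assumes both inputs are non-empty, because the schema is read off one of their tuples.

```python
from itertools import product


def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def schema(r):
    """Attribute names of a non-empty relation, read off one of its tuples."""
    return {a for a, _ in next(iter(r))}


def natural_join(r1, r2):
    """R1 join R2 assembled from basic operators:
    Pi_{A u B}(sigma_{C=D}(rho_{C->D}(R1) x R2))."""
    a, b = schema(r1), schema(r2)
    shared = a & b                                   # C = A n B
    keep = a | b                                     # A u B
    fresh = {c: "r1." + c for c in shared}           # D: hypothetical fresh names for C
    renamed = {frozenset((fresh.get(x, x), v) for x, v in t) for t in r1}   # rho_{C->D}(R1)
    crossed = {t1 | t2 for t1, t2 in product(renamed, r2)}                  # x
    matched = {t for t in crossed
               if all(dict(t)[fresh[c]] == dict(t)[c] for c in shared)}     # sigma_{C=D}
    return {frozenset((x, v) for x, v in t if x in keep) for t in matched}  # Pi_{A u B}


students = rel({"sid": "001", "name": "John", "gpa": 3.4},
               {"sid": "002", "name": "Bob", "gpa": 1.3})
people = rel({"ssn": "1234545", "name": "John", "address": "216 Rosse"},
             {"ssn": "5423341", "name": "Bob", "address": "217 Rosse"})

result = natural_join(students, people)
print(len(result))  # 2
```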
Another example: $Students \bowtie People$
People P:
ssn      P.name  address
1234545  John    216 Rosse
5423341  Bob     217 Rosse
Students S:
sid  S.name  gpa
001  John    3.4
002  Bob     1.3
Result:
sid  S.name  gpa  ssn      address
001  John    3.4  1234545  216 Rosse
002  Bob     1.3  5423341  217 Rosse
Natural Join practice
• Given schemas R(A, B, C, D), S(A, C, E), what is the schema of $R \bowtie S$?
• Given R(A, B, C), S(D, E), what is $R \bowtie S$?
• Given R(A, B), S(A, B), what is $R \bowtie S$?
Example: Converting SFW Query → RA
Students(sid, sname, gpa)
People(ssn, sname, address)
SELECT DISTINCT gpa, address
FROM Students S, People P
WHERE gpa > 3.5 AND S.sname = P.sname;
How do we represent this query in RA?
$\Pi_{gpa, address}(\sigma_{gpa > 3.5}(S \bowtie P))$
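The RA expression can be evaluated bottom-up: join first, then select, then project. This minimal Python sketch uses hypothetical helpers (`rel`, `select`, `project`, and a direct `natural_join` that merges tuples agreeing on every shared attribute) and made-up rows, and assumes sname is the shared join attribute, as in the schemas above.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(cond, r):
    return {t for t in r if cond(dict(t))}


def project(attrs, r):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r1, r2):
    """Merge every pair of tuples that agree on all of their shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)  # shared pairs are identical, so the union keeps one copy
    return out


students = rel({"sid": "001", "sname": "John", "gpa": 3.4},
               {"sid": "002", "sname": "Ann", "gpa": 3.9},
               {"sid": "003", "sname": "Mia", "gpa": 3.7})
people = rel({"ssn": "1234545", "sname": "John", "address": "216 Rosse"},
             {"ssn": "5423341", "sname": "Ann", "address": "217 Rosse"},
             {"ssn": "4352342", "sname": "Mia", "address": "217 Rosse"})

plan = project({"gpa", "address"},
               select(lambda t: t["gpa"] > 3.5, natural_join(students, people)))
print(sorted((dict(t)["gpa"], dict(t)["address"]) for t in plan))
# [(3.7, '217 Rosse'), (3.9, '217 Rosse')]
```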
Logical Equivalence of RA Plans
• Given relations R(A, B) and S(B, C):
  - Here, projection & selection commute:
    • $\sigma_{A=5}(\Pi_A(R)) = \Pi_A(\sigma_{A=5}(R))$
  - What about here?
    • $\sigma_{A=5}(\Pi_B(R)) \overset{?}{=} \Pi_B(\sigma_{A=5}(R))$
We’ll look at this in more depth in query optimization…
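A small Python experiment makes the two cases concrete, assuming the selection condition is A = 5 and using hypothetical helpers `rel`, `select`, and `project` with made-up rows. When the projection keeps A, both orders agree; when it projects onto B only, the attribute the selection needs is gone, so the left-hand side cannot even be evaluated.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(cond, r):
    return {t for t in r if cond(dict(t))}


def project(attrs, r):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


R = rel({"A": 5, "B": 1}, {"A": 5, "B": 2}, {"A": 7, "B": 1})


def a_is_5(t):
    return t["A"] == 5


# Projection onto A keeps the attribute the selection reads, so the two orders agree.
lhs = select(a_is_5, project({"A"}, R))
rhs = project({"A"}, select(a_is_5, R))
print(lhs == rhs)  # True

# After Pi_B, attribute A no longer exists, so sigma_{A=5} cannot be evaluated.
try:
    select(a_is_5, project({"B"}, R))
except KeyError as missing:
    print("missing attribute:", missing)  # missing attribute: 'A'
```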
RDBMS Architecture
• How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We saw how we can transform declarative SQL queries into precise, compositional RA plans
RDBMS Architecture
• How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We’ll look at how to then optimize these plans
RDBMS Architecture
• How is the RA “plan” executed?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We have already seen how to execute a few basic operators!
RA Plan Execution
• Natural Join / Join:
  - We saw how to use memory & IO cost considerations to pick the correct algorithm to execute a join with block nested loop join (BNLJ) or sort-merge join (SMJ) (we skipped hash join, HJ)
• Selection:
  - We saw how to use indexes to aid selection
  - Can always fall back on scan / binary search as well
• Projection:
  - The main operation here is finding distinct values of the projected tuples; we briefly discussed how to do this with sorting (we skipped hashing)
We already know how to execute all the basic operators!
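The sort-based duplicate elimination used for projection can be sketched in memory: after sorting, equal tuples are adjacent, so a single linear pass drops the repeats. This is a simplification that assumes the data fits in memory; a disk-based engine would sort externally (for example with external merge sort) and then apply the same adjacent-duplicate scan. `distinct_by_sorting` is a hypothetical name.

```python
def distinct_by_sorting(rows):
    """Duplicate elimination via sorting: equal tuples end up adjacent,
    so keep a tuple only if it differs from the last one kept."""
    out = []
    for row in sorted(rows):
        if not out or out[-1] != row:
            out.append(row)
    return out


# Projected (Name, Salary) tuples from the Employee example, duplicates included
projected = [("John", 200000), ("John", 600000), ("John", 200000)]
print(distinct_by_sorting(projected))  # [('John', 200000), ('John', 600000)]
```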
3. Advanced Relational Algebra (very brief)
What we will briefly cover next
• Set Operations in RA
• Extensions & Limitations
Relational Algebra (RA)
• Five basic operators:
  1. Selection: $\sigma$
  2. Projection: $\Pi$
  3. Cartesian Product: $\times$
  4. Union: $\cup$
  5. Difference: $-$
• Derived or auxiliary operators:
  - Intersection, complement
  - Joins (natural, equi-join, theta join, semi-join)
  - Renaming: $\rho$
  - Division
We’ll look at these
1. Union (∪) and 2. Difference (−)
• $R_1 \cup R_2$
• Example:
  - $ActiveEmployees \cup RetiredEmployees$
• $R_1 - R_2$
• Example:
  - $AllEmployees - RetiredEmployees$
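Under set semantics, union and difference map directly onto Python's set operators, provided the inputs are union-compatible (same attribute set). This minimal sketch uses hypothetical helpers `rel` and `same_schema` and a made-up Employees(eid, name) instance in which Bob hypothetically appears in both relations.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def same_schema(r1, r2):
    """Union and difference require union-compatible inputs: one shared attribute set."""
    return len({frozenset(a for a, _ in t) for t in r1 | r2}) <= 1


active = rel({"eid": 1, "name": "Ann"}, {"eid": 2, "name": "Bob"})
retired = rel({"eid": 2, "name": "Bob"}, {"eid": 3, "name": "Cy"})
all_employees = active | retired  # R1 u R2

assert same_schema(active, retired)
print(len(all_employees))            # 3: Bob appears only once under set semantics
print(len(all_employees - retired))  # 1: R1 - R2 keeps only Ann
```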
What about Intersection (∩)?
• It is a derived operator
• $R_1 \cap R_2 = R_1 - (R_1 - R_2)$
• Also expressed as a join!
• Example
  - $UnionizedEmployees \cap RetiredEmployees$
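Both derivations of intersection can be checked in Python on a made-up instance: the double difference, and a natural join of two relations with identical schemas (where the join matches on every attribute, so only common tuples survive). `rel` and `natural_join` are hypothetical helpers; Python's built-in `&` serves as the reference.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def natural_join(r1, r2):
    """Merge every pair of tuples that agree on all of their shared attributes."""
    return {t1 | t2 for t1 in r1 for t2 in r2
            if all(dict(t1)[a] == dict(t2)[a] for a in dict(t1).keys() & dict(t2).keys())}


unionized = rel({"eid": 1}, {"eid": 2}, {"eid": 3})
retired = rel({"eid": 2}, {"eid": 3}, {"eid": 4})

via_difference = unionized - (unionized - retired)  # R1 - (R1 - R2)
via_join = natural_join(unionized, retired)         # same schema: join matches on every attribute
print(via_difference == via_join == (unionized & retired))  # True
```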
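RA Expressions Can Get Complex!
[Figure: a plan tree over Person, Purchase, Person, and Product, combining selections $\sigma_{name=fred}$ and $\sigma_{name=gizmo}$, projections $\Pi_{pid}$, $\Pi_{ssn}$, and $\Pi_{name}$, and joins on seller-ssn = ssn, pid = pid, and buyer-ssn = ssn]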
Operations on Multisets
• All RA operations need to be defined carefully on bags
  - $\sigma_C(R)$: preserve the number of occurrences
  - $\Pi_A(R)$: no duplicate elimination
  - Cross-product, join: no duplicate elimination
This is important: relational engines work on multisets, not sets!
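Bag semantics can be sketched with `collections.Counter`, which maps each tuple to its number of occurrences. For brevity, tuples here are plain positional Python tuples over a hypothetical schema R(name, dept); `bag_select`, `bag_project`, and `bag_cross` are hypothetical names. Selection keeps multiplicities, projection adds up the counts of tuples that collide instead of collapsing them, and cross-product multiplies counts.

```python
from collections import Counter

# A bag relation: tuple -> number of occurrences (hypothetical R(name, dept))
R = Counter({("Ann", "CS"): 2, ("Bob", "CS"): 1, ("Cy", "Math"): 1})


def bag_select(cond, r):
    """sigma on bags: a tuple that passes keeps its full multiplicity."""
    return Counter({t: n for t, n in r.items() if cond(t)})


def bag_project(positions, r):
    """Pi on bags: no duplicate elimination; multiplicities of colliding tuples add up."""
    out = Counter()
    for t, n in r.items():
        out[tuple(t[i] for i in positions)] += n
    return out


def bag_cross(r1, r2):
    """x on bags: each output pair occurs n1 * n2 times."""
    return Counter({t1 + t2: n1 * n2 for t1, n1 in r1.items() for t2, n2 in r2.items()})


print(bag_select(lambda t: t[1] == "CS", R))  # Counter({('Ann', 'CS'): 2, ('Bob', 'CS'): 1})
print(bag_project([1], R))                    # Counter({('CS',): 3, ('Math',): 1})
```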
RA has Limitations!
• Cannot compute “transitive closure”
• Find all direct and indirect relatives of Fred
• Cannot be expressed in RA!
  - Need to write a C program, use a graph engine, or modern SQL…
Name1  Name2  Relationship
Fred   Mary   Father
Mary   Joe    Cousin
Mary   Bill   Spouse
Nancy  Lou    Sister
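What a general-purpose program adds is unbounded iteration: keep joining the current result with the base pairs until no new pair appears (a fixpoint). Each individual step is expressible in RA; the number of steps depends on the data, which is what a single fixed RA expression cannot capture. Modern SQL reaches the same result with recursive common table expressions (`WITH RECURSIVE`). This Python sketch assumes relatives are symmetric and ignores the Relationship column; `transitive_closure` is a hypothetical name.

```python
def transitive_closure(pairs):
    """Repeat 'join the current result with the base pairs' until nothing new appears."""
    closure = set(pairs)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in pairs if b == c}
        if step <= closure:
            return closure
        closure |= step


# Name1/Name2 pairs from the table, made symmetric (an assumption about "relatives")
edges = {("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")}
sym = edges | {(b, a) for a, b in edges}

fred = {b for a, b in transitive_closure(sym) if a == "Fred" and b != "Fred"}
print(sorted(fred))  # ['Bill', 'Joe', 'Mary']
```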
Activity-45.ipynb
as part of HW6
L22: Query Optimization
Logical vs. Physical Optimization
• Logical optimization:
  - Find equivalent plans that are more efficient
  - Intuition: Minimize # of tuples at each step by changing the order of RA operators
• Physical optimization:
  - Find the algorithm with the lowest IO cost to execute our plan
  - Intuition: Calculate based on physical parameters (buffer size, etc.) and estimates of data size (histograms)
• We only discuss logical optimization today
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
1. Logical Optimization
1) Optimization of RA Plans
2) ACTIVITY: RA Plan Optimization
Recall: Converting SFW Query → RA
Students(sid, sname, gpa)
People(ssn, sname, address)
SELECT DISTINCT gpa, address
FROM Students S, People P
WHERE gpa > 3.5 AND S.sname = P.sname;
How do we represent this query in RA?
$\Pi_{gpa, address}(\sigma_{gpa > 3.5}(S \bowtie P))$
Recall: Logical Equivalence of RA Plans
• Given relations R(A, B) and S(B, C):
  - Here, projection & selection commute:
    • $\sigma_{A=5}(\Pi_A(R)) = \Pi_A(\sigma_{A=5}(R))$
  - What about here?
    • $\sigma_{A=5}(\Pi_B(R)) \overset{?}{=} \Pi_B(\sigma_{A=5}(R))$
We’ll look at this in more depth later in the lecture…
RDBMS Architecture
• How does a SQL engine work?
SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution
We’ll look at how to then optimize these plans now
Note: We can visualize the plan as a tree
[Plan tree: $\Pi_B$ at the root, over the join of R(A,B) and S(B,C)]
$\Pi_B(R(A,B) \bowtie S(B,C))$
Bottom-up tree traversal = order of operation execution!
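The bottom-up order can be made explicit with a tiny recursive evaluator: each node evaluates its children before applying its own operator, which is a post-order traversal. This Python sketch assumes the plan is $\Pi_B(R(A,B) \bowtie S(B,C))$; the node classes `Scan`, `Join`, and `Project`, the `evaluate` function, and the rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    """Leaf: a base relation, given as a list of dict rows."""
    rows: list


@dataclass
class Join:
    left: object
    right: object


@dataclass
class Project:
    attrs: tuple
    child: object


def evaluate(node):
    """Post-order (bottom-up) traversal: children are evaluated before their parent."""
    if isinstance(node, Scan):
        return {frozenset(r.items()) for r in node.rows}
    if isinstance(node, Join):
        left, right = evaluate(node.left), evaluate(node.right)
        return {lt | rt for lt in left for rt in right
                if all(dict(lt)[a] == dict(rt)[a] for a in dict(lt).keys() & dict(rt).keys())}
    if isinstance(node, Project):
        return {frozenset((a, v) for a, v in t if a in node.attrs)
                for t in evaluate(node.child)}
    raise TypeError(f"unknown plan node: {node!r}")


R = Scan([{"A": 1, "B": 10}, {"A": 2, "B": 20}])
S = Scan([{"B": 10, "C": "x"}, {"B": 30, "C": "y"}])
plan = Project(("B",), Join(R, S))
print(evaluate(plan))  # {frozenset({('B', 10)})}
```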
A simple plan
[Plan tree: $\Pi_B$ at the root, over the join of R(A,B) and S(B,C)]
What SQL query does this correspond to?
Are there any logically equivalent RA expressions?
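“Pushing down” projection
[Plan trees: the original plan $\Pi_B(R(A,B) \bowtie S(B,C))$, and the rewritten plan $\Pi_B(\Pi_B(R(A,B)) \bowtie S(B,C))$ with a projection pushed below the join]
Why might we prefer this plan?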
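A Python experiment, assuming the plans are $\Pi_B(R \bowtie S)$ and $\Pi_B(\Pi_B(R) \bowtie S)$, shows both give the same answer while the pushed-down version feeds far fewer tuples into the join. The helpers `rel`, `project`, and `join` and the skewed instance (many R tuples sharing a B value) are hypothetical.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def project(attrs, r):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join: merge tuple pairs that agree on all shared attributes."""
    return {t1 | t2 for t1 in r1 for t2 in r2
            if all(dict(t1)[a] == dict(t2)[a] for a in dict(t1).keys() & dict(t2).keys())}


R = rel(*({"A": a, "B": a % 2} for a in range(100)))  # 100 tuples, only 2 distinct B values
S = rel({"B": 0, "C": "x"}, {"B": 1, "C": "y"})

original = project({"B"}, join(R, S))                 # Pi_B(R join S)
pushed = project({"B"}, join(project({"B"}, R), S))   # Pi_B(Pi_B(R) join S)
print(original == pushed)                             # True
print(len(R), len(project({"B"}, R)))                 # 100 2: far fewer tuples enter the join
```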
Takeaways
• This process is called logical optimization
• Many equivalent plans are used to search for “good plans”
• Relational algebra is an important abstraction.
RA commutators
• The basic commutators:
  - Push projection through (1) selection, (2) join
  - Push selection through (3) selection, (4) projection, (5) join
  - Also: Joins can be re-ordered!
• Note that this is not an exhaustive set of operations
This simple set of tools allows us to greatly improve the execution time of queries by optimizing RA plans!
Optimizing the SFW RA Plan
Translating to RA
R(A,B) S(B,C) T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;
$\Pi_{A,D}(\sigma_{A<10}(T \bowtie (R \bowtie S)))$
[Plan tree: $\Pi_{A,D}$ at the root, over $\sigma_{A<10}$, over T(C,D) ⋈ (R(A,B) ⋈ S(B,C))]
Logical Optimization
• Heuristically, we want selections and projections to occur as early as possible in the plan
  - Terminology: “pushing down selections” and “pushing down projections.”
• Intuition: We will have fewer tuples in a plan.
  - Could fail if the selection condition is very expensive (say, it runs some image processing algorithm).
  - Projection could be a waste of effort, but more rarely.
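Optimizing RA Plan
R(A,B) S(B,C) T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;
$\Pi_{A,D}(\sigma_{A<10}(T \bowtie (R \bowtie S)))$
[Plan tree: $\Pi_{A,D}$ at the root, over $\sigma_{A<10}$, over T(C,D) ⋈ (R(A,B) ⋈ S(B,C))]
Push down selection on A so it occurs earlier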
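Optimizing RA Plan
R(A,B) S(B,C) T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;
$\Pi_{A,D}(T \bowtie (\sigma_{A<10}(R) \bowtie S))$
[Plan tree: $\Pi_{A,D}$ at the root, over T(C,D) ⋈ ($\sigma_{A<10}$(R(A,B)) ⋈ S(B,C))]
Push down selection on A so it occurs earlier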
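A quick Python check, on a made-up instance, that pushing $\sigma_{A<10}$ below both joins leaves the answer unchanged while shrinking the intermediate R ⋈ S result. The helpers `rel`, `select`, `project`, and `join` are hypothetical; the selection can move because it reads only A, an attribute of R.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(cond, r):
    return {t for t in r if cond(dict(t))}


def project(attrs, r):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join: merge tuple pairs that agree on all shared attributes."""
    return {t1 | t2 for t1 in r1 for t2 in r2
            if all(dict(t1)[a] == dict(t2)[a] for a in dict(t1).keys() & dict(t2).keys())}


R = rel(*({"A": a, "B": a % 5} for a in range(50)))
S = rel(*({"B": b, "C": b % 3} for b in range(5)))
T = rel(*({"C": c, "D": f"d{c}"} for c in range(3)))


def a_lt_10(t):
    return t["A"] < 10


before = project({"A", "D"}, select(a_lt_10, join(T, join(R, S))))  # selection at the top
after = project({"A", "D"}, join(T, join(select(a_lt_10, R), S)))   # selection pushed down
print(before == after)                                          # True
print(len(join(R, S)), len(join(select(a_lt_10, R), S)))        # 50 10
```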
Optimizing RA Plan
R(A,B) S(B,C) T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;
$\Pi_{A,D}(T \bowtie (\sigma_{A<10}(R) \bowtie S))$
[Plan tree: $\Pi_{A,D}$ at the root, over T(C,D) ⋈ ($\sigma_{A<10}$(R(A,B)) ⋈ S(B,C))]
Push down projection so it occurs earlier
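Optimizing RA Plan
R(A,B) S(B,C) T(C,D)
SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;
$\Pi_{A,D}(T \bowtie \Pi_{A,C}(\sigma_{A<10}(R) \bowtie S))$
[Plan tree: $\Pi_{A,D}$ at the root, over T(C,D) ⋈ $\Pi_{A,C}$($\sigma_{A<10}$(R(A,B)) ⋈ S(B,C))]
We eliminate B earlier!
In general, when is an attribute not needed…?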
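In this plan B is read only by the R ⋈ S join, so it can be projected away right after that join. A Python check on a made-up instance, with hypothetical helpers `rel`, `select`, `project`, and `join`, confirms that inserting $\Pi_{A,C}$ leaves the answer unchanged and that the intermediate tuples no longer carry B.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(r.items()) for r in rows}


def select(cond, r):
    return {t for t in r if cond(dict(t))}


def project(attrs, r):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def join(r1, r2):
    """Natural join: merge tuple pairs that agree on all shared attributes."""
    return {t1 | t2 for t1 in r1 for t2 in r2
            if all(dict(t1)[a] == dict(t2)[a] for a in dict(t1).keys() & dict(t2).keys())}


R = rel(*({"A": a, "B": a % 5} for a in range(50)))
S = rel(*({"B": b, "C": b % 3} for b in range(5)))
T = rel(*({"C": c, "D": f"d{c}"} for c in range(3)))

inner = join(select(lambda t: t["A"] < 10, R), S)      # sigma_{A<10}(R) join S: attributes A, B, C
plan_sel = project({"A", "D"}, join(T, inner))
plan_proj = project({"A", "D"}, join(T, project({"A", "C"}, inner)))  # B eliminated early
print(plan_sel == plan_proj)                                          # True
print(sorted({a for t in project({"A", "C"}, inner) for a, _ in t}))  # ['A', 'C']
```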
Activity-47.ipynb
as part of HW6