CompSci 516DataIntensiveComputingSystems
Lecture4RelationalAlgebra
andRelationalCalculus
Instructor:Sudeepa Roy
1DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems
Announcement
• Deadlinesextendedbyaweekforprojectproposalandmidtermreport
• Proposal(writeup):09/28(Wed)– Butsendmeanemail(s)mentioningyourgroupmembersanddescribingyourprojectby 09/21
– Projectideastobepostedbythisweekend• Midtermreport:10/28(Fri)• Finalreport:11/28(Mon)
DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 2
Today’stopics
• FinishSQL(fromLecture2)• RelationalAlgebra(RA)andRelationalCalculus(RC)
• Readingmaterial– [RG]Chapter4(RA,RC)– [GUW]Chapters2.4,5.1,5.2
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 3
Acknowledgement:Thefollowingslideshavebeencreatedadaptingtheinstructormaterialofthe[RG]bookprovidedbytheauthorsDr.Ramakrishnan andDr.Gehrke.
RelationalQueryLanguages
• Querylanguages:Allowmanipulationandretrievalofdatafromadatabase
• Relationalmodelsupportssimple,powerfulQLs:– Strongformalfoundationbasedonlogic– Allowsformuchoptimization
• QueryLanguages!= programminglanguages– QLsnotintendedtobeusedforcomplexcalculations– QLssupporteasy,efficientaccesstolargedatasets
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 4
FormalRelationalQueryLanguages
• Two“mathematical”QueryLanguagesformthebasisfor“real”languages(e.g.SQL),andforimplementation:– RelationalAlgebra:Moreoperational,veryusefulforrepresentingexecutionplans
– RelationalCalculus:Letsusersdescribewhattheywant,ratherthanhowtocomputeit(Non-operational,declarative)
• Note:Declarative(RC,SQL)vs.Operational(RA)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 5
Preliminaries• Aqueryisappliedtorelationinstances,andtheresultofaqueryisalsoarelationinstance.– Schemasofinputrelationsforaqueryarefixed
• querywillrunregardlessofinstance
– Theschemafortheresultofagivenqueryisalsofixed• Determinedbydefinitionofquerylanguageconstructs
• Positionalvs.named-fieldnotation:– Positionalnotationeasierforformaldefinitions,named-fieldnotationmorereadable
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 6
ExampleSchemaandInstances
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
sid bid day22 101 10/10/9658 103 11/12/96
R1
S1 S2
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
LogicNotations
• $ Thereexists• " Forall• ∧ LogicalAND• ∨ LogicalOR• ¬NOT
RelationalAlgebra(RA)
RelationalAlgebra
• Takesoneormorerelationsasinput,andproducesarelationasoutput– operator– operand– semantic– soanalgebra!
• Sinceeachoperationreturnsarelation,operationscanbecomposed– Algebrais“closed”
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 10
RelationalAlgebra• Basicoperations:
– Selection(σ)Selectsasubsetofrowsfromrelation– Projection(π)Deletesunwantedcolumnsfromrelation.– Cross-product(x)Allowsustocombinetworelations.– Set-difference(-)Tuplesinreln.1,butnotinreln.2.– Union(∪)Tuplesinreln.1orinreln.2.
• Additionaloperations:– Intersection(∩)– join⨝– division(/)– renaming(ρ)– Notessential,but(very)useful.
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 11
Projection
sname ratingyuppy 9lubber 8guppy 5rusty 10p sname rating S, ( )2
age35.055.5
page S( )2
• Deletesattributesthatarenotinprojectionlist.
• Schema ofresultcontainsexactlythefieldsintheprojectionlist,withthesamenamesthattheyhadinthe(only)inputrelation.
• Projectionoperatorhastoeliminateduplicates(Why)
– Note:realsystemstypicallydon’tdoduplicateeliminationunlesstheuserexplicitlyasksforit(performance)
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S2
10DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
Selection
s rating S>8 2( )
sid sname rating age28 yuppy 9 35.058 rusty 10 35.0
sname ratingyuppy 9rusty 10
p ssname rating rating S, ( ( ))>8 2
• Selectsrowsthatsatisfyselectioncondition
• Noduplicatesinresult.Why?
• Schemaofresultidenticaltoschemaof(only)inputrelation
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S2
11DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems
CompositionofOperators
• Resultrelationcanbetheinputforanotherrelationalalgebraoperation– Operatorcomposition s rating S>8 2( )
sid sname rating age28 yuppy 9 35.058 rusty 10 35.0
sname ratingyuppy 9rusty 10
p ssname rating rating S, ( ( ))>8 2
Union,Intersection,Set-Difference
• Alloftheseoperationstaketwoinputrelations,whichmustbeunion-compatible:– Samenumberoffields.– `Corresponding’fieldshavethesametype
– sameschemaastheinputs
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0
S S1 2È
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S1 S2
Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 12
Union,Intersection,Set-Difference
• Note:noduplicate– “Setsemantic”– SQL:UNION– SQLallows“bagsemantic”aswell:UNIONALL
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.044 guppy 5 35.028 yuppy 9 35.0
S S1 2È
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S1 S2
Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 12
Union,Intersection,Set-Difference
sid sname rating age31 lubber 8 55.558 rusty 10 35.0
S S1 2Ç
sid sname rating age22 dustin 7 45.0
S S1 2-
sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0
S1 S2
Duke%CS,%Spring%2016 CompSci 516:%Data%Intensive%Computing%Systems 13
Cross-Product• EachrowofS1ispairedwitheachrowofR1.• ResultschemahasonefieldperfieldofS1andR1,withfield
names`inherited’ifpossible.– Conflict:BothS1andR1haveafieldcalledsid.
(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96
DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 18
RenamingOperator⍴
(sid) sname rating age (sid) bid day22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 11/12/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 11/12/96
§Ingeneral,canuse⍴(<Temp>,<RA-expression>)
DukeCS,Fall2016 CompSci 516:DataIntensiveComputingSystems 19
(⍴sid →sid1S1)⨉ (⍴sid →sid1R1)or
⍴(C(1→sid1,5→sid2),S1⨉ R1)Cisthenewrelationname
Joins
• Resultschemasameasthatofcross-product.• Fewertuplesthancross-product,mightbeableto
computemoreefficiently
R c S c R S!" = ´s ( )
(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96
S RS sid R sid1 11 1!" . .<
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 20
Findnamesofsailorswho’vereservedboat#103
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 21
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
Findnamesofsailorswho’vereservedboat#103
• Solution1:
• Solution2:
p ssname bid serves Sailors(( Re ) )=103 !"
p ssname bid serves Sailors( (Re ))=103 !"
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 22
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
ExpressinganRAexpressionasaTree
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 23
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
p ssname bid serves Sailors(( Re ) )=103 !"Sailors Reserves
σbid=103
⨝sid =sid
πsname
Alsocalledalogicalqueryplan
Findsailorswho’vereservedaredoragreenboat
• Canidentifyallredorgreenboats,thenfindsailorswho’vereservedoneoftheseboats:
r s( , ( ' ' ' ' ))Tempboats color red color green Boats= Ú =
p sname Tempboats serves Sailors( Re )!" !"
Can also define Tempboats using unionTry the “AND” version yourself
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 24
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
Useofrenameoperation
Whataboutaggregates?
• Extendedrelationalalgebra• 𝝲age,avg(rating)→ avgr Sailors• Alsoextendedto“bagsemantic”:allowduplicates
– Takeintoaccountcardinality– RandShavetupletresp.mandntimes– R∪ Shastm+n times– R∩Shastmin(m,n)times– R– Shastmax(0,m-n)times– sorting(τ),duplicateremoval(ẟ)operators
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 25
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
RelationalCalculus(RC)
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 26
RelationalCalculus• RAisprocedural
– πA(σA=a R)andσA=a (πAR)areequivalentbutdifferentexpressions
• RC– non-proceduralanddeclarative– describesasetofanswerswithoutbeingexplicitabouthowthey
shouldbecomputed
• TRC(tuplerelationalcalculus)– variablestaketuplesasvalues– wewillprimarilydoTRC
• DRC(domainrelationalcalculus)– variablesrangeoverfieldvalues
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 27
TRC:example
• Findthenameandageofallsailorswitharatingabove7
{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}
• Pisatuplevariable– withexactlytwofieldsnameandage(schemaoftheoutputrelation)– P.name =S.name ⋀ P.age =S.age givesvaluestothefieldsofananswer
tuple
• Useparentheses,∀ ∃ ⋁ ⋀ ><= ≠ ¬etc asnecessary• A⇒ Bisveryusefultoo
– nextslideDukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 28
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
$ Thereexists
A⇒ B
• A“implies”B• Equivalently,ifAistrue,Bmustbetrue• Equivalently,¬A⋁ B,i.e.
– eitherAisfalse(thenBcanbeanything)– otherwise(i.e.Aistrue)Bmustbetrue
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 29
UsefulLogicalEquivalences
• "xP(x) =¬$x [¬P(x)]
• ¬(P∨Q) =¬P∧ ¬Q• ¬(P∧ Q) =¬P∨ ¬Q
– Similarly,¬(¬P∨Q)=P∧ ¬Qetc.
• AÞ B=¬A∨ B
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 30
$ Thereexists" Forall∧ LogicalAND∨ LogicalOR¬ NOT
deMorgan’slaws
TRC:example
• Findthenamesofsailorswhohavereservedatleasttwoboats
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 31
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
TRC:example
• Findthenamesofsailorswhohavereservedatleasttwoboats
{P|∃ SϵSailors(∃ R1ϵ Reserves∃ R2ϵReserves⋀ S.sid =R1.sid⋀ S.sid =R2.sid⋀ R1.bid≠R2.bid⋀ P.name =S.name)}
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 32
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
TRC:example
• Findthenamesofsailorswhohavereservedallboats• Divisionoperation
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 33
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
TRC:example
• Findthenamesofsailorswhohavereservedallboats• DivisionoperationinRA!
{P|∃ SϵSailors[∀BϵBoats(∃ Rϵ Reserves(S.sid =R.sid ⋀R.bid =B.bid))]⋀ P.name =S.name}
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 34
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
TRC:example
• Findthenamesofsailorswhohavereservedallred boats
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 35
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
HowwillyouchangethepreviousTRCexpression?
TRC:example
• Findthenamesofsailorswhohavereservedallred boats{P|∃ SϵSailors∀BϵBoats(B.color =‘red’⇒ (∃ Rϵ Reserves(S.sid =R.sid ⋀ R.bid =B.bid ⋀ P.name =S.name)))}
RecallthatA⇒ Bislogicallyequivalentto¬A⋁ Bso⇒ canbeavoided,butitiscleanerandmoreintuitive
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 36
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
DRC:example
• Findthenameandageofallsailorswitharatingabove7
TRC:{P|∃ SϵSailors(S.rating >7⋀ P.name =S.name ⋀ P.age =S.age)}
DRC:{<N,A>|∃ <I,N,T,A>ϵSailors⋀ T>7}
• Variablesarenowdomainvariables• WewilluseuseTRC
– bothareequivalent
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 37
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)
MoreExamples:RC
• Thefamous“Drinker-Beer-Bar”example!
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 38
Acknowledgement:examplesandslidesbyProfs.BalazinskaandSuciu,andthe[GUW]book
UNDERSTANDTHEDIFFERENCEINANSWERSFORALLFOURDRINKERS
DrinkerCategory1
Find drinkers that frequent some bar that serves some beer they like.
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
39CompSci516:DataIntensiveComputingSystems
DukeCS,Spring2016
DrinkerCategory1
Find drinkers that frequent some bar that serves some beer they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
a shortcut for{x | $Y ϵ Frequents Z ϵ Serves W ϵ Likes (T.drinker = x.drinker∧ T.bar = Z.bar∧ W.beer = ……}
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
40CompSci516:DataIntensiveComputingSystems
DukeCS,Spring2016
Thedifferenceisthatinthefirstone,onevariable=oneattributeinthesecondone,onevariable=onetuple(TupleRC)Bothareequivalentandfeelfreetousetheonethatisconvenienttoyou
DrinkerCategory2
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
41CompSci516:DataIntensiveComputingSystems
Q(x) = …
DukeCS,Spring2016
DrinkerCategory2
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
42CompSci516:DataIntensiveComputingSystems
DukeCS,Spring2016
DrinkerCategory3
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Find drinkers that frequent some bar that serves only beers they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
43CompSci516:DataIntensiveComputingSystems
Q(x) = …
DukeCS,Spring2016
DrinkerCategory3
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Find drinkers that frequent some bar that serves only beers they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))
Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
44CompSci516:DataIntensiveComputingSystems
DukeCS,Spring2016
DrinkerCategory4
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Find drinkers that frequent only bars that serves only beer they like.
Find drinkers that frequent some bar that serves only beers they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))
Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
45CompSci516:DataIntensiveComputingSystems
Q(x) = …
DukeCS,Spring2016
DrinkerCategory4
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Find drinkers that frequent only bars that serves only beer they like.
Find drinkers that frequent some bar that serves only beers they like.
Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
Q(x) = "y. Frequents(x, y)Þ ($z. Serves(y,z)∧Likes(x,z))
Q(x) = "y. Frequents(x, y)Þ "z.(Serves(y,z) Þ Likes(x,z))
Q(x) = $y. Frequents(x, y)∧"z.(Serves(y,z) Þ Likes(x,z))
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
46DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems
WhyshouldwecareaboutRC• RCisdeclarative,likeSQL,andunlikeRA(whichisoperational)
• Givesfoundationofdatabasequeriesinfirst-orderlogic– youcannotexpressallaggregatesinRC,e.g.cardinalityofarelationorsum(possibleinextendedRAandSQL)
– stillcanexpressconditionslike“atleasttwotuples”(oranyconstant)
• RCexpressionmaybemuchsimplerthanSQLqueries– andeasiertocheckforcorrectnessthanSQL– powertouse" and Þ– then you can systematically go to a “correct” SQL
query
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems 47
FromRCtoSQL
Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))
Query: Find drinkers that like some beer (so much) that they frequent all bars that serve it
CompSci516:DataIntensiveComputingSystems
48
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
DukeCS,Spring2016
FromRCtoSQL
Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))
Query: Find drinkers that like some beer so much that they frequent all bars that serve it
Step 1: Replace " with $ using de Morgan’s Laws
Q(x) = $y. Likes(x, y)∧ ¬$z.(Serves(z,y) ∧ ¬Frequents(x,z))
CompSci516:DataIntensiveComputingSystems
49
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
"x P(x) same as¬$x ¬P(x)
¬(¬P∨Q) same asP∧ ¬ Q
º Q(x) = $y. Likes(x, y)∧"z.(¬ Serves(z,y) ∨ Frequents(x,z))
DukeCS,Spring2016
FromRCtoSQL
SELECT DISTINCT L.drinkerFROM Likes LWHERE not exists
(SELECT S.barFROM Serves SWHERE L.beer=S.beer
AND not exists (SELECT * FROM Frequents FWHERE F.drinker=L.drinker
AND F.bar=S.bar))CompSci516:DataIntensiveComputing
Systems50
Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)
DukeCS,Spring2016
Q(x) = $y. Likes(x, y) ∧¬ $z.(Serves(z,y)∧¬Frequents(x,z))
Step 2: Translate into SQL
The“correct”intermediatesteps• WritethequeryinRC• Ifyouhaveavariableunder“negation”,alsoaddthe
“domain”,i.e.wherethevariableappearswithoutanegation– e.g.ifyouhave¬H(x,y)forasubquery,– where xandycanonlycomefromarelationR(x,y)– makeitR(x,y)∧ ¬H(x,y)
– Thisistomakethequery“safe”and“domainindependent”–wewilldiscussthiswhenwedoDatalog
– Intuitively,ifyouaretryingtofind“sailorsthatdonotsatisfysomecriteria”youhavetospecifythedomainofsailors,sayfromthesailortable,otherwiseyouarelookingataninfinitespace
CompSci516:DataIntensiveComputingSystems 51DukeCS,Spring2016
thisslidecanbeskippeduntilwedoDatalogwillrevisit”Safequeries”then
The“correct”intermediatestep
DukeCS,Spring2016 CompSci516:DataIntensiveComputingSystems
52
Make all subqueries with negation domain independenti.e. say where x is coming from
Q(x) = $y. Likes(x, y) ∧ ¬$z.(Likes(x,y)∧Serves(z,y)∧¬Frequents(x,z))
SELECT DISTINCT L.drinker FROM Likes LWHERE not exists
(SELECT * FROM Likes L2, Serves SWHERE L2.drinker=L.drinker and L2.beer=L.beer
and L2.beer=S.beerand not exists (SELECT * FROM Frequents F
WHERE F.drinker=L2.drinkerand F.bar=S.bar))
which can be simplifiedto the previous query
thisslidecanbeskippeduntilwedoDatalogwillrevisit”Safequeries”then
Summary
• YoulearntthreequerylanguagesfortheRelationalDBmodel– SQL– RA– RC
• Allhavetheirownpurposes
• Youshouldbeabletowriteaqueryinallthreelanguagesandconvertfromonetoanother– However,youhavetobecareful,notall“valid”expressionsinonemaybe
expressedinanother– {S|¬(SϵSailors)}– infinitelymanytuples– an“unsafe”query– Morewhenwedo“Datalog”,alsoseeCh.4.4in[RG]
• Nexttopic:DBMSinternals– storage/indexing,queryexecution,algorithms,optimization
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 53
AdditionalSlidesDivisioninRA
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 54
Division• Notsupportedasaprimitiveoperator,butusefulforexpressing
querieslike:
Findsailorswhohavereservedall boats.
• LetAhave2fields,xandy;Bhaveonlyfieldy:
– A/B= {<x>:∀<y>ϵB∃ <x,y>ϵA}
– i.e.,A/Bcontainsallxtuples(sailors)suchthatforevery ytuple(boat)inB,thereisanxy tupleinA.
– Equivalently,– Ifthesetofyvalues(boats)associatedwithanxvalue(sailor)inAcontains
allyvaluesinB,thexvalueisinA/B.
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 55
ExamplesofDivisionA/B
sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4
pnop2
pnop2p4
pnop1p2p4
snos1s2s3s4
snos1s4
snos1
A
B1B2
B3
A/B1 A/B2 A/B3DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 56
ExpressingA/BUsingBasicOperators
• Divisionisnotessentialop;justausefulshorthand.– Alsotrueofjoins=selectiononcrossproducts– butjoinsaresocommonthatsystemsimplementjoinsspecially
• Idea:ForA/B,computeallxvaluesthatare“notdisqualified”bysomeyvalueinB
– xvalueisdisqualifiedifbyattachingyvaluefromB,weobtainanxy tuplethatisnotinA
• ForA(x,y)andB(y):– seeexamplefirstonthenextslide
A/B: p px x A B A(( ( ) ) )´ -p x A( ) -
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 57
Example:A/Billustration• Whatis
– Firstcrossproduct:• {s1,s2,s3,s4}⨉ {p2,p4}
– ThenmissingcombinationsinA:• {<s2,p4>,<s3,p4>}
– Then“disqualified”Atuples• {s2,s3}
• Whatis
– The“qualified”Atuples• {s1,s2,s3,s4}– {s2,s3}={s1,s4}
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 58
sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4
pnop2p4
A
B2
p px x A B A(( ( ) ) )´ -
p x A( ) - p px x A B A(( ( ) ) )´ -snos1s4
A/B2
Findthenamesofsailorswho’vereservedallboats
• Usesdivision;schemasoftheinputrelationsto/mustbecarefullychosen:
r p p( , ( , Re ) / ( ))Tempsids sid bid serves bid Boats
p sname Tempsids Sailors( )!"
• To find sailors who’ve reserved all ‘Interlake’ boats:
/ ( ' ' )p sbid bname Interlake Boats=.....
DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 59
Sailors(sid,sname,rating,age)Boats(bid,bname,color)Reserves(sid,bid,day)