8/6/2019 Ch12 Overview Query Evaluation
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Overview of Query Evaluation
Chapter 12
Overview of Query Evaluation
Plan: Tree of R.A. ops, with choice of algorithm for each op.
  Each operator is typically implemented using a 'pull' interface: when an operator is 'pulled' for the next output tuples, it 'pulls' on its inputs and computes them.
Two main issues in query optimization:
  For a given query, what plans are considered?
    Algorithm to search plan space for cheapest (estimated) plan.
  How is the cost of a plan estimated?
Ideally: Want to find best plan. Practically: Avoid worst plans!
We will study the System R approach.
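The 'pull' interface can be sketched with Python generators. This is only an illustration, not how any particular DBMS implements it: the operator names scan, select and project and the in-memory list standing in for a stored relation are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: yields stored tuples one at a time when pulled."""
    for row in table:
        yield row

def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pulls on its input and passes on only the qualifying tuples."""
    for row in child:
        if pred(row):
            yield row

def project(child: Iterator[Row], cols: List[str]) -> Iterator[Row]:
    """Pulls on its input and keeps only the requested columns."""
    for row in child:
        yield {c: row[c] for c in cols}

# A small in-memory stand-in for a Sailors relation (illustrative rows).
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 44, "sname": "guppy", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]

# Plan tree: project(sname) over select(rating > 5) over scan(Sailors).
plan = project(select(scan(sailors), lambda r: r["rating"] > 5), ["sname"])

first = next(plan)   # one pull on the root; it pulls on its inputs in turn
rest = list(plan)    # keep pulling until the inputs are exhausted
```

Nothing is computed until the root is pulled, so no intermediate result is ever materialized in this sketch.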
Some Common Techniques
Algorithms for evaluating relational operators use some simple ideas extensively:
  Indexing: Can use WHERE conditions to retrieve small set of tuples (selections, joins).
  Iteration: Sometimes, faster to scan all tuples even if there is an index. (And sometimes, we can scan the data entries in an index instead of the table itself.)
  Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.
* Watch for these techniques as we discuss query evaluation!
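As one possible illustration of partitioning, the sketch below hashes both inputs of an equality join into k partitions and then joins only matching partitions, so each small join replaces part of one large one. The function names, the value k=4 and the sample rows are assumptions for the example; a real system would partition pages on disk rather than Python lists.

```python
from collections import defaultdict

def partition(rows, key, k):
    """Split rows into at most k partitions by hashing the join column."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % k].append(row)
    return parts

def partitioned_join(left, right, key, k=4):
    """Equality join done partition by partition.

    Tuples with equal key values hash to the same partition number, so
    partition i of the left input only needs to be compared with
    partition i of the right input.
    """
    lparts = partition(left, key, k)
    rparts = partition(right, key, k)
    out = []
    for p, lrows in lparts.items():
        for l in lrows:
            for r in rparts.get(p, []):
                if l[key] == r[key]:
                    out.append({**l, **r})
    return out

# Illustrative rows only.
sailors = [{"sid": 28, "sname": "yuppy"}, {"sid": 31, "sname": "lubber"},
           {"sid": 44, "sname": "guppy"}]
reserves = [{"sid": 28, "bid": 103}, {"sid": 31, "bid": 101},
            {"sid": 58, "bid": 103}]

joined = partitioned_join(sailors, reserves, "sid")
```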
Statistics and Catalogs
Need information about the relations and indexes involved. Catalogs typically contain at least:
  # tuples (NTuples) and # pages (NPages) for each relation.
  # distinct key values (NKeys) and NPages for each index.
  Index height, low/high key values (Low/High) for each tree index.
Catalogs updated periodically.
  Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
More detailed information (e.g., histograms of the values in some field) is sometimes stored.
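A minimal sketch of how these statistics might be held in memory is shown below. The class names and field layout are hypothetical, and the index values in particular are made up for illustration; only the statistic names (NTuples, NPages, NKeys, Low, High, height) come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RelationStats:
    ntuples: int   # NTuples: number of tuples in the relation
    npages: int    # NPages: number of pages the relation occupies

@dataclass
class IndexStats:
    nkeys: int                    # NKeys: number of distinct key values
    npages: int                   # NPages: number of pages in the index
    height: Optional[int] = None  # tree indexes only
    low: Optional[float] = None   # Low: smallest key value (tree indexes)
    high: Optional[float] = None  # High: largest key value (tree indexes)

# Hypothetical catalog entries.
reserves = RelationStats(ntuples=100_000, npages=1000)
rating_index = IndexStats(nkeys=10, npages=50, height=2, low=1, high=10)

tuples_per_page = reserves.ntuples // reserves.npages
```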
Access Paths
An access path is a method of retrieving tuples:
  File scan, or index that matches a selection (in the query).
A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
  E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3.
A hash index matches (a conjunction of) terms that has a term attribute=value for every attribute in the search key of the index.
  E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
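The two matching rules can be written as small predicates. This is a simplified sketch: terms are modelled as (attribute, operator, value) triples, the search key is assumed to be <a, b, c> as the attributes in the examples suggest, and the function names are invented for the illustration.

```python
from typing import List, Sequence, Tuple

Term = Tuple[str, str, object]   # (attribute, operator, value)

def tree_index_matches(search_key: Sequence[str], terms: List[Term]) -> bool:
    """True if the terms involve exactly the attributes of some
    non-empty prefix of the search key."""
    attrs = {attr for attr, _, _ in terms}
    return bool(attrs) and attrs == set(search_key[:len(attrs)])

def hash_index_matches(search_key: Sequence[str], terms: List[Term]) -> bool:
    """True if every term is an equality and every attribute of the
    search key has such a term."""
    all_equalities = all(op == "=" for _, op, _ in terms)
    attrs = {attr for attr, _, _ in terms}
    return all_equalities and attrs == set(search_key)

key = ("a", "b", "c")   # assumed search key for both example indexes

tree_results = [
    tree_index_matches(key, [("a", "=", 5), ("b", "=", 3)]),
    tree_index_matches(key, [("a", "=", 5), ("b", ">", 6)]),
    tree_index_matches(key, [("b", "=", 3)]),
]
hash_results = [
    hash_index_matches(key, [("a", "=", 5), ("b", "=", 3), ("c", "=", 5)]),
    hash_index_matches(key, [("b", "=", 3)]),
    hash_index_matches(key, [("a", "=", 5), ("b", "=", 3)]),
    hash_index_matches(key, [("a", ">", 5), ("b", "=", 3), ("c", "=", 5)]),
]
```

The results agree with the examples: the tree index matches the first two selections but not b=3 alone, and the hash index matches only the selection with an equality on all of a, b and c.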
A Note on Complex Selections
Selection conditions are first converted to conjunctive normal form (CNF):
(day
One Approach to Selections
Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index:
  Most selective access path: An index or file scan that we estimate will require the fewest page I/Os.
  Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect number of tuples/pages fetched.
Consider day
Example of Sort-Merge Join
Cost: M log M + N log N + (M+N)
  The cost of scanning, M+N, could be M*N (very unlikely!)
With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; total join cost: 7500.
sid  sname   rating  age
22   dustin  7       45.0
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin
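The 7500 figure can be checked with a short calculation. The sketch below assumes a standard external merge sort (pass 0 writes sorted runs of B pages, later passes do (B-1)-way merges, and every pass reads and writes each page once) and uses the page counts from the example schema: 1000 pages for Reserves and 500 for Sailors. The function names are invented for the illustration.

```python
import math

def sort_passes(pages: int, buffers: int) -> int:
    """Number of passes of an external merge sort with `buffers` pages."""
    runs = math.ceil(pages / buffers)   # pass 0: sorted runs of B pages
    passes = 1
    while runs > 1:                     # each later pass merges B-1 runs
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes

def sort_cost(pages: int, buffers: int) -> int:
    """Each pass reads and writes every page once."""
    return 2 * pages * sort_passes(pages, buffers)

def sort_merge_join_cost(m: int, n: int, buffers: int) -> int:
    """Sort both inputs, then scan each once in the merge step."""
    return sort_cost(m, buffers) + sort_cost(n, buffers) + (m + n)

M, N = 1000, 500   # pages of Reserves and Sailors
costs = {b: sort_merge_join_cost(M, N, b) for b in (35, 100, 300)}
passes = {b: (sort_passes(M, b), sort_passes(N, b)) for b in (35, 100, 300)}
```

With 2 passes per relation the total is 2*2*1000 + 2*2*500 + 1500 = 7500 for all three buffer sizes.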
Highlights of System R Optimizer
Impact: Most widely used currently; works well for
Size Estimation and Reduction Factors
Consider a query block:
  SELECT attribute list
  FROM relation list
  WHERE term1 AND ... AND termk
Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause.
Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size. Result cardinality = Max # tuples * product of all RFs.
  Implicit assumption that terms are independent!
  Term col=value has RF 1/NKeys(I), given index I on col
  Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
  Term col>value has RF (High(I)-value)/(High(I)-Low(I))
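The three reduction-factor rules and the result-size estimate can be sketched as follows. The function names are invented for the example, and the statistics plugged in at the end (100 distinct bid values, ratings ranging from 1 to 10, 40,000 distinct sid values, relations of 100,000 and 40,000 tuples) are illustrative assumptions rather than catalog values taken from a real system.

```python
from math import prod

def rf_equals_value(nkeys: int) -> float:
    """Term col=value: RF = 1/NKeys(I)."""
    return 1.0 / nkeys

def rf_equals_column(nkeys1: int, nkeys2: int) -> float:
    """Term col1=col2: RF = 1/MAX(NKeys(I1), NKeys(I2))."""
    return 1.0 / max(nkeys1, nkeys2)

def rf_greater_than(value: float, low: float, high: float) -> float:
    """Term col>value: RF = (High(I)-value)/(High(I)-Low(I))."""
    return (high - value) / (high - low)

def result_cardinality(cardinalities, rfs) -> float:
    """Max # tuples times the product of all RFs (terms assumed independent)."""
    return prod(cardinalities) * prod(rfs)

# Illustrative query: R.sid=S.sid AND R.bid=100 AND S.rating>5
rfs = [
    rf_equals_column(40_000, 40_000),      # assumed NKeys on both sid indexes
    rf_equals_value(100),                  # assumed 100 distinct bid values
    rf_greater_than(5, low=1, high=10),    # assumed rating range 1..10
]
estimate = result_cardinality([100_000, 40_000], rfs)
```

Under these assumptions the estimate is 1000 * 5/9, roughly 556 tuples.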
Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
Similar to old schema; rname added for variations.
Reserves:
  Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
Sailors:
  Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
RA Tree: projection on sname, over selection bid=100 AND rating>5, over join sid=sid of Reserves and Sailors.
Plan: the same tree, with the join evaluated by Simple Nested Loops and the selection and projection applied on-the-fly.
Cost: 500+500*1000 I/Os
By no means the worst plan!
Misses several opportunities: selections could have been 'pushed' earlier, no use is made of any available indexes, etc.
Goal of optimization: To find more efficient plans that compute the same answer.
Alternative Plans 1 (No Indexes)
Main difference: push selects.
With 5 buffers, cost of plan:
  Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
  Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
  Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250)
  Total: 3560 page I/Os.
If we used BNL join, join cost = 10+4*250, total cost = 2770.
If we 'push' projections, T1 has only sid, T2 only sid and sname:
  T1 fits in 3 pages, cost of BNL drops to under 250 pages, total
Plan: selection on Reserves (Scan; write to temp T1), selection on Sailors (Scan; write to temp T2), then join of T1 and T2 (Sort-Merge Join).
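The totals above can be reproduced step by step. In this sketch the page counts and the numbers of sort passes (2 for T1, 3 for T2) are taken directly from the slide rather than derived, and the block nested loops cost assumes the outer T1 is read in blocks of B-2 pages, which is consistent with the factor 4 in 10+4*250. The function name is invented for the illustration.

```python
import math

def alternative_plan_costs(buffers: int = 5):
    """Return (sort-merge total, block-nested-loops total) in page I/Os."""
    reserves_pages, sailors_pages = 1000, 500
    t1_pages, t2_pages = 10, 250          # sizes of the temps, from the slide

    # Scan each relation and write the selected tuples to a temp.
    selection_cost = reserves_pages + t1_pages + sailors_pages + t2_pages

    # Sort-merge join: pass counts as stated on the slide.
    t1_passes, t2_passes = 2, 3
    sort_merge = (2 * t1_passes * t1_pages
                  + 2 * t2_passes * t2_pages
                  + (t1_pages + t2_pages))

    # Block nested loops: T1 is the outer, read in blocks of B-2 pages
    # (an assumption); T2 is scanned once per block.
    blocks = math.ceil(t1_pages / (buffers - 2))
    block_nested_loops = t1_pages + blocks * t2_pages

    return selection_cost + sort_merge, selection_cost + block_nested_loops

smj_total, bnl_total = alternative_plan_costs()
```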
Alternative Plans 2 With Indexes
With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
INL with pipelining (outer is not materialized).
Decision not to push rating>5 before the join is based on availability of sid index on Sailors.
Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
Join column sid is a key for Sailors.
  At most one matching tuple, unclustered index on sid OK.
Projecting out unnecessary fields from outer doesn't help.
Plan: selection bid=100 on Reserves (Use hash index; do not write result to temp), join sid=sid with Sailors (Index Nested Loops, with pipelining), then selection rating>5 (On-the-fly), then projection on sname (On-the-fly).
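The 1210 I/O figure follows from a short calculation, sketched below. The 1.2 I/Os per index probe is the figure used in the cost line above, the 100 boats and uniform distribution are the same assumptions as in the earlier plans, and the function and parameter names are invented for the illustration.

```python
def index_nested_loops_plan_cost(reserves_tuples: int = 100_000,
                                 reserves_pages: int = 1000,
                                 boats: int = 100,
                                 probe_cost: float = 1.2) -> float:
    """Cost in page I/Os of the pipelined index nested loops plan."""
    # Clustered index on bid: qualifying tuples sit on contiguous pages.
    matching_tuples = reserves_tuples // boats   # 1000 tuples
    matching_pages = reserves_pages // boats     # 10 pages
    # One probe of the Sailors sid index per qualifying Reserves tuple.
    return matching_pages + matching_tuples * probe_cost

total = index_nested_loops_plan_cost()
```

Because the outer is pipelined, no temp relation is written, so the selection costs only the 10 page reads.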
Summary
There are several alternative evaluation algorithms for each relational operator.
A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.
Must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).
Two parts to optimizing a query:
  Consider a set of alternative plans.
    Must prune search space; typically, left-deep plans only.
  Must estimate cost of each plan that is considered.
    Must estimate size of result and cost for each plan node.
    Key issues: Statistics, indexes, operator implementations.