Ch12 Overview Query Evaluation


  • 8/6/2019 Ch12 Overview Query Evaluation

    1/7

    Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke

    Overview of Query Evaluation

    Chapter 12

    Overview of Query Evaluation

    Plan: Tree of R.A. ops, with choice of alg for each op.

    Each operator is typically implemented using a 'pull' interface: when an operator is 'pulled' for the next output tuples, it 'pulls' on its inputs and computes them.
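A minimal sketch of the 'pull' interface, using Python iterators. The class names (`Scan`, `Select`, `Project`), the dict-based row format, and the sample rows are assumptions made for illustration; they are not taken from the slides.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict  # assumption: a tuple is modelled as a plain dict


class Scan:
    """Leaf operator: hands out the rows of an in-memory table one at a time."""

    def __init__(self, rows: Iterable[Row]):
        self.rows = rows

    def __iter__(self) -> Iterator[Row]:
        return iter(self.rows)


class Select:
    """Pulls rows from its child and passes on those that satisfy the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]):
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[Row]:
        for row in self.child:  # pulling on the input
            if self.predicate(row):
                yield row


class Project:
    """Pulls rows from its child and keeps only the requested columns."""

    def __init__(self, child, columns: List[str]):
        self.child = child
        self.columns = columns

    def __iter__(self) -> Iterator[Row]:
        for row in self.child:
            yield {c: row[c] for c in self.columns}


# Hypothetical sample rows.
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 44, "sname": "guppy", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]

# The plan is a tree of operators; iterating over the root pulls on the whole tree.
plan = Project(Select(Scan(sailors), lambda r: r["rating"] > 5), ["sname"])
result = list(plan)
```

Nothing is computed until the root is iterated, and each row flows through the whole tree before the next one is requested.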

    Two main issues in query optimization:

    For a given query, what plans are considered? Algorithm to search plan space for cheapest (estimated) plan.

    How is the cost of a plan estimated?

    Ideally: Want to find best plan. Practically: Avoid worst plans!

    We will study the System R approach.

    Some Common Techniques

    Algorithms for evaluating relational operators use some simple ideas extensively:

    Indexing: Can use WHERE conditions to retrieve small set of tuples (selections, joins).

    Iteration: Sometimes, faster to scan all tuples even if there is an index. (And sometimes, we can scan the data entries in an index instead of the table itself.)

    Partitioning: By using sorting or hashing, we can partition the input tuples and replace an expensive operation by similar operations on smaller inputs.

    * Watch for these techniques as we discuss query evaluation!
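As an illustration of partitioning, the sketch below hash-partitions both inputs of an equality join and then joins only matching partitions, so one large join becomes several smaller ones. The function names, the number of partitions, and the sample rows are assumptions made for this example.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Split rows into buckets according to a hash of the join attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def partitioned_join(left, right, key, num_partitions=4):
    """Equality join: rows with equal keys always land in the same partition,
    so each left partition only needs to be compared with one right partition."""
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)
    out = []
    for p, left_rows in left_parts.items():
        for l in left_rows:
            for r in right_parts.get(p, []):
                if l[key] == r[key]:
                    out.append((l, r))
    return out


# Hypothetical sample rows.
sailors = [{"sid": 22}, {"sid": 28}, {"sid": 31}]
reserves = [
    {"sid": 28, "bid": 103},
    {"sid": 31, "bid": 101},
    {"sid": 31, "bid": 102},
    {"sid": 58, "bid": 103},
]

pairs = sorted((l["sid"], r["bid"]) for l, r in partitioned_join(sailors, reserves, "sid"))
```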

    Statistics and Catalogs

    Need information about the relations and indexes involved. Catalogs typically contain at least:

    # tuples (NTuples) and # pages (NPages) for each relation.

    # distinct key values (NKeys) and NPages for each index.

    Index height, low/high key values (Low/High) for each tree index.

    Catalogs updated periodically. Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.

    More detailed information (e.g., histograms of the values in some field) is sometimes stored.

    Access Paths

    An access path is a method of retrieving tuples: File scan, or index that matches a selection (in the query).

    A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.

    E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3.

    A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index.

    E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
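The two matching rules can be written as small predicates. This is a sketch of the rules exactly as stated on this slide, assuming a search key of (a, b, c) for the examples; the function names and the (attribute, operator) representation of a term are choices made here for illustration.

```python
def tree_index_matches(search_key, terms):
    """True if the attributes used by the terms form a prefix of the search key.

    terms is a list of (attribute, operator) pairs joined by AND.
    """
    attrs = {attr for attr, _ in terms}
    if not attrs:
        return False
    prefix = search_key[: len(attrs)]
    return set(prefix) == attrs


def hash_index_matches(search_key, terms):
    """True if every term is an equality and every search-key attribute has one."""
    all_equalities = all(op == "=" for _, op in terms)
    eq_attrs = {attr for attr, op in terms if op == "="}
    return all_equalities and eq_attrs == set(search_key)


key = ["a", "b", "c"]  # assumed search key for the examples

tree_ok_1 = tree_index_matches(key, [("a", "="), ("b", "=")])  # a=5 AND b=3
tree_ok_2 = tree_index_matches(key, [("a", "="), ("b", ">")])  # a=5 AND b>6
tree_no = tree_index_matches(key, [("b", "=")])                # b=3

hash_ok = hash_index_matches(key, [("a", "="), ("b", "="), ("c", "=")])
hash_no_1 = hash_index_matches(key, [("b", "=")])
hash_no_2 = hash_index_matches(key, [("a", "="), ("b", "=")])
hash_no_3 = hash_index_matches(key, [("a", ">"), ("b", "="), ("c", "=")])
```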

    A Note on Complex Selections

    Selection conditions are first converted to conjunctive normal form (CNF):

    (day

    One Approach to Selections

    Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index:

    Most selective access path: An index or file scan that we estimate will require the fewest page I/Os.

    Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect number of tuples/pages fetched.

    Consider day

    Example of Sort-Merge Join

    Cost: M log M + N log N + (M+N)

    The cost of scanning, M+N, could be M*N (very unlikely!)

    With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; total join cost: 7500.

    sid  sname   rating  age
    22   dustin  7       45.0
    28   yuppy   9       35.0
    31   lubber  8       55.5
    44   guppy   5       35.0
    58   rusty   10      35.0

    sid  bid  day       rname
    28   103  12/4/96   guppy
    28   103  11/3/96   yuppy
    31   101  10/10/96  dustin
    31   102  10/12/96  lubber
    31   101  10/11/96  lubber
    58   103  11/12/96  dustin
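A minimal in-memory sketch of sort-merge join on the two sample tables above, joining on sid. It sorts both inputs and then advances two cursors, pairing up groups of rows with equal keys. The function names and the dict-based rows are assumptions of this sketch; a real system sorts pages externally rather than in memory. The small cost helper reproduces the 7500 figure under the assumption that each sort pass reads and writes every page once, with M = 1000 and N = 500 pages.

```python
def sort_merge_join(left, right, key):
    """Join two lists of dict rows on equality of `key`."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the group of right rows with this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row in the group with every right row in the group.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def sort_merge_cost(m_pages, n_pages, passes):
    """Page I/Os: each pass reads and writes every page, then one merging scan."""
    return 2 * passes * m_pages + 2 * passes * n_pages + (m_pages + n_pages)


sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 28, "sname": "yuppy", "rating": 9, "age": 35.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 44, "sname": "guppy", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
reserves = [
    {"sid": 28, "bid": 103, "day": "12/4/96", "rname": "guppy"},
    {"sid": 28, "bid": 103, "day": "11/3/96", "rname": "yuppy"},
    {"sid": 31, "bid": 101, "day": "10/10/96", "rname": "dustin"},
    {"sid": 31, "bid": 102, "day": "10/12/96", "rname": "lubber"},
    {"sid": 31, "bid": 101, "day": "10/11/96", "rname": "lubber"},
    {"sid": 58, "bid": 103, "day": "11/12/96", "rname": "dustin"},
]

joined = sort_merge_join(sailors, reserves, "sid")
total_cost = sort_merge_cost(1000, 500, passes=2)
```

Sailors 22 and 44 have no reservations, so their cursors are simply skipped during the merge.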

    Highlights of System R Optimizer

    Impact: Most widely used currently; works well for

    Size Estimation and Reduction Factors

    Consider a query block:

    SELECT attribute list
    FROM relation list
    WHERE term1 AND ... AND termk

    Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause.

    Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size. Result cardinality = Max # tuples * product of all RFs.

    Implicit assumption that terms are independent!

    Term col=value has RF 1/NKeys(I), given index I on col

    Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))

    Term col>value has RF (High(I)-value)/(High(I)-Low(I))
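The three reduction-factor formulas can be tried out with a short sketch. The `IndexStats` class and the function names are hypothetical. The statistics used in the example are assumptions as well: 100,000 Reserves tuples and 40,000 Sailors tuples (from the tuples-per-page and page counts in the example schema), 100 distinct bid values, ratings ranging from 1 to 10, and 40,000 distinct sid values in both sid indexes.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class IndexStats:
    """Catalog statistics for one index (hypothetical container)."""
    nkeys: int
    low: float
    high: float


def rf_equals_value(index):
    """Term col = value."""
    return 1 / index.nkeys


def rf_equals_column(i1, i2):
    """Term col1 = col2."""
    return 1 / max(i1.nkeys, i2.nkeys)


def rf_greater_than(index, value):
    """Term col > value."""
    return (index.high - value) / (index.high - index.low)


def estimate_cardinality(relation_sizes, reduction_factors):
    """Max # tuples (product of cardinalities) times the product of all RFs."""
    return prod(relation_sizes) * prod(reduction_factors)


# Assumed statistics, see the paragraph above.
reserves_sid = IndexStats(nkeys=40_000, low=1, high=40_000)
sailors_sid = IndexStats(nkeys=40_000, low=1, high=40_000)
reserves_bid = IndexStats(nkeys=100, low=1, high=100)
sailors_rating = IndexStats(nkeys=10, low=1, high=10)

# WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
rfs = [
    rf_equals_column(reserves_sid, sailors_sid),
    rf_equals_value(reserves_bid),
    rf_greater_than(sailors_rating, 5),
]
estimate = estimate_cardinality([100_000, 40_000], rfs)
```

Under these assumptions the estimate comes to roughly 556 result tuples. Because the factors are simply multiplied, the estimate is only as good as the independence assumption.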

    Schema for Examples

    Sailors (sid: integer, sname: string, rating: integer, age: real)
    Reserves (sid: integer, bid: integer, day: dates, rname: string)

    Similar to old schema; rname added for variations.

    Reserves: Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.

    Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

    Motivating Example

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

    Cost: 500+500*1000 I/Os

    By no means the worst plan! Misses several opportunities: selections could have been 'pushed' earlier, no use is made of any available indexes, etc.

    Goal of optimization: To find more efficient plans that compute the same answer.

    RA Tree: join Reserves and Sailors on sid=sid; then select bid=100 AND rating>5; then project sname.

    Plan: the same tree, with the join evaluated by Simple Nested Loops and the selection and projection applied on-the-fly.

    Alternative Plans 1 (No Indexes)

    Main difference: push selects.

    With 5 buffers, cost of plan:

    Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).

    Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).

    Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).

    Total: 3560 page I/Os.

    If we used BNL join, join cost = 10+4*250, total cost = 2770.

    If we 'push' projections, T1 has only sid, T2 only sid and sname: T1 fits in 3 pages, cost of BNL drops to under 250 pages, total

    Plan annotations: (Scan; write to temp T1), (Scan; write to temp T2), (Sort-Merge Join)
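The totals of 3560 and 2770 page I/Os can be re-derived step by step. The sketch below only restates the arithmetic from this slide; the helper names are hypothetical, the pass counts (2 for T1, 3 for T2) are taken from the slide's 2*2*10 and 2*3*250 terms, and the block size of buffers minus 2 for block nested loops is an assumption that agrees with the slide's 10+4*250.

```python
from math import ceil

RESERVES_PAGES = 1000
SAILORS_PAGES = 500
T1_PAGES = 10    # Reserves tuples with bid=100 (100 boats, uniform distribution)
T2_PAGES = 250   # Sailors tuples with rating>5 (10 ratings)
BUFFERS = 5


def materialize_cost(input_pages, temp_pages):
    """Scan the input once and write the selected tuples to a temp relation."""
    return input_pages + temp_pages


def external_sort_cost(pages, passes):
    """Each pass reads and writes every page."""
    return 2 * passes * pages


def block_nested_loops_cost(outer_pages, inner_pages, buffers):
    """Read the outer once; scan the inner once per block of (buffers - 2) outer pages."""
    block_size = buffers - 2
    return outer_pages + ceil(outer_pages / block_size) * inner_pages


setup = materialize_cost(RESERVES_PAGES, T1_PAGES) + materialize_cost(SAILORS_PAGES, T2_PAGES)

sort_merge_total = (
    setup
    + external_sort_cost(T1_PAGES, passes=2)
    + external_sort_cost(T2_PAGES, passes=3)
    + (T1_PAGES + T2_PAGES)  # merging scan
)

bnl_total = setup + block_nested_loops_cost(T1_PAGES, T2_PAGES, BUFFERS)
```

Both plans pay the same 1760 I/Os to build T1 and T2; the difference comes entirely from the join step.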

    Alternative Plans 2 With Indexes

    With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.

    INL with pipelining (outer is not materialized).

    Decision not to push rating>5 before the join is based on availability of sid index on Sailors.

    Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.

    Join column sid is a key for Sailors. At most one matching tuple, unclustered index on sid OK.

    Projecting out unnecessary fields from outer doesn't help.

    Plan: select bid=100 on Reserves (Use hash index; do not write result to temp); join with Sailors on sid=sid (Index Nested Loops, with pipelining); select rating>5 (On-the-fly); project sname (On-the-fly).
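A small sketch of this plan shape using Python generators, so that the outer is never materialized. Plain dicts stand in for the index on bid and the index on sid, which is an assumption of the sketch, as are the function names. The rows are the sample Sailors and Reserves tuples from the sort-merge join example; since none of those reservations has bid=100, the example selects bid=103 instead.

```python
from collections import defaultdict


def select_by_bid(reserves_by_bid, bid):
    """Index lookup on bid; rows stream out without being written to a temp."""
    yield from reserves_by_bid.get(bid, [])


def index_nested_loops(outer, sailors_by_sid):
    """For each outer row, probe the sid index; sid is a key, so at most one match."""
    for r in outer:
        s = sailors_by_sid.get(r["sid"])
        if s is not None:
            yield {**r, **s}


def select_rating(rows, min_rating):
    """Applied on-the-fly to the join output."""
    return (row for row in rows if row["rating"] > min_rating)


def project_sname(rows):
    """Applied on-the-fly to the selection output."""
    return (row["sname"] for row in rows)


sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 28, "sname": "yuppy", "rating": 9, "age": 35.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 44, "sname": "guppy", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
reserves = [
    {"sid": 28, "bid": 103, "day": "12/4/96", "rname": "guppy"},
    {"sid": 28, "bid": 103, "day": "11/3/96", "rname": "yuppy"},
    {"sid": 31, "bid": 101, "day": "10/10/96", "rname": "dustin"},
    {"sid": 31, "bid": 102, "day": "10/12/96", "rname": "lubber"},
    {"sid": 31, "bid": 101, "day": "10/11/96", "rname": "lubber"},
    {"sid": 58, "bid": 103, "day": "11/12/96", "rname": "dustin"},
]

# Build the two stand-in indexes.
reserves_by_bid = defaultdict(list)
for r in reserves:
    reserves_by_bid[r["bid"]].append(r)
sailors_by_sid = {s["sid"]: s for s in sailors}

pipeline = project_sname(
    select_rating(index_nested_loops(select_by_bid(reserves_by_bid, 103), sailors_by_sid), 5)
)
names = list(pipeline)
```

Each Reserves row is joined, filtered, and projected before the next one is fetched, which is what pipelining means here.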

    Summary

    There are several alternative evaluation algorithms for each relational operator.

    A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree.

    Must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a workload (set of queries).

    Two parts to optimizing a query:

    Consider a set of alternative plans. Must prune search space; typically, left-deep plans only.

    Must estimate cost of each plan that is considered. Must estimate size of result and cost for each plan node. Key issues: Statistics, indexes, operator implementations.