CS133: Databases
Fall 2019, Lec 24 – 12/05
Parallel and Distributed DBMSs
Prof. Beth Trushkowsky
Warm-up Exercise
(See exercise sheet. You can start before class.)
Performance, availability, more storage space to fit the data
Goals for Today
• Discuss distributing data and workload for increased performance in a DBMS
• Reason about DBMS concepts in a parallel and distributed setting
– How to achieve distributed ACID transactions
– Data consistency across copies
• Understand why some of the disadvantages of distributed databases have led to some relaxation of consistency
Some Parallelism Terminology
• Throughput
– Amount of work done per unit time
• Latency (response time)
– Time to complete one unit of work
• Speed-up
– More resources means less time for a given unit of work → do more units of work in the same time
• Scale-up
– If resources are increased in proportion to the increase in units of work, time per unit is constant
[Plots: ideal speed-up shows throughput (Xact/sec.) growing linearly with degree of parallelism; ideal scale-up shows response time (sec./Xact) staying flat as degree of parallelism grows.]
Resource contention impacts scale-up/speed-up
Parallel DBMS Architectures
• Multiple processors (CPUs) can do work in parallel
– How do they communicate about what work to do?
– Three main parallel DB architectures:
• Shared Memory (aka shared everything): all CPUs share RAM and the disks
• Shared Disk: each CPU has its own RAM; all CPUs share the disks
• Shared Nothing: each CPU has its own RAM and its own disk(s)
Data Partitioning
Partitioning (aka sharding, aka horizontally fragmenting) a table:
• Range (e.g., A...E | F...J | K...N | O...S | T...Z): good for equijoins, range queries, group-by
• Hash: good for equijoins
• Round Robin: good to spread load
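The three partitioning schemes can be sketched as routing functions that map a tuple's key to a partition number. This is an illustrative sketch, not from the lecture: the partition count, range bounds, and keying on the first letter of a string attribute are all assumptions.

```python
NUM_PARTS = 5
RANGE_BOUNDS = ["E", "J", "N", "S"]  # A...E | F...J | K...N | O...S | T...Z

def range_partition(key: str) -> int:
    """Place the key in the first range whose upper bound covers it."""
    for i, bound in enumerate(RANGE_BOUNDS):
        if key[0].upper() <= bound:
            return i
    return NUM_PARTS - 1  # last range (T...Z)

def hash_partition(key) -> int:
    """Hash the key into one of NUM_PARTS buckets."""
    return hash(key) % NUM_PARTS

_rr_counter = 0
def round_robin_partition(_key) -> int:
    """Ignore the key entirely; cycle through partitions to spread load."""
    global _rr_counter
    part = _rr_counter % NUM_PARTS
    _rr_counter += 1
    return part
```

Note how the scheme determines what queries it helps: only range partitioning keeps sort order (range queries), both range and hash send equal keys to the same partition (equijoins), and round robin guarantees balance but supports neither.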
Different Types of DBMS Parallelism
• Inter-query parallelism: different queries run on different nodes
• Intra-operator “partitioned” parallelism
– Multiple machines working together to execute an operator (e.g., scan, sort, join)
– Machines work on disjoint partitions of the data
• Parallelizing a relational operator: merge and split
– Merge streams of output to serve as input to an operator
– Split output of an operator to be processed in parallel
Different Types of DBMS Parallelism
• Inter-operator “pipelined” parallelism
– Each relational operator may run concurrently on a different machine
– Output of the first operator is consumed on-the-fly as input to the second operator
– Could be limited by blocking operators such as sort or aggregation
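Python generators give a compact single-machine analogy for pipelining and for why blocking operators limit it (a sketch; the operator names and the tiny input table are made up for illustration):

```python
def scan(table):
    """Streaming source: yields rows one at a time."""
    for row in table:
        yield row

def select_positive(rows):
    """Pipelined operator: emits each qualifying row as soon as it arrives."""
    for row in rows:
        if row > 0:
            yield row

def sort_op(rows):
    """Blocking operator: must drain its entire input before emitting anything."""
    return iter(sorted(rows))

# The scan and the selection overlap in time; the sort cannot start
# producing output until the whole pipeline upstream has finished.
result = list(sort_op(select_positive(scan([3, -1, 2]))))
```

The same structure holds in a distributed plan, except that each operator runs on its own machine and the `yield` boundary is a network stream.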
Exercise 2
• (a) inter-query parallelism
• (b) intra-operator “partitioned” parallelism
Distributed DBMS (Shared Nothing)
• Data is stored at several sites (geo-distributed), each managed by a DBMS that can run independently
• Distributed Data Independence: users should not have to know where data is located
– Note: the catalog needs to keep track of where data is
• Distributed Transaction Atomicity: users should be able to write Xacts accessing multiple sites just like local Xacts
Extends the Physical and Logical Data Independence principles
Distributed Query Processing: Joins
Setup: Sailors (500 pages) at LONDON, Reserves (1000 pages) at PARIS; JOIN on sid
• Approach 1 -- Fetch as Needed: Page NLJ, Sailors as outer (query submitted at London):
– D is the cost to read/write a page; S is the cost to ship a page
– Cost: 500D + 500*1000(D+S) = 500,500D + 500,000S
– If the query is not submitted at London, must add the cost of shipping the result to the query site
• Approach 2 -- Ship to One Site: ship Reserves to London
– Cost: 1000(D+S) + 500D + 500*1000D = 501,500D + 1000S
– Could also consider other single-site join methods
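The two cost formulas above can be checked mechanically by tracking the coefficients of D (page read/write) and S (page ship) separately; this is just the slide's arithmetic, written out:

```python
SAILORS_PAGES = 500    # at London
RESERVES_PAGES = 1000  # at Paris

def fetch_as_needed_cost():
    """Page NLJ, Sailors as outer: read each Sailors page once, and for
    each Sailors page read and ship every Reserves page from Paris."""
    d = SAILORS_PAGES + SAILORS_PAGES * RESERVES_PAGES  # coefficient of D
    s = SAILORS_PAGES * RESERVES_PAGES                  # coefficient of S
    return d, s

def ship_to_one_site_cost():
    """Ship all of Reserves to London (read + ship each page once),
    then run the page NLJ entirely locally."""
    d = RESERVES_PAGES + SAILORS_PAGES + SAILORS_PAGES * RESERVES_PAGES
    s = RESERVES_PAGES
    return d, s
```

The comparison makes the trade-off visible: Approach 2 pays slightly more in local I/O (501,500D vs. 500,500D) but 500x less in shipping (1000S vs. 500,000S), so it wins whenever shipping a page is not dramatically cheaper than reading one.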
Semijoin
• Why ship all of Reserves to London?
– Some of those tuples may not even end up being joined, so wasted communication cost
– Idea: only ship the Reserves tuples that will match Sailors tuples
• Steps (Sailors, 500 pages, at LONDON; Reserves, 1000 pages, at PARIS):
(1) Ship the projection π_sid(Sailors) from London to Paris
(2) Compute the reduction Reserves ⋉ Sailors at Paris and ship it back
(3) Finish the join at London
• Bottom line: trade off the cost of computing and shipping the projection for the cost of shipping the full Reserves relation
• Note: especially useful if there is a selection on Sailors, and then join selectivity is also high
An aside: Bloom Filter
• A Bloom filter is a bit vector used to quickly determine whether an element belongs to a set
• Hash elements to one of k buckets
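A minimal sketch of the idea, using a single hash function over k buckets as described above (production Bloom filters typically use several independent hash functions to cut the false-positive rate; the bucket count K is an assumed constant):

```python
K = 64  # number of bits/buckets (illustrative size)

def make_filter(elements):
    """Build a k-bit vector; set bit h(e) for every inserted element e."""
    bits = [0] * K
    for e in elements:
        bits[hash(e) % K] = 1
    return bits

def might_contain(bits, e):
    """False means the element is definitely absent; True means it is
    possibly present (false positives occur when elements share a bucket)."""
    return bits[hash(e) % K] == 1
```

The key asymmetry, used by the Bloom join on the next slide: a 0 bit is a guaranteed miss, while a 1 bit only says "maybe".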
Refinement: Bloom Join
• Idea: rather than shipping the join column, ship a more compact data structure that captures (almost) the same info
– A Bloom filter bit vector
– A bit vector is cheaper to ship, and almost as effective (false positives possible)
• Hash Sailors.sid values into the range [0, k-1]
– If a tuple hashes to slot i, set bit i to 1
• Hash Reserves tuples into the same range [0, k-1]
– Discard tuples that hash to a slot whose bit is 0 in the Sailors bit vector
• Steps (Sailors at LONDON, Reserves at PARIS):
(1) Ship the Sailors.sid bit vector to Paris
(2) Compute the Reserves reduction at Paris and ship it to London
(3) Finish the join at London
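Steps (1) and (2) can be sketched as follows; the tuple representation (dicts with a `sid` field) and the bit-vector size are illustrative assumptions:

```python
K = 64  # bit-vector size; what actually gets shipped to Paris

def sid_bitvector(sailors_sids):
    """Step (1): build the Sailors.sid bit vector at London."""
    bits = [0] * K
    for sid in sailors_sids:
        bits[hash(sid) % K] = 1
    return bits

def reduce_reserves(reserves, bits):
    """Step (2), at Paris: keep only Reserves tuples whose sid hashes to a
    set bit. Survivors (possibly including false positives) are shipped
    back to London, where the join is finished."""
    return [t for t in reserves if bits[hash(t["sid"]) % K] == 1]
```

Compared with the semijoin, only K bits cross the network in step (1) instead of the whole sid projection; the price is that a few non-matching Reserves tuples may survive the reduction and be filtered out at London.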
Query Optimization
• New considerations for the cost-based approach
– Communication costs
– New distributed join methods
• Also optimizing for query response time?
– Might want to consider a larger space of plans
• Example: left-deep plan ((R ⋈a=b S) ⋈b=c T) ⋈c=d U may cost less, vs. bushy plan (R ⋈a=b S) ⋈b=c (T ⋈c=d U), which may allow parallelism that yields overall lower response time
Exercise 3
• Concerns: correctness, deadlock, performance
• Some options:
– All lock requests go through a central location
– Lock requests distributed; each request is sent to the site that holds the desired data
Distributed Transactions
• With data at distributed sites, a transaction may operate on data at multiple sites
– The xact is broken down into sub-xacts that execute at each site
• Example: read and update inventory at four sites (A, B, C, D) with horizontal partitions of the data
– T: R(A), R(B), R(C), R(D), W(A), W(B), W(C), W(D), commit
– Where should locks be managed?
• How do we guarantee atomicity??
– Need a commit protocol (a type of consensus protocol)
– A log is maintained at each site, as in a centralized DBMS, and commit protocol actions are additionally logged
Two-Phase Commit (2PC)
• The site at which the xact originates is the coordinator; the other sites at which it executes are subordinates
• Suppose the xact wants to commit (normal execution):
(1) Coordinator sends PREPARE to each subordinate
(2a) Each subordinate logs no or ready; (2b) replies YES or NO
(3a) All YES? Coordinator logs commit; else logs abort; (3b) sends COMMIT or ABORT
(4a) Each subordinate logs abort or commit; (4b) sends ACK
(5) Coordinator logs end
2PC: Steps
• When a Xact wants to commit:
① Coordinator sends a prepare msg to each subordinate.
② Subordinate force-writes a no or ready log record and then sends a no or yes msg to the coordinator.
③ If the coordinator gets unanimous yes votes, it force-writes a commit log record and sends a commit msg to all subs. Else, it force-writes an abort log rec and sends an abort msg.
④ Subordinates force-write an abort/commit log rec based on the msg they get, then send an ack msg to the coordinator.
⑤ Coordinator writes an end log rec after getting all acks.
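The coordinator side of these steps can be sketched as below. This is a simplification under stated assumptions: `subordinates` are hypothetical stub objects whose `prepare()`/`decide()` calls stand in for network messages, and `force_write` stands in for a forced write to stable storage.

```python
def force_write(log, record):
    # A real DBMS would flush this record to stable storage before proceeding.
    log.append(record)

def coordinate(subordinates, log):
    # Step 1: send prepare to every subordinate; collect their votes (step 2).
    votes = [sub.prepare() for sub in subordinates]
    # Step 3: unanimous yes -> commit, otherwise abort;
    # force-write the decision BEFORE telling anyone.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    force_write(log, decision)
    # Steps 3/4: send the decision and wait for every ack.
    acks = [sub.decide(decision) for sub in subordinates]
    assert all(a == "ack" for a in acks)
    # Step 5: all acks received -> write the end record.
    log.append("end")
    return decision
```

The ordering is the point: the decision is on stable storage before any COMMIT/ABORT message leaves, so a crashed-and-recovered coordinator can re-announce the same decision rather than invent a new one.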
Site and Link Failures
• If the coordinator detects that a subordinate failed, e.g., after a timeout
– Before it voted “yes”? → assume it was “abort”
– After “yes”? → continue as normal
• If the coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until the coordinator recovers
– Xact T is blocked
Exercise 4: when the failed subordinate site wakes up, can it tell if a global transaction committed or aborted?
Exercise 4
• The site that wakes up should interpret its log:
– (a) T committed → it should REDO actions for T
– (b) T aborted → it should UNDO actions for T
– (c) Can’t tell... needs to consult the coordinator about what the coordinator decided
– (d) T aborted → it should write abort, and UNDO
• Since it never even wrote ready, the coordinator couldn’t possibly have decided to commit (since it would have waited for the site to say yes)
Data Replication
• Replication: keep copies of data at different sites
• Benefits
– Gives increased availability
– Faster query evaluation
• Flavors
– Synchronous (eager) vs. Asynchronous (lazy)
– They vary in how current the copies are (i.e., how consistent they are)
• Can be used in addition to data partitioning
– Full replication: a copy of all data at every site (vs. partial)
• Locking: on a primary copy, or fully distributed
Updating Distributed Data
• Synchronous (Eager) Replication: a set of copies of a modified relation must be updated before the modifying xact commits (not necessarily all copies!)
– Exclusive locks on all the copies that are modified
– Users/apps do not need to know data location(s)
• Asynchronous (Lazy) Replication: copies of a modified relation are only periodically updated
– Different copies may get out of sync in the meantime
– Users/applications must be aware of data location(s)
Synchronous Replication: Majority
• The majority technique can guarantee data consistency:
– An xact must write a majority of copies to modify an object
– Each copy has a version number for the object
– An xact must read enough copies to be sure of seeing at least one most recent copy
• Example: 6 copies of data → written: 4 nodes, read: 3 nodes
• Impact on performance of READ queries??
– Could use a Read-one-write-all (ROWA) policy instead
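The 6-copy example works because any set of 3 copies must overlap any set of 4 copies (R + W > N), so a reader always sees at least one copy with the latest version number. A sketch under simplifying assumptions (copies are list entries rather than remote sites, and "any W/R reachable copies" is modeled by fixed slices):

```python
N, W, R = 6, 4, 3  # the slide's example: write 4 of 6 copies, read 3
copies = [{"version": 0, "value": None} for _ in range(N)]

def quorum_write(value, version):
    """Install the new version on W copies (a majority of the N copies)."""
    for copy in copies[:W]:  # in practice, any W reachable copies
        copy["version"] = version
        copy["value"] = value

def quorum_read():
    """Read R copies and return the value with the highest version number.
    Since R + W > N, at least one sampled copy holds the latest write."""
    sampled = copies[-R:]  # in practice, any R reachable copies
    return max(sampled, key=lambda c: c["version"])["value"]
```

ROWA sits at the other end of the same trade-off: reads touch one copy (fast), but every write must touch all N, so a single unavailable site blocks updates.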