CS133: Databases
Fall 2019, Lec 24 – 12/05
Parallel and Distributed DBMSs
Prof. Beth Trushkowsky
Warm-up Exercise
(See exercise sheet. You can start before class.)
Performance, availability, more storage space to fit the data
Goals for Today
• Discuss distributing data and workload for increased performance in a DBMS
• Reason about DBMS concepts in a parallel and distributed setting
– How to achieve distributed ACID transactions
– Data consistency across copies
• Understand why some of the disadvantages of distributed databases have led to some relaxation of consistency
Some Parallelism Terminology
• Throughput
– Amount of work done per unit time
• Latency (response time)
– Time to complete one unit of work
• Speed-up
– More resources means less time for a given unit of work → do more units of work in the same time
• Scale-up
– If resources are increased in proportion to the increase in units of work, time per unit is constant
[Plots: ideal speed-up shows throughput (Xact/sec.) growing linearly with degree of parallelism; ideal scale-up shows response time (sec./Xact) staying flat as degree of parallelism grows.]
Resource contention impacts scale-up/speed-up
Parallel DBMS Architectures
• Multiple processors (CPUs) can do work in parallel
– How do they communicate about what work to do?
– Three main parallel DB architectures:
• Shared Memory (aka shared everything): all CPUs share RAM and the disks
• Shared Disk: each CPU has its own RAM; all CPUs share the disks
• Shared Nothing: each CPU has its own RAM and its own disk(s)
Data Partitioning
Partitioning (aka sharding, aka horizontally fragmenting) a table:
• Range (e.g., A...E | F...J | K...N | O...S | T...Z): good for equijoins, range queries, group-by
• Hash: good for equijoins
• Round Robin: good to spread load
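The three partitioning schemes can be sketched as routing functions that map a tuple's key to a partition number. This is an illustrative sketch, not from the lecture: the partition count, range bounds, and keying on the first letter of a string attribute are all assumptions.

```python
NUM_PARTS = 5
RANGE_BOUNDS = ["E", "J", "N", "S"]  # A...E | F...J | K...N | O...S | T...Z

def range_partition(key: str) -> int:
    """Place the key in the first range whose upper bound covers it."""
    for i, bound in enumerate(RANGE_BOUNDS):
        if key[0].upper() <= bound:
            return i
    return NUM_PARTS - 1  # last range (T...Z)

def hash_partition(key) -> int:
    """Hash the key into one of NUM_PARTS buckets."""
    return hash(key) % NUM_PARTS

_rr_counter = 0
def round_robin_partition(_key) -> int:
    """Ignore the key entirely; cycle through partitions to spread load."""
    global _rr_counter
    part = _rr_counter % NUM_PARTS
    _rr_counter += 1
    return part
```

Note how the scheme determines what queries it helps: only range partitioning keeps sort order (range queries), both range and hash send equal keys to the same partition (equijoins), and round robin guarantees balance but supports neither.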
Different Types of DBMS Parallelism
• Inter-query parallelism: different queries run on different nodes
• Intra-operator “partitioned” parallelism
– Multiple machines working together to execute an operator (e.g., scan, sort, join)
– Machines work on disjoint partitions of the data
• Parallelizing a relational operator: merge and split
– Merge streams of output to serve as input to an operator
– Split output of an operator to be processed in parallel
Different Types of DBMS Parallelism
• Inter-operator “pipelined” parallelism
– Each relational operator may run concurrently on a different machine
– Output of the first operator is consumed on-the-fly as input to the second operator
– Could be limited by blocking operators such as sort or aggregation
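Python generators give a compact single-machine analogy for pipelining and for why blocking operators limit it (a sketch; the operator names and the tiny input table are made up for illustration):

```python
def scan(table):
    """Streaming source: yields rows one at a time."""
    for row in table:
        yield row

def select_positive(rows):
    """Pipelined operator: emits each qualifying row as soon as it arrives."""
    for row in rows:
        if row > 0:
            yield row

def sort_op(rows):
    """Blocking operator: must drain its entire input before emitting anything."""
    return iter(sorted(rows))

# The scan and the selection overlap in time; the sort cannot start
# producing output until the whole pipeline upstream has finished.
result = list(sort_op(select_positive(scan([3, -1, 2]))))
```

The same structure holds in a distributed plan, except that each operator runs on its own machine and the `yield` boundary is a network stream.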
Exercise 2
• (a) inter-query parallelism
• (b) intra-operator “partitioned” parallelism
Distributed DBMS (Shared Nothing)
• Data is stored at several sites (geo-distributed), each managed by a DBMS that can run independently
• Distributed Data Independence: users should not have to know where data is located
– Note: the catalog needs to keep track of where data is
• Distributed Transaction Atomicity: users should be able to write Xacts accessing multiple sites just like local Xacts
Extends the Physical and Logical Data Independence principles
Distributed Query Processing: Joins
Setup: Sailors (500 pages) at LONDON, Reserves (1000 pages) at PARIS; JOIN on sid
• Approach 1 -- Fetch as Needed: Page NLJ, Sailors as outer (query submitted at London):
– D is the cost to read/write a page; S is the cost to ship a page
– Cost: 500D + 500*1000(D+S) = 500,500D + 500,000S
– If the query is not submitted at London, must add the cost of shipping the result to the query site
• Approach 2 -- Ship to One Site: ship Reserves to London
– Cost: 1000(D+S) + 500D + 500*1000D = 501,500D + 1000S
– Could also consider other single-site join methods
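The two cost formulas above can be checked mechanically by tracking the coefficients of D (page read/write) and S (page ship) separately; this is just the slide's arithmetic, written out:

```python
SAILORS_PAGES = 500    # at London
RESERVES_PAGES = 1000  # at Paris

def fetch_as_needed_cost():
    """Page NLJ, Sailors as outer: read each Sailors page once, and for
    each Sailors page read and ship every Reserves page from Paris."""
    d = SAILORS_PAGES + SAILORS_PAGES * RESERVES_PAGES  # coefficient of D
    s = SAILORS_PAGES * RESERVES_PAGES                  # coefficient of S
    return d, s

def ship_to_one_site_cost():
    """Ship all of Reserves to London (read + ship each page once),
    then run the page NLJ entirely locally."""
    d = RESERVES_PAGES + SAILORS_PAGES + SAILORS_PAGES * RESERVES_PAGES
    s = RESERVES_PAGES
    return d, s
```

The comparison makes the trade-off visible: Approach 2 pays slightly more in local I/O (501,500D vs. 500,500D) but 500x less in shipping (1000S vs. 500,000S), so it wins whenever shipping a page is not dramatically cheaper than reading one.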
Semijoin
• Why ship all of Reserves to London?
– Some of those tuples may not even end up being joined, so wasted communication cost
– Idea: only ship the Reserves tuples that will match Sailors tuples
• Steps (Sailors, 500 pages, at LONDON; Reserves, 1000 pages, at PARIS):
(1) Ship the projection π_sid(Sailors) from London to Paris
(2) Compute the reduction Reserves ⋉ Sailors at Paris and ship it back
(3) Finish the join at London
• Bottom line: trade off the cost of computing and shipping the projection for the cost of shipping the full Reserves relation
• Note: especially useful if there is a selection on Sailors, and then join selectivity is also high
An aside: Bloom Filter
• A Bloom filter is a bit vector used to quickly determine whether an element belongs to a set
• Hash elements to one of k buckets
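A minimal sketch of the idea, using a single hash function over k buckets as described above (production Bloom filters typically use several independent hash functions to cut the false-positive rate; the bucket count K is an assumed constant):

```python
K = 64  # number of bits/buckets (illustrative size)

def make_filter(elements):
    """Build a k-bit vector; set bit h(e) for every inserted element e."""
    bits = [0] * K
    for e in elements:
        bits[hash(e) % K] = 1
    return bits

def might_contain(bits, e):
    """False means the element is definitely absent; True means it is
    possibly present (false positives occur when elements share a bucket)."""
    return bits[hash(e) % K] == 1
```

The key asymmetry, used by the Bloom join on the next slide: a 0 bit is a guaranteed miss, while a 1 bit only says "maybe".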
Refinement: Bloom Join
• Idea: rather than shipping the join column, ship a more compact data structure that captures (almost) the same info
– A Bloom filter bit vector
– A bit vector is cheaper to ship, and almost as effective (false positives possible)
• Hash Sailors.sid values into the range [0, k-1]
– If a tuple hashes to slot i, set bit i to 1
• Hash Reserves tuples into the same range [0, k-1]
– Discard tuples that hash to a slot whose bit is 0 in the Sailors bit vector
• Steps (Sailors at LONDON, Reserves at PARIS):
(1) Ship the Sailors.sid bit vector to Paris
(2) Compute the Reserves reduction at Paris and ship it to London
(3) Finish the join at London
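Steps (1) and (2) can be sketched as follows; the tuple representation (dicts with a `sid` field) and the bit-vector size are illustrative assumptions:

```python
K = 64  # bit-vector size; what actually gets shipped to Paris

def sid_bitvector(sailors_sids):
    """Step (1): build the Sailors.sid bit vector at London."""
    bits = [0] * K
    for sid in sailors_sids:
        bits[hash(sid) % K] = 1
    return bits

def reduce_reserves(reserves, bits):
    """Step (2), at Paris: keep only Reserves tuples whose sid hashes to a
    set bit. Survivors (possibly including false positives) are shipped
    back to London, where the join is finished."""
    return [t for t in reserves if bits[hash(t["sid"]) % K] == 1]
```

Compared with the semijoin, only K bits cross the network in step (1) instead of the whole sid projection; the price is that a few non-matching Reserves tuples may survive the reduction and be filtered out at London.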
Query Optimization
• New considerations for the cost-based approach
– Communication costs
– New distributed join methods
• Also optimizing for query response time?
– Might want to consider a larger space of plans
• Example: left-deep plan ((R ⋈a=b S) ⋈b=c T) ⋈c=d U may cost less, vs. bushy plan (R ⋈a=b S) ⋈b=c (T ⋈c=d U), which may allow parallelism that yields overall lower response time
Exercise 3
• Concerns: correctness, deadlock, performance
• Some options:
– All lock requests go through a central location
– Lock requests distributed; each request is sent to the site that holds the desired data
Distributed Transactions
• With data at distributed sites, a transaction may operate on data at multiple sites
– The xact is broken down into sub-xacts that execute at each site
• Example: read and update inventory at four sites (A, B, C, D) with horizontal partitions of the data
– T: R(A), R(B), R(C), R(D), W(A), W(B), W(C), W(D), commit
– Where should locks be managed?
• How do we guarantee atomicity??
– Need a commit protocol (a type of consensus protocol)
– A log is maintained at each site, as in a centralized DBMS, and commit protocol actions are additionally logged
Two-Phase Commit (2PC)
• The site at which the xact originates is the coordinator; the other sites at which it executes are subordinates
• Suppose the xact wants to commit (normal execution):
(1) Coordinator sends PREPARE to each subordinate
(2a) Each subordinate logs no or ready; (2b) replies YES or NO
(3a) All YES? Coordinator logs commit; else logs abort; (3b) sends COMMIT or ABORT
(4a) Each subordinate logs abort or commit; (4b) sends ACK
(5) Coordinator logs end
2PC: Steps
• When a Xact wants to commit:
① Coordinator sends a prepare msg to each subordinate.
② Subordinate force-writes a no or ready log record and then sends a no or yes msg to the coordinator.
③ If the coordinator gets unanimous yes votes, it force-writes a commit log record and sends a commit msg to all subs. Else, it force-writes an abort log rec and sends an abort msg.
④ Subordinates force-write an abort/commit log rec based on the msg they get, then send an ack msg to the coordinator.
⑤ Coordinator writes an end log rec after getting all acks.
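The coordinator side of these steps can be sketched as below. This is a simplification under stated assumptions: `subordinates` are hypothetical stub objects whose `prepare()`/`decide()` calls stand in for network messages, and `force_write` stands in for a forced write to stable storage.

```python
def force_write(log, record):
    # A real DBMS would flush this record to stable storage before proceeding.
    log.append(record)

def coordinate(subordinates, log):
    # Step 1: send prepare to every subordinate; collect their votes (step 2).
    votes = [sub.prepare() for sub in subordinates]
    # Step 3: unanimous yes -> commit, otherwise abort;
    # force-write the decision BEFORE telling anyone.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    force_write(log, decision)
    # Steps 3/4: send the decision and wait for every ack.
    acks = [sub.decide(decision) for sub in subordinates]
    assert all(a == "ack" for a in acks)
    # Step 5: all acks received -> write the end record.
    log.append("end")
    return decision
```

The ordering is the point: the decision is on stable storage before any COMMIT/ABORT message leaves, so a crashed-and-recovered coordinator can re-announce the same decision rather than invent a new one.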
Site and Link Failures
• If the coordinator detects that a subordinate failed, e.g., after a timeout
– Before it voted “yes”? → assume it was “abort”
– After “yes”? → continue as normal
• If the coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until the coordinator recovers
– Xact T is blocked
Exercise 4: when the failed subordinate site wakes up, can it tell if a global transaction committed or aborted?
Exercise 4
• The site that wakes up should interpret its log:
– (a) T committed → it should REDO actions for T
– (b) T aborted → it should UNDO actions for T
– (c) Can’t tell... needs to consult the coordinator about what the coordinator decided
– (d) T aborted → it should write abort, and UNDO
• Since it never even wrote ready, the coordinator couldn’t possibly have decided to commit (since it would have waited for the site to say yes)
Data Replication
• Replication: keep copies of data at different sites
• Benefits
– Gives increased availability
– Faster query evaluation
• Flavors
– Synchronous (eager) vs. Asynchronous (lazy)
– They vary in how current the copies are (i.e., how consistent they are)
• Can be used in addition to data partitioning
– Full replication: a copy of all data at every site (vs. partial)
• Locking: on a primary copy, or fully distributed
Updating Distributed Data
• Synchronous (Eager) Replication: a set of copies of a modified relation must be updated before the modifying xact commits (not necessarily all copies!)
– Exclusive locks on all the copies that are modified
– Users/apps do not need to know data location(s)
• Asynchronous (Lazy) Replication: copies of a modified relation are only periodically updated
– Different copies may get out of sync in the meantime
– Users/applications must be aware of data location(s)
Synchronous Replication: Majority
• The majority technique can guarantee data consistency:
– An xact must write a majority of copies to modify an object
– Each copy has a version number for the object
– An xact must read enough copies to be sure of seeing at least one most recent copy
• Example: 6 copies of data → written: 4 nodes, read: 3 nodes
• Impact on performance of READ queries??
– Could use a Read-one-write-all (ROWA) policy instead
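The 6-copy example works because any set of 3 copies must overlap any set of 4 copies (R + W > N), so a reader always sees at least one copy with the latest version number. A sketch under simplifying assumptions (copies are list entries rather than remote sites, and "any W/R reachable copies" is modeled by fixed slices):

```python
N, W, R = 6, 4, 3  # the slide's example: write 4 of 6 copies, read 3
copies = [{"version": 0, "value": None} for _ in range(N)]

def quorum_write(value, version):
    """Install the new version on W copies (a majority of the N copies)."""
    for copy in copies[:W]:  # in practice, any W reachable copies
        copy["version"] = version
        copy["value"] = value

def quorum_read():
    """Read R copies and return the value with the highest version number.
    Since R + W > N, at least one sampled copy holds the latest write."""
    sampled = copies[-R:]  # in practice, any R reachable copies
    return max(sampled, key=lambda c: c["version"])["value"]
```

ROWA sits at the other end of the same trade-off: reads touch one copy (fast), but every write must touch all N, so a single unavailable site blocks updates.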