L24: Course Evaluation & Review
CS3200 Database design (sp18 s2) https://course.ccs.neu.edu/cs3200sp18s2/ 4/12/2018


2

The world is increasingly driven by data…

This class teaches the basics of how to use & manage data.

3

Lectures: 1st half - from a user's perspective

1. SQL: Relational data models & queries (~5 lectures -> 6 lectures)
   - How to manipulate data with SQL, a declarative language
     • reduced expressive power, but the system can do more for you

2. Database Design: Design theory and constraints (~6 lectures -> 7 lectures)
   - Designing relational schemas to keep your data from getting corrupted

3. Transactions: Syntax & supporting systems (~3 lectures -> 2 lectures)
   - A programmer's abstraction for data consistency

4

Lectures: 2nd half - understanding how it works

4. Database internals: Query processing (~7 lectures -> 5.5 lectures)
   - Indexing
   - External memory algorithms (IO model) for sorting, joins, etc.
   - Basics of query optimization (cost estimates)
   - Relational algebra

5. NoSQL (~0-2 lectures -> 1.5 lectures)
   - Key-value stores
   - (More in CS6240: Large-Scale Parallel Data Processing)

5

Studying material: "Under which study condition do you think you learn better?"

[Figure: judged performance (= what people think) vs. actual performance (= what is actually working) for passive reading and active Q&A]

Source: Karpicke & Blunt, "Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping," Science, 2011.

6

Sequencing material: "Under which teaching condition do you think you learn better?"

Source: Bjork & Bjork, "Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning," Psychology and the real world (...), 2011.

7

My pedagogic goals for classroom effectiveness

Goal: Increased learning
- Metric: Δlearning / time invested ratio
- Implications: minimize chores, have group HWs, "soft" graded HWs, no attendance check, in-class problems, class contributions, interleaved, discuss student solutions, ...
- Risk: "Slacking off"

Goal: Fair assessment
- Metric: signal/noise ratio
- Implications: exam: hard, comprehensive, individual, time-constrained
- Risk: Stress, "not fun"

8

My original grading philosophy (and I'll go back to it next year)
• no fixed percentages (e.g., 30% for A)
• no fixed cut-offs (e.g., 80/100 for A)

Cut-off points depend on overall class interactivity as compared to other years.

I will not disclose the actual cut-off points. Don't ask for an exception.

Actual point distribution from a past final exam: long, but fair!
[Figure: student population vs. total points (0% to 100%), divided into grade regions A, B, C]

9

Ideas for next year

• No project, but more and longer homeworks in larger groups
  - keep soft-graded HWs, hard exams
• Gradescope, some autograded Jupyter notebook checks for code
• Topics:
  - 1 SQL: no change
  - 2 Database design: shorten and completely replace the Stanford arrow notation with crow's foot/UML; personalize Gradiance
  - 3 Transactions: extend and include hands-on exercises
  - 4 Database internals: shorten; remove advanced joins, but keep indices and RA
  - 5 NoSQL: extend with hands-on work with all 4 types of NoSQL databases
• To be decided: all in Jupyter, or actual installations, or preinstalled on VMs

10

Reminder: Faculty Course Evaluations

• Please take the next 10 minutes to complete your Faculty Course Evaluation ("TRACE") for this course.

• Your feedback:
  - Helps me improve the course
  - Helps your fellow students make better decisions about courses and professors
  - Is anonymous – I get a report with results and comments 2-3 weeks after grades are in
  - Should only take 10 minutes to complete
  - Written comments are especially valuable

11

Thanks for leaving *detailed* feedback :)

1. Topics most interesting to you (and why): 1 SQL, 2 DB design, 3 Transactions, 4 Internals, 5 NoSQL: more or less material? / what part most difficult / slower or faster?
2. Class organization / website: did you find what you were looking for? / what was difficult to find or follow? What would have helped? Suggestions for Blackboard or website or Piazza or Gradescope?
3. Installing software: what was difficult to install? What would have helped in addition to the provided PDFs (installation videos?) Replace PostgreSQL with MySQL (PostgreSQL is better for optimizing, but we won't cover this in detail next time)? Virtual machines?
4. Jupyter notebooks: what went well or wrong? How to improve?
5. What aspects helped you learn and not forget: more cold calling / group exercises / short slide exercises (SQL) / hands-on SQL typing / FMs on homework solutions / Office Hours / TA Office Hours
6. Learning material: SQL on slides vs. SQL typing / slides / textbooks / other resources
7. Use of computers & social media in class: yes/no
8. HW & group projects: working in groups / assigned random groups / detail of feedback in HW solutions / peer evaluation / slacking off vs. learning
9. Assessment & cheating: more British style: homeworks = practice / final = test
10. Best practice from other classes / what to copy *to* other classes. Other ways I can help: office hours / anonymous feedback form / 5 min breaks / 5 min "social breaks" where I assign you to talk to somebody
11. How to make you engage more actively? SQL worked really well. More random calling from class list?
12. …

12

Review

• A quick tour d'horizon through the 5 topics we discussed in class
• For the final exam: everything that was covered in class
  - slides, homeworks, solutions, discussions, Piazza, Gradiance, Jupyter

13

1. SQL

14

General form of SQL Query

    SELECT S
    FROM R1, …, Rn
    WHERE C1
    GROUP BY a1, …, ak
    HAVING C2
    ORDER BY S2

C1: is any condition on the attributes in R1, …, Rn
C2: is any condition on aggregates and on attributes a1, …, ak
S: may contain attributes a1, …, ak and/or any aggregates but no other attributes

Evaluation
1. Evaluate FROM
2. WHERE: apply condition C1
3. GROUP BY the attributes a1, …, ak
4. Apply condition C2 to each group (may have aggregates)
5. Compute aggregates in S and return the result
6. Sort rows by the ORDER BY clause

The logical order is useful for understanding, but not always correct. The ANSI SQL standard does not require a specific processing order and leaves that to the implementation. Recall our intro example with SELECT DISTINCT and ORDER BY! Notice that that example can't be explained with the order shown here.

15

From → Where → Group By → Select

Purchase
Product  Price  Quantity
Bagel    3      20
Bagel    2      20
Banana   1      50
Banana   2      10
Banana   4      10

    SELECT product, sum(quantity) as TotalSales
    FROM Purchase
    WHERE price > 1
    GROUP BY product

Product  TotalSales
Bagel    40
Banana   20

Select contains
• grouped attributes
• and aggregates

308
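The logical evaluation order can be checked on the slide's Purchase table. A minimal sketch, assuming Python's standard `sqlite3` module: `total_sales_by_hand` (an illustrative name) applies WHERE, GROUP BY, and the aggregate step by step, and `total_sales_sqlite` runs the same query in an in-memory SQLite database.

```python
import sqlite3
from collections import defaultdict

# Purchase(product, price, quantity), as on the slide
PURCHASE = [
    ("Bagel", 3, 20), ("Bagel", 2, 20),
    ("Banana", 1, 50), ("Banana", 2, 10), ("Banana", 4, 10),
]


def total_sales_by_hand(rows):
    """Evaluate the query in the logical order FROM -> WHERE -> GROUP BY -> SELECT."""
    # FROM Purchase: start from all rows; WHERE price > 1: filter them
    filtered = [row for row in rows if row[1] > 1]
    # GROUP BY product: collect the quantities of each group
    groups = defaultdict(list)
    for product, _price, quantity in filtered:
        groups[product].append(quantity)
    # SELECT product, sum(quantity): one output row per group
    return sorted((product, sum(qs)) for product, qs in groups.items())


def total_sales_sqlite(rows):
    """Run the same SQL query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Purchase (product TEXT, price INTEGER, quantity INTEGER)")
    conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT product, sum(quantity) AS TotalSales "
        "FROM Purchase WHERE price > 1 "
        "GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return result


print(total_sales_by_hand(PURCHASE))  # [('Bagel', 40), ('Banana', 20)]
print(total_sales_sqlite(PURCHASE))   # [('Bagel', 40), ('Banana', 20)]
```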

16

Let's confuse the database engine

Purchase
Product  Price  Quantity
Bagel    3      20
Bagel    2      20
Banana   1      50
Banana   2      10
Banana   4      10

    SELECT product, quantity
    FROM Purchase
    GROUP BY product

Product  Quantity
Bagel    ?
Banana   ?

What quantity should the DB return for Banana?

The DB engine is confused: there is no single quantity for Banana (it's an ill-defined query). It should thus return an error (SQLite misbehaves and returns something, but it makes no sense). Please think this through carefully!

308
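To see the misbehavior concretely, a small sketch using Python's `sqlite3` module on the same Purchase data. SQLite's documentation says a bare (non-grouped, non-aggregated) column in an aggregate query is evaluated on an arbitrarily selected row of the group, so the Banana quantity below is not something to rely on; PostgreSQL, for comparison, rejects the query with an error. Asking for a specific aggregate such as `max(quantity)` makes the query well-defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (product TEXT, price INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?, ?)",
    [("Bagel", 3, 20), ("Bagel", 2, 20),
     ("Banana", 1, 50), ("Banana", 2, 10), ("Banana", 4, 10)],
)

# Ill-defined: quantity is neither grouped nor aggregated.
# SQLite accepts it and takes quantity from some row of each group.
ill_defined = conn.execute(
    "SELECT product, quantity FROM Purchase GROUP BY product ORDER BY product"
).fetchall()
print(ill_defined)  # Banana's quantity is whichever row SQLite happened to use

# Well-defined: state which quantity you want, e.g. the largest one.
well_defined = conn.execute(
    "SELECT product, max(quantity) FROM Purchase GROUP BY product ORDER BY product"
).fetchall()
print(well_defined)  # [('Bagel', 20), ('Banana', 50)]
```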

17

Don't use new alias in HAVING clause

Purchase
Product  Price  Quantity
Bagel    3      20
Bagel    2      20
Banana   1      50
Banana   2      10
Banana   4      10

    SELECT product, sum(quantity) as SumQ
    FROM Purchase
    WHERE quantity > 15
    GROUP BY product
    HAVING SumQ > 35

What does this query return over the given database?

Product  SumQ
Bagel    40
Banana   50

Error in SQL Server! Reason: HAVING is evaluated before SELECT! (However, SQLite works: different implementation)

Source: http://stackoverflow.com/questions/2068682/why-cant-i-use-alias-in-a-count-column-and-reference-it-in-a-having-clause

308

18

Don't use new alias in HAVING clause

Purchase
Product  Price  Quantity
Bagel    3      20
Bagel    2      20
Banana   1      50
Banana   2      10
Banana   4      10

    SELECT product, sum(quantity) as SumQ
    FROM Purchase
    WHERE quantity > 15
    GROUP BY product
    HAVING sum(quantity) > 35
    ORDER BY sumQ desc

What does this query return over the given database?

Product  SumQ
Banana   50
Bagel    40

308

Works! Notice the new sorting.
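Both variants side by side, as a sketch with Python's `sqlite3` module: the portable query repeats `sum(quantity)` in HAVING and uses the alias only in ORDER BY; the second query uses the alias in HAVING, which SQLite accepts (as noted for the previous query) but SQL Server rejects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (product TEXT, price INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?, ?)",
    [("Bagel", 3, 20), ("Bagel", 2, 20),
     ("Banana", 1, 50), ("Banana", 2, 10), ("Banana", 4, 10)],
)

# Portable: repeat the aggregate in HAVING; the alias is fine in ORDER BY.
portable = conn.execute(
    "SELECT product, sum(quantity) AS SumQ FROM Purchase "
    "WHERE quantity > 15 GROUP BY product "
    "HAVING sum(quantity) > 35 ORDER BY SumQ DESC"
).fetchall()

# SQLite-specific: the alias inside HAVING is accepted here, not by SQL Server.
sqlite_only = conn.execute(
    "SELECT product, sum(quantity) AS SumQ FROM Purchase "
    "WHERE quantity > 15 GROUP BY product "
    "HAVING SumQ > 35 ORDER BY SumQ DESC"
).fetchall()

print(portable)  # [('Banana', 50), ('Bagel', 40)]
```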

19

Illustration

English(eText, eid): (One, 1), (Two, 2), (Three, 3), (Four, 4), (Five, 5), (Six, 6)
French(fid, fText): (1, Un), (3, Trois), (4, Quatre), (5, Cinq), (6, Six), (7, Sept), (8, Huit)

An "inner join":

    SELECT *
    FROM English, French
    WHERE eid = fid

Same as:

    SELECT *
    FROM English JOIN French
    ON eid = fid

"JOIN" same as "INNER JOIN"

eText  eid  fid  fText
One    1    1    Un
Three  3    3    Trois
Four   4    4    Quatre
Five   5    5    Cinq
Six    6    6    Six

361

20

Illustration

English(eText, eid): (One, 1), (Two, 2), (Three, 3), (Four, 4), (Five, 5), (Six, 6)
French(fid, fText): (1, Un), (3, Trois), (4, Quatre), (5, Cinq), (6, Six), (7, Sept), (8, Huit)

    SELECT *
    FROM English FULL JOIN French
    ON English.eid = French.fid

"FULL JOIN" same as "FULL OUTER JOIN"

eText  eid   fid   fText
One    1     1     Un
Two    2     NULL  NULL
Three  3     3     Trois
Four   4     4     Quatre
Five   5     5     Cinq
Six    6     6     Six
NULL   NULL  7     Sept
NULL   NULL  8     Huit

SQLite does not support "FULL OUTER JOIN" :( (but it does support "LEFT JOIN"). This held in 2018; SQLite 3.39.0 and later support FULL OUTER JOIN.

For comparison, the inner join:

    SELECT *
    FROM English JOIN French
    ON eid = fid

361

21

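[Figure: Venn diagram of the ids in English and French. English only: 2; both: 1, 3, 4-6; French only: 7, 8]

Illustration

English(eText, eid): (One, 1), (Two, 2), (Three, 3), (Four, 4), (Five, 5), (Six, 6)
French(fid, fText): (1, Un), (3, Trois), (4, Quatre), (5, Cinq), (6, Six), (7, Sept), (8, Huit)

Source: Fig. 7-2, Hoffer et al., Modern Database Management, 10th ed., 2011.

361

Both circles = FULL (OUTER) JOIN
Overlap only = (INNER) JOIN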
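The inner join and the full outer join from the last three slides, as a sketch with Python's `sqlite3` module. Since older SQLite versions lack FULL OUTER JOIN, the full join is emulated as a LEFT JOIN plus the unmatched French rows; on SQLite 3.39.0 or later the sketch also checks that the native `FULL JOIN` gives the same rows. The helper `some_id` is only a sort key introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE English (eText TEXT, eid INTEGER)")
conn.execute("CREATE TABLE French (fid INTEGER, fText TEXT)")
conn.executemany("INSERT INTO English VALUES (?, ?)",
                 [("One", 1), ("Two", 2), ("Three", 3), ("Four", 4), ("Five", 5), ("Six", 6)])
conn.executemany("INSERT INTO French VALUES (?, ?)",
                 [(1, "Un"), (3, "Trois"), (4, "Quatre"), (5, "Cinq"),
                  (6, "Six"), (7, "Sept"), (8, "Huit")])

# Inner join, written three equivalent ways: only ids 1, 3, 4, 5, 6 survive
inner = conn.execute("SELECT * FROM English JOIN French ON eid = fid ORDER BY eid").fetchall()
assert inner == conn.execute(
    "SELECT * FROM English, French WHERE eid = fid ORDER BY eid").fetchall()
assert inner == conn.execute(
    "SELECT * FROM English INNER JOIN French ON eid = fid ORDER BY eid").fetchall()

# FULL OUTER JOIN without FULL JOIN: all English rows (LEFT JOIN) plus the
# French rows that have no English partner, padded with NULLs.
# (NOT IN is safe here because English.eid contains no NULLs.)
full = conn.execute("""
    SELECT eText, eid, fid, fText FROM English LEFT JOIN French ON eid = fid
    UNION ALL
    SELECT NULL, NULL, fid, fText FROM French
    WHERE fid NOT IN (SELECT eid FROM English)
""").fetchall()


def some_id(row):
    """Sort key: the English id if present, else the French id."""
    return row[1] if row[1] is not None else row[2]


full.sort(key=some_id)
print([row[1:3] for row in inner])  # [(1, 1), (3, 3), (4, 4), (5, 5), (6, 6)]
print(len(full))                    # 8

# SQLite 3.39.0 and later also accept FULL JOIN directly; the results should agree.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = conn.execute(
        "SELECT * FROM English FULL JOIN French ON English.eid = French.fid").fetchall()
    assert sorted(native, key=some_id) == full
```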

22

Empty Group Problem

Item(name, category)
Purchase2(iName, store, month)

Compute, for each product, the total number of sales in Sept (= month 9)

    SELECT name, count(*)
    FROM Item, Purchase2
    WHERE name = iName
      and month = 9
    GROUP BY name

What's wrong?

334

23

Empty Group Problem

Item(name, category)
Purchase2(iName, store, month)

Compute, for each product, the total number of sales in Sept (= month 9)

    SELECT name, count(store)
    FROM Item LEFT JOIN Purchase2 ON
         name = iName and month = 9
    GROUP BY name

Now we also get the products with 0 sales.

We need to use an attribute from "Purchase2" to get the correct 0 count. Try "name" from "Item".

334
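A sketch of both queries with Python's `sqlite3` module on hypothetical data (a Gizmo that sold only in October). It also shows why `count(name)` is wrong: the NULL-padded row still has a `name`, so it counts as 1. Note that `month = 9` sits in the ON clause; moving it into WHERE would filter the NULL-padded rows out again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (name TEXT, category TEXT);
CREATE TABLE Purchase2 (iName TEXT, store TEXT, month INTEGER);
-- hypothetical data: Gizmo sells only in October
INSERT INTO Item VALUES ('Bagel', 'food'), ('Gizmo', 'gadget');
INSERT INTO Purchase2 VALUES ('Bagel', 'S1', 9), ('Bagel', 'S2', 9), ('Gizmo', 'S1', 10);
""")

inner = conn.execute("""
    SELECT name, count(*) FROM Item, Purchase2
    WHERE name = iName AND month = 9
    GROUP BY name ORDER BY name
""").fetchall()

outer = conn.execute("""
    SELECT name, count(store) FROM Item LEFT JOIN Purchase2
    ON name = iName AND month = 9
    GROUP BY name ORDER BY name
""").fetchall()

wrong_count = conn.execute("""
    SELECT name, count(name) FROM Item LEFT JOIN Purchase2
    ON name = iName AND month = 9
    GROUP BY name ORDER BY name
""").fetchall()

print(inner)        # [('Bagel', 2)]: Gizmo is missing entirely
print(outer)        # [('Bagel', 2), ('Gizmo', 0)]
print(wrong_count)  # [('Bagel', 2), ('Gizmo', 1)]: counts the NULL-padded row
```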

24

2. DB design

25

Data modeling and Database Design Process

1. ER Diagram → Conceptual Model ("technology independent"): describe main data items
   [Figure: ER diagram with entity sets Doctor and Patient (attributes name, zip, dno) linked by the relationship patient_of]

2. Relational Database Design → Logical Model ("for relational databases"): Tables, Constraints, Functional Dependencies, Normalization: eliminates anomalies

3. Database Implementation → Physical Model: Physical storage details
   Result: Physical Schema

26

From E/R Diagrams to Relational Schema

• Key concept
  - Entity sets become relations; relationships can become relations (tables in RDBMS)
  - Tables are connected with foreign key constraints

• A database schema
  - A map of the tables and fields (attributes) in the database
  - This is what is implemented in the database management system
  - Part of the "design" process

27

Example: translate this ERD v1 into tables

[Figure: ERD with entity sets Customer, Order, and Product; relationships makes and contains; attributes Customer ID, First name, Last name, City, State, Zip, Order number, Order Date, Product name, Price, and Quantity]

What do we do?

28

Example: translate this ERD v2 into tables

[Figure: ERD with entity sets Product (ProductID, ProductName, Price), Order (OrderNumber, OrderDate), and Customer (CustomerID, FirstName, LastName, City, State, Zip); relationships Contains and Makes; attribute Quantity]

What do we do?

29

Example: Our Order Database schema

[Figure annotations: original 1:n relationship; original n:n relationship]

• Order-Product is a decomposed many-to-many relationship
  - Order-Product has a 1:n relationship with Order and Product
  - Now an order can have multiple products, and a product can be associated with multiple orders

Product(ProductID, ProductName, Price)
Order-Product(OrderProductID, OrderNumber, ProductID, Quantity)
Order(OrderNumber, OrderDate, CustomerID)
Customer(CustomerID, FirstName, LastName, City, State, Zip)
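The schema above as DDL, in a sketch run through Python's `sqlite3` module. Assumptions: the column types, the table name `OrderProduct` (the hyphenated name would need quoting), and all sample rows are hypothetical. `Order` is quoted because ORDER is an SQL keyword, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    FirstName TEXT, LastName TEXT, City TEXT, State TEXT, Zip TEXT
);
CREATE TABLE "Order" (            -- ORDER is a keyword, so the name is quoted
    OrderNumber INTEGER PRIMARY KEY,
    OrderDate   TEXT,
    CustomerID  INTEGER NOT NULL REFERENCES Customer (CustomerID)
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT,
    Price       REAL
);
CREATE TABLE OrderProduct (       -- "Order-Product" on the slide
    OrderProductID INTEGER PRIMARY KEY,
    OrderNumber    INTEGER NOT NULL REFERENCES "Order" (OrderNumber),
    ProductID      INTEGER NOT NULL REFERENCES Product (ProductID),
    Quantity       INTEGER
);
INSERT INTO Customer VALUES (1, 'Ann', 'Lee', 'Boston', 'MA', '02115');
INSERT INTO "Order" VALUES (100, '2018-04-12', 1);
INSERT INTO Product VALUES (7, 'Bagel', 1.5), (8, 'Banana', 0.5);
INSERT INTO OrderProduct VALUES (1, 100, 7, 2), (2, 100, 8, 6);
""")

# Follow the foreign keys from an order to its products
lines = conn.execute("""
    SELECT o.OrderNumber, p.ProductName, op.Quantity
    FROM "Order" o
    JOIN OrderProduct op ON op.OrderNumber = o.OrderNumber
    JOIN Product p ON p.ProductID = op.ProductID
    ORDER BY op.OrderProductID
""").fetchall()
print(lines)  # [(100, 'Bagel', 2), (100, 'Banana', 6)]


def insert_rejected(sql):
    """True if the statement violates a constraint (here: a foreign key)."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(insert_rejected("INSERT INTO OrderProduct VALUES (3, 100, 999, 1)"))  # True: no product 999
```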

30

A) Associative Entity Relations (No Identifier)

31

A) Associative Entity Relations (No Identifier)

Default primary key for the association relation is composed of the primary keys of the two entities (as in M:N relationship)
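A sketch of that default key, using Python's `sqlite3` module: the primary key is the pair of the two entities' keys, so the same (order, product) pair can be stored only once. The table name `OrderLine` and the data are hypothetical, and foreign key clauses are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Association relation without its own identifier: the primary key is the
# pair of the two entities' primary keys
conn.execute("""
    CREATE TABLE OrderLine (
        OrderNumber INTEGER NOT NULL,
        ProductID   INTEGER NOT NULL,
        Quantity    INTEGER,
        PRIMARY KEY (OrderNumber, ProductID)
    )
""")
conn.execute("INSERT INTO OrderLine VALUES (100, 7, 2)")
conn.execute("INSERT INTO OrderLine VALUES (100, 8, 6)")  # same order, other product: fine

try:
    conn.execute("INSERT INTO OrderLine VALUES (100, 7, 5)")  # same (order, product) pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```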

32

B) Associative Entity Relations (With Identifier)

33

B) Associative Entity Relations (With Identifier)

• Identifier attribute becomes new primary key in relation

• Foreign keys reference all related entities

Do we need the key?

34

Relational Schema Design

Do you see any anomalies?

Recall set attributes (persons with several phones):

• One person may have multiple phones, but lives in only one city
• Primary key is thus (SSN, PhoneNumber)

Employee
Name  SSN          PhoneNumber   City
Fred  123-45-6789  412-555-1234  Boston
Fred  123-45-6789  412-555-6543  Boston
Joe   987-65-4321  908-555-2121  Westfield

35

Relational Schema Design

Do you see any anomalies?

Recall set attributes (persons with several phones):

What do we do????

• One person may have multiple phones, but lives in only one city
• Primary key is thus (SSN, PhoneNumber)

Employee
Name  SSN          PhoneNumber   City
Fred  123-45-6789  412-555-1234  Boston
Fred  123-45-6789  412-555-6543  Boston
Joe   987-65-4321  908-555-2121  Westfield

• Deletion anomalies: what if Joe deletes his phone number? (what if Joe had no phone #)
• Insert anomalies: what if Joe gets a second phone number
• Update anomalies: what if Fred moves to "New York"?

36

Relation Decomposition

Break the relation into two:

Employee
Name  SSN          City
Fred  123-45-6789  Boston
Joe   987-65-4321  Westfield

Phone
SSN          PhoneNumber
123-45-6789  412-555-1234
123-45-6789  412-555-6543
987-65-4321  908-555-2121

Anomalies have gone:
• No more repeated data
• Easy to move Fred to "New York" (how?)
• Easy to delete all Joe's phone numbers (how?)

Original Employee table:
Name  SSN          PhoneNumber   City
Fred  123-45-6789  412-555-1234  Boston
Fred  123-45-6789  412-555-6543  Boston
Joe   987-65-4321  908-555-2121  Westfield
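Answering the two "(how?)" questions, a sketch with Python's `sqlite3` module on the slide's data: moving Fred is a single-row UPDATE on Employee, deleting Joe's phone numbers is a DELETE on Phone that leaves Joe in Employee, and a LEFT JOIN rebuilds the original wide view when needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Name TEXT, SSN TEXT PRIMARY KEY, City TEXT);
CREATE TABLE Phone (
    SSN TEXT REFERENCES Employee (SSN),
    PhoneNumber TEXT,
    PRIMARY KEY (SSN, PhoneNumber)
);
INSERT INTO Employee VALUES ('Fred', '123-45-6789', 'Boston'),
                            ('Joe',  '987-65-4321', 'Westfield');
INSERT INTO Phone VALUES ('123-45-6789', '412-555-1234'),
                         ('123-45-6789', '412-555-6543'),
                         ('987-65-4321', '908-555-2121');
""")

# Update: Fred's city is stored once, so exactly one row changes
moved = conn.execute(
    "UPDATE Employee SET City = 'New York' WHERE SSN = '123-45-6789'").rowcount
# Delete: Joe's phone numbers go away, but Joe stays in Employee
conn.execute("DELETE FROM Phone WHERE SSN = '987-65-4321'")

# The original wide table is a join away (LEFT JOIN keeps Joe without a phone)
wide = conn.execute("""
    SELECT e.Name, e.SSN, p.PhoneNumber, e.City
    FROM Employee e LEFT JOIN Phone p ON e.SSN = p.SSN
    ORDER BY e.Name, p.PhoneNumber
""").fetchall()

print(moved)  # 1
for row in wide:
    print(row)
```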

37

Keys and Superkeys

A superkey is a set of attributes A1, …, An s.t. for any other attribute B in R, we have {A1, …, An} → B

A key is a minimal superkey (also called "candidate key")

I.e., all attributes are functionally determined by a superkey

This means that no subset of a key is also a superkey (i.e., dropping any attribute from the key makes it no longer a superkey)
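The definitions of superkey and key can be turned into a small test based on attribute closure. A minimal sketch in Python, assuming FDs are given as (left side, right side) pairs of attribute sets; the function names are illustrative, and the key search tries every subset, so it only suits small schemas like the Employee example (SSN determines Name and City).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+: keep applying FDs lhs -> rhs until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey functionally determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys: try sets by increasing size, skipping supersets of found keys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superkey, but not minimal
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


EMPLOYEE = {"Name", "SSN", "PhoneNumber", "City"}
FDS = [({"SSN"}, {"Name", "City"})]  # one person: one name, one city

print(sorted(closure({"SSN"}, FDS)))                         # ['City', 'Name', 'SSN']
print([sorted(k) for k in candidate_keys(EMPLOYEE, FDS)])   # [['PhoneNumber', 'SSN']]
```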

38

Quick recap FDs

• Functional Dependency (FD): The value of one set of attributes (the determinant) uniquely determines the value of another set of attributes (the dependents)
• A superkey (SK) is a set of attributes of a relation schema upon which all attributes of the schema are functionally dependent.
• A candidate key (CK) is a non-redundant (minimal) SK
• Prime attribute: belonging to some candidate key
• Partial FD: FD in which one or more non-prime attributes are functionally dependent on part (but not all) of any CK
• Transitive FD: An FD between two (or more) nonkey attributes
• 3NF: no partial and no transitive FDs

39

Boyce-Codd Normal Form (BCNF)

• Boyce-Codd normal form (BCNF)
  - A relation is in BCNF if and only if every (non-trivial) determinant is a superkey.

• The difference between 3NF and BCNF is that for a FD A → B,
  - 3NF allows this dependency in a relation if B is a prime attribute (part of some candidate key) and A is not a candidate key,
  - whereas BCNF insists that for this dependency to remain in a relation, A must be a SK (contain a CK).

40

3NF to BCNF

Source: Hoffer, Ramesh, Topi, Modern database management, 10th ed, Appendix B, 2010.

41

3NF to BCNF

Source: Hoffer, Ramesh, Topi, Modern database management, 10th ed, Appendix B, 2010.

42

3NF to BCNF

Source: Hoffer, Ramesh, Topi, Modern database management, 10th ed, Appendix B, 2010.

43

BCNF vs 3NF

• BCNF: For every functional dependency X → Y in a set F of functional dependencies over relation R, either:
  - X is a superkey of R
  - (or Y is a subset of X, thus the FD is trivial)

• 3NF: For every functional dependency X → Y in a set F of functional dependencies over relation R, either:
  - X is a superkey of R
  - or Y is a subset of K for some CK K (Y is prime)
  - (or Y is a subset of X, thus the FD is trivial)

• N.b., no proper subset of a key is a key
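Applying both definitions to the Unit/Company/Product example that follows: a sketch in Python that reuses attribute closure (repeated here so the block stands alone). It only checks the FDs as listed, which is enough to test the relation they are stated on. With {Unit} → {Company} and {Company, Product} → {Unit}, the candidate keys are {Company, Product} and {Product, Unit}, so every attribute is prime: the relation is in 3NF but not in BCNF.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial FDs X -> Y whose left side X is not a superkey."""
    return [(x, y) for x, y in fds
            if not y <= x and not is_superkey(x, schema, fds)]


def third_nf_violations(schema, fds):
    """BCNF violations that 3NF also rejects: some attribute of Y - X is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(x, y) for x, y in bcnf_violations(schema, fds) if not (y - x) <= prime]


R = {"Unit", "Company", "Product"}
FDS = [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})]

print(bcnf_violations(R, FDS))      # [({'Unit'}, {'Company'})]: not in BCNF
print(third_nf_violations(R, FDS))  # []: in 3NF, since Company is prime
```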

44

A problem with BCNF

{Unit} → {Company}
{Company, Product} → {Unit}

We do a BCNF decomposition on a "bad" FD: {Unit}+ = {Unit, Company}

Unit | Company | Product
… decomposed into …
Unit | Company   (with {Unit} → {Company})
Unit | Product

We lose the FD {Company, Product} → {Unit}!!

45

So Why is that a Problem?

FDs: {Unit} → {Company}, {Company, Product} → {Unit}

Unit      Company
Galaga99  NEU
Bingo     NEU
({Unit} → {Company})

Unit      Product
Galaga99  Databases
Bingo     Databases

No problem so far. All local FDs are satisfied.

Let's put all the data back into a single table again:

Unit      Company  Product
Galaga99  NEU      Databases
Bingo     NEU      Databases

Violates the FD {Company, Product} → {Unit}!!

46

The Problem

• We started with a table R and FDs F

• We decomposed R into BCNF tables R1, R2, … with their own FDs F1, F2, …

• We insert some tuples into each of the relations, which satisfy their local FDs, but when we reconstruct R, it violates some FD across tables!

Practical problem: to enforce the FD, we must reconstruct R on each insert!
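The practical problem in miniature, as a sketch with Python's `sqlite3` module: R1's primary key enforces {Unit} → {Company} locally, but checking {Company, Product} → {Unit} requires joining R1 and R2 back together. The table names R1 and R2 follow the slide; the check query is one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R1 (Unit TEXT PRIMARY KEY, Company TEXT);  -- key enforces Unit -> Company
CREATE TABLE R2 (Unit TEXT, Product TEXT, PRIMARY KEY (Unit, Product));
INSERT INTO R1 VALUES ('Galaga99', 'NEU'), ('Bingo', 'NEU');
INSERT INTO R2 VALUES ('Galaga99', 'Databases'), ('Bingo', 'Databases');
""")

# Every insert above satisfied the local constraints. The cross-table FD
# {Company, Product} -> {Unit} can only be checked on the reconstructed R:
violations = conn.execute("""
    SELECT Company, Product, COUNT(DISTINCT R1.Unit) AS units
    FROM R1 JOIN R2 ON R1.Unit = R2.Unit
    GROUP BY Company, Product
    HAVING COUNT(DISTINCT R1.Unit) > 1
""").fetchall()

print(violations)  # [('NEU', 'Databases', 2)]
```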

47

3. Transactions

48

ACID

• Atomicity
  - Either all operations are applied or none are (hence, we need not worry about the effect of incomplete/failed transactions)
• Consistency
  - Each transaction can start with a consistent database and is required to leave the database consistent (bring the DB from one consistent state to another)
• Isolation
  - The effect of a transaction should be as if it is the only transaction in execution (in particular, changes made by other transactions are not visible until committed)
• Durability
  - Once the system informs a transaction of success, the effect should hold without regret, even if the database crashes (before making all changes to disk)
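Atomicity in practice, as a sketch with Python's `sqlite3` module. Assumptions: a hypothetical Account table whose CHECK constraint forbids negative balances, and a hypothetical `transfer` function that credits first and debits second. Used as a context manager, a `sqlite3` connection commits if the block succeeds and rolls back if it raises, so a transfer that fails on the debit also undoes the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 150), ("C", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    with conn:  # commit on success, roll back on any exception
        conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Account ORDER BY name"))


transfer(conn, "A", "C", 100)  # succeeds: A = 50, C = 100
try:
    transfer(conn, "A", "C", 100)  # the debit would make A negative
except sqlite3.IntegrityError:
    pass  # the credit to C was rolled back as well

print(balances(conn))  # {'A': 50, 'C': 100}
```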

49

Scheduling Example 1

txn1:
  Begin
  Read(A,x)
  x = x-100
  Write(A,x)
  Read(C,y)
  y = y+100
  Write(C,y)
  Commit

txn2:
  Begin
  Read(A,v)
  v = v-100
  Write(A,v)
  Read(B,w)
  w = w+100
  Write(B,w)
  Commit

[Figure: the operations of txn1 and txn2 placed on a shared timeline as one possible schedule]

50

Scheduling Example 2

txn1:
  Begin
  Read(A,x)
  x = x-100
  Write(A,x)
  Read(C,y)
  y = y+100
  Write(C,y)
  Commit

txn2:
  Begin
  Read(A,v)
  v = v-100
  Write(A,v)
  Read(B,w)
  w = w+100
  Write(B,w)
  Commit

[Figure: the operations of txn1 and txn2 placed on a shared timeline as another possible schedule]

51

Recall: Concurrency as Interleaving TXNs

We call the particular order of interleaving a schedule

Serial schedule:
[Figure: timelines of T1 and T2, each reading and writing A and B; all of T1's actions run before T2's]

Interleaved schedule:
[Figure: the same actions of T1 and T2, interleaved on one timeline]

• For our purposes, having TXNs occur concurrently means interleaving their component actions (R/W)

52

Recall: "Good" vs. "bad" schedules

We want to develop ways of discerning "good" vs. "bad" schedules

Serial schedule:
[Figure: T1 and T2, each reading and writing A and B, run one after the other]

Interleaved schedules:
[Figure: two interleavings of the same actions; the second one is marked X]

Why?

53

Ways of Defining "Good" vs. "Bad" Schedules

• Recall: we call a schedule serializable if it is equivalent to some serial schedule
  - We used this as a notion of a "good" interleaved schedule, since a serializable schedule will maintain isolation & consistency

• Now, we'll define a stricter, but very useful variant:
  - Conflict serializability

We'll need to define conflicts first...

54

Conflicts

Two actions conflict if they are part of different TXNs, involve the same variable, and at least one of them is a write

[Figure: a schedule of T1 and T2 (each reading and writing A and B) with a W-R conflict and a W-W conflict highlighted]

55

Conflicts

Two actions conflict if they are part of different TXNs, involve the same variable, and at least one of them is a write

[Figure: the same schedule of T1 and T2 with all of its conflicts highlighted]

All "conflicts"!

56

Conflict Serializability

• Two schedules are conflict equivalent if:
  - They involve the same actions of the same TXNs
  - Every pair of conflicting actions of two TXNs are ordered in the same way

• Schedule S is conflict serializable if S is conflict equivalent to some serial schedule

Conflict serializable ⇒ serializable. So if we have conflict serializability, we have consistency & isolation!

57

Recall: "Good" vs. "bad" schedules

Conflict serializability also provides us with an operative notion of "good" vs. "bad" schedules!

Serial schedule:
[Figure: T1 and T2, each reading and writing A and B, run one after the other]

Interleaved schedules:
[Figure: two interleavings of the same actions; the second one is marked X]

Note that in the "bad" schedule, the order of conflicting actions is different from the above (or any) serial schedule!

58

Note: Conflicts vs. Anomalies

• Conflicts are things we talk about to help us characterize different schedules
  - Present in both "good" and "bad" schedules

• Anomalies are instances where isolation and/or consistency is broken because of a "bad" schedule
  - We often characterize different anomaly types by what types of conflicts predicated them

59

The Conflict Graph

• Let's now consider looking at conflicts at the TXN level

• Consider a graph where the nodes are TXNs, and there is an edge from Ti → Tj if any actions in Ti precede and conflict with any actions in Tj

[Figure: a schedule of T1 and T2 (each reading and writing A and B) and its conflict graph over the nodes T1 and T2]

60

What can we say about "good" vs. "bad" conflict graphs?

Serial schedule:
[Figure: T1 and T2, each reading and writing A and B, run one after the other]

Interleaved schedules:
[Figure: two interleavings of the same actions; the second one is marked X]

A bit complicated…

61

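What can we say about "good" vs. "bad" conflict graphs?

Serial schedule and interleaved schedules:
[Figure: the conflict graphs (nodes T1 and T2) of the serial schedule and of the two interleaved schedules; the "bad" schedule is marked X]

Theorem: A schedule is conflict serializable if and only if its conflict graph is acyclic

Simple!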
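The theorem gives a direct test. A minimal sketch in Python, assuming a schedule is a list of (transaction, action, item) triples in execution order: build the conflict graph from the definition on the previous slides, then look for a cycle with depth-first search. The two schedules are illustrative interleavings chosen for this example.

```python
from itertools import combinations


def conflict_graph(schedule):
    """Edge Ti -> Tj whenever an action of Ti precedes and conflicts with one of Tj."""
    edges = set()
    # combinations() keeps the original order, so `first` really comes first
    for first, second in combinations(schedule, 2):
        (t1, a1, x1), (t2, a2, x2) = first, second
        if t1 != t2 and x1 == x2 and "W" in (a1, a2):
            edges.add((t1, t2))
    return edges


def has_cycle(nodes, edges):
    """Depth-first search; reaching a node still on the current path means a cycle."""
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
    on_path, done = set(), set()

    def visit(u):
        on_path.add(u)
        for v in successors[u]:
            if v in on_path or (v not in done and visit(v)):
                return True
        on_path.discard(u)
        done.add(u)
        return False

    return any(n not in done and visit(n) for n in nodes)


def is_conflict_serializable(schedule):
    nodes = {t for t, _, _ in schedule}
    return not has_cycle(nodes, conflict_graph(schedule))


# Two illustrative interleavings of T1 and T2, which both read and write A and B
good = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
        ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
bad = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
       ("T2", "R", "B"), ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "B")]

print(conflict_graph(good), is_conflict_serializable(good))  # {('T1', 'T2')} True
print(is_conflict_serializable(bad))  # False: T1 -> T2 on A, T2 -> T1 on B
```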

62

4. Internals

63

Running External Merge Sort on Larger Files

Assume we still only have 3 buffer pages (buffer not pictured); M = 3

[Figure: an unsorted file on disk, stored as pages holding two values each]

64

Running External Merge Sort on Larger Files

1. Split into files small enough to sort in buffer…

Assume we still only have 3 buffer pages (buffer not pictured); M = 3

[Figure: the file on disk, split into smaller files]

65

Running External Merge Sort on Larger Files

1. Split into files small enough to sort in buffer… and sort

Assume we still only have 3 buffer pages (buffer not pictured); M = 3

[Figure: each small file on disk, now sorted]

Call each of these sorted files a run

66

Running External Merge Sort on Larger Files

2. Now merge pairs of (sorted) files… the resulting files will be sorted!

[Figure: the runs on disk before and after merging pairs of runs into longer sorted runs]

Assume we still only have 3 buffer pages (buffer not pictured); M = 3

67

Running External Merge Sort on Larger Files

3. And repeat…

[Figure: the runs on disk after another merge pass: fewer, longer sorted runs]

Assume we still only have 3 buffer pages (buffer not pictured); M = 3

Call each of these steps a pass

68

Running External Merge Sort on Larger Files

4. And repeat!

[Figure: after the final pass, the whole file is a single sorted run on disk]

69

Simplified 3-page Buffer Version

Assume for simplicity that we split an N-page file into N single-page runs and sort these; then:

• First pass: merge N/2 pairs of runs, each of length 1 page
• Second pass: merge N/4 pairs of runs, each of length 2 pages
• In general, for N pages, we do $\lceil \log_2 N \rceil$ passes
  - +1 for the initial split & sort
• Each pass involves reading in & writing out all the pages = 2N IO

[Figure: unsorted input file → split & sort → merge → merge → sorted!]

→ $2N(\lceil \log_2 N \rceil + 1)$ total IO cost!
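A sketch of the simplified version in Python that counts page IOs, assuming each page holds `page_size` values and every page is full. Pass 0 sorts each page on its own; each later pass merges pairs of runs with `heapq.merge`. For N = 16 the count matches 2N·(⌈log₂ N⌉ + 1); when N is not a power of two, a leftover run is carried to the next pass without being re-read, so the formula becomes an upper bound.

```python
import heapq
import math


def external_merge_sort(pages, page_size):
    """Two-way external merge sort that counts page IOs.

    Mirrors the simplified 3-buffer version: pass 0 reads, sorts, and writes each
    page as a one-page run; every later pass merges pairs of runs (two input
    buffers, one output buffer). Returns (sorted pages, number of page IOs).
    """
    io = 0
    runs = []
    for page in pages:               # pass 0: split & sort
        io += 2                      # read the page, write it back sorted
        runs.append([sorted(page)])  # a run is a list of pages

    while len(runs) > 1:             # one merge pass per loop iteration
        next_runs = []
        for k in range(0, len(runs) - 1, 2):
            left, right = runs[k], runs[k + 1]
            io += len(left) + len(right)  # read both input runs
            # For brevity the merged run is materialized in memory; a real
            # implementation streams it through the single output buffer page.
            merged = list(heapq.merge(
                (v for page in left for v in page),
                (v for page in right for v in page),
            ))
            out = [merged[i:i + page_size] for i in range(0, len(merged), page_size)]
            io += len(out)                # write the merged run
            next_runs.append(out)
        if len(runs) % 2 == 1:
            next_runs.append(runs[-1])    # odd run out waits for the next pass
        runs = next_runs

    return (runs[0] if runs else []), io


N, PAGE_SIZE = 16, 2
values = [(7 * i) % (N * PAGE_SIZE) for i in range(N * PAGE_SIZE)]  # a permutation of 0..31
pages = [values[i:i + PAGE_SIZE] for i in range(0, len(values), PAGE_SIZE)]

sorted_pages, io = external_merge_sort(pages, PAGE_SIZE)
print(io, 2 * N * (math.ceil(math.log2(N)) + 1))  # 160 160
```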

70

Recap: High-level overview: indexes

Data file = index file: clustered (primary) index
id   age  salary  other
006  19   50k     ...
005  20   55k     ...
004  25   50k     ...
007  30   80k     ...
002  35   75k     ...
003  35   70k     ...
001  40   65k     ...

Index file: unclustered (secondary) index
id   age  salary  other
006  19   50k     ...
004  25   50k     ...
005  20   55k     ...
001  40   65k     ...
003  35   70k     ...
002  35   75k     ...
007  30   80k     ...
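The difference between the two kinds of index can be sketched in Python with the standard `bisect` module, using the slide's rows (salaries in thousands). The data file is kept sorted by age, so an age range is one contiguous slice; the salary index is a separate sorted list of (salary, position) entries whose positions jump around the data file. The names are illustrative.

```python
from bisect import bisect_left, bisect_right

# Data file sorted by age: (id, age, salary in k)
DATA = [
    ("006", 19, 50), ("005", 20, 55), ("004", 25, 50), ("007", 30, 80),
    ("002", 35, 75), ("003", 35, 70), ("001", 40, 65),
]
AGES = [age for _id, age, _salary in DATA]

# Unclustered (secondary) index on salary: sorted (salary, position in DATA) pairs
SALARY_INDEX = sorted((salary, pos) for pos, (_id, _age, salary) in enumerate(DATA))
SALARY_KEYS = [salary for salary, _pos in SALARY_INDEX]


def ids_by_age(lo, hi):
    """Clustered access: matching rows form one contiguous slice of the data file."""
    start, end = bisect_left(AGES, lo), bisect_right(AGES, hi)
    return [row[0] for row in DATA[start:end]]


def ids_by_salary(lo, hi):
    """Unclustered access: each index entry points somewhere else in the data file."""
    start, end = bisect_left(SALARY_KEYS, lo), bisect_right(SALARY_KEYS, hi)
    positions = [pos for _salary, pos in SALARY_INDEX[start:end]]
    return [DATA[pos][0] for pos in positions], positions


print(ids_by_age(20, 30))     # ['005', '004', '007']
print(ids_by_salary(50, 55))  # (['006', '004', '005'], [0, 2, 1])
```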

71

NLJ: Order of tables matters

M = 102 pages (the outer relation is read in chunks of M − 2 = 100 pages)

R ⋈ S, with R as the outer relation: B(R) = 500, B(S) = 1000
B(R) + B(R)/(M−2) × B(S) = 500 + (500/100) × 1,000 = 5,500
Cost R: 500; Cost S: 5,000 = 5 × 1,000; SUM: 5,500

S ⋈ R, with S as the outer relation: B(S) = 1000, B(R) = 500
B(S) + B(S)/(M−2) × B(R) = 1,000 + (1,000/100) × 500 = 6,000
Cost S: 1,000; Cost R: 5,000 = 10 × 500; SUM: 6,000

Ignoring output cost

Variant of Example 15.4 from "Cowbook" (Ramakrishnan, Gehrke, Database management systems, 2003)
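The two cost formulas as a small Python function. A sketch under the slide's assumptions (output cost ignored, M − 2 buffer pages for the outer relation); the ceiling for a partial last chunk is an added assumption that does not change these numbers.

```python
import math


def bnlj_cost(b_outer, b_inner, m):
    """Block nested-loop join IO cost, ignoring the cost of writing the output.

    The outer relation is read once, in chunks of M - 2 pages (one buffer page
    is kept for the inner relation and one for output); the inner relation is
    scanned once per chunk.
    """
    chunks = math.ceil(b_outer / (m - 2))
    return b_outer + chunks * b_inner


print(bnlj_cost(b_outer=500, b_inner=1000, m=102))  # R outer: 5500
print(bnlj_cost(b_outer=1000, b_inner=500, m=102))  # S outer: 6000
```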

72

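RA Expressions Can Get Complex!

[Figure: relational algebra expression tree over Person, Purchase, Person, and Product, with selections σ name=fred and σ name=gizmo, projections π pid, π ssn, and π name, and joins on seller-ssn=ssn, pid=pid, and buyer-ssn=ssn]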

73

5. NoSQL

74

SQL Means More than SQL

• SQL stands for the query language
• But commonly refers to the traditional RDBMS:
  - Relational storage of data
    • Each tuple is stored consecutively
  - Joins as first-class citizens
    • In fact, normal forms prefer joins to maintenance
  - Strong guarantees on transaction management
    • No consistency worries when many transactions operate simultaneously on common data
  - Focus on scaling up
    • That is, make a single machine do more, faster

75

Vertical vs. Horizontal Scaling

• Vertical scaling ("scale up"): you scale by adding more power (CPU, RAM)

• Horizontal scaling ("scale out"): you scale by adding more machines

[Figure: "scaling up" vs. "scaling out"]

76

Vertical vs. Horizontal partitioning

Source: http://www.piyushgupta.co.uk/2016/04/database-scaling-jargons.html, http://slideplayer.com/slide/12131436/70/images/17/SQL+Azure+Azure+Custom+Sharding.jpg

77

We Will Look at 4 Data Models

Column-Family Store, Key/Value Store, Document Store, Graph Databases

Source: Benny Kimelfeld

78

ACID May Be Overly Expensive

• In quite a few modern applications:
  - ACID contrasts with key desiderata: high volume, high availability
  - We can live with some errors, to some extent
  - Or more accurately, we prefer to suffer errors than to be significantly less functional

• Can this point be made more "formal"?

79

CAP Service Properties

• Consistency:
  - every read (to any node) gets a response that reflects the most recent version of the data
    • More accurately, a transaction should behave as if it changes the entire state correctly in an instant; idea similar to serializability

• Availability:
  - every request (to a living node) gets an answer: set succeeds, get returns a value (if you can talk to a node in the cluster, it can read and write data)

• Partition tolerance:
  - service continues to function on network failures (the cluster can survive them)
    • As long as clients can reach servers

80

The CAP Theorem

Eric Brewer's CAP Theorem:

A distributed service can support at most two out of C, A and P

81

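Simple Illustration

CA (Consistency, Availability): set(x,1) is sent to both nodes, both answer ok, and get(x) returns 1.

CP (Consistency, Partition tolerance): set(x,2) is sent to both nodes; while they are partitioned, the answer is "wait...".

AP (Availability, Partition tolerance): set(x,2) returns ok at one node, but get(x) at the other node still returns 1.

[Figure: Availability vs. Consistency, with "our relational database world so far…" labeled between them]

In a system that may suffer partitions, you have to trade off consistency vs. availability.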

82

Source: http://blog.nahurst.com/visual-guide-to-nosql-systems, 2010

83

The BASE Model

• Applies to distributed systems of type AP
• Basic Availability
  - Provide high availability through distribution: there will be a response to any request. The response could be a 'failure' to obtain the requested data, or the data may be in an inconsistent or changing state.
• Soft state
  - Inconsistency (stale answers) allowed: the state of the system can change over time, so even during times without input, changes can happen due to 'eventual consistency'
• Eventual consistency
  - If updates stop, then after some time consistency will be achieved
    • Achieved by protocols to propagate updates and verify correctness of propagation (gossip protocols)
• Philosophy: best effort, optimistic, staleness and approximation allowed
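Eventual consistency in a toy form: two replicas that always accept writes (AP), then converge once they exchange state. A sketch in Python; the `Replica` class and `anti_entropy` function are hypothetical stand-ins for real replication and gossip, and conflicts are resolved by last-writer-wins on timestamps (one common policy), assuming timestamps are unique.

```python
class Replica:
    """A toy replica that stores key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def set(self, key, value, ts):
        # AP behavior: always accept the write locally; a newer timestamp wins
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange all entries in both directions (a stand-in for a gossip round)."""
    for key, (ts, value) in list(a.data.items()):
        b.set(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.set(key, value, ts)


n1, n2 = Replica("n1"), Replica("n2")
n1.set("x", 1, ts=1)
anti_entropy(n1, n2)   # both replicas now hold x = 1

n1.set("x", 2, ts=2)   # partition: the write only reaches n1
stale = n2.get("x")    # n2 still answers (available), but with 1 (stale)

anti_entropy(n1, n2)   # partition heals; replicas exchange state
print(stale, n1.get("x"), n2.get("x"))  # 1 2 2
```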

84

4 main data models

1. Key-value stores (e.g., Redis)
2. Column-family stores (e.g., Cassandra)
3. Document stores (e.g., MongoDB)
4. Graph databases (e.g., Neo4j)

85

Key-Value Stores

• Essentially, big distributed hash maps
• Origin attributed to Dynamo – Amazon's DB for world-scale catalog/cart collections
  - But Berkeley DB has been here for > 20 years
• Store pairs ⟨key, opaque-value⟩
  - Opaque means that the DB does not associate any structure/semantics with the value; oblivious to values
  - This may mean more work for the user: retrieving a large value and parsing to extract an item of interest
• Sharding via partitioning of the key space
  - Hashing, gossip and remapping protocols for load balancing and fault tolerance
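Sharding by hashing the key, as a minimal sketch in Python. The function names and the choice of SHA-256 are assumptions; the point is that every node can compute a key's shard on its own. Plain modulo hashing moves most keys when a shard is added, which is one reason Dynamo-style stores use consistent hashing instead.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key. hashlib gives the same answer in every
    process, unlike Python's built-in hash() for strings."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard they hash to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user:{i}" for i in range(1000)]
shards = partition(keys, 4)
print({i: len(ks) for i, ks in shards.items()})  # roughly 250 keys per shard

# Plain modulo hashing remaps most keys when a fifth shard is added
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
print(moved)  # roughly 80% of the 1000 keys
```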

86

Example of Redis Commands

A key maps to: a simple value, a hash table, a set, or a list

Command(s)                               key    value
set x 10                                 x      10                       (simple value)
hset h y 5                               h      y → 5                    (hash table)
hset h1 name two; hset h1 value 2        h1     name → two, value → 2    (hash table)
hmset p:22 name Alice age 25             p:22   name → Alice, age → 25   (hash table)
sadd s 20; sadd s Alice; sadd s Alice    s      {20, Alice}              (set)
rpush l a; rpush l b; lpush l c          l      (c, a, b)                (list)

get x          >> 10
hget h y       >> 5
hkeys p:22     >> name, age
smembers s     >> 20, Alice
scard s        >> 2
llen l         >> 3
lrange l 1 2   >> a, b
lindex l 2     >> b
lpop l         >> c
rpop l         >> b
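The command semantics on this slide can be mimicked with a toy Python class. This is a hypothetical sketch, not a Redis client: `ToyRedis` keeps everything in a dict and only models the commands shown. The slide's `hmset` is written as two `hset` calls here (Redis 4.0 and later also accept several field-value pairs in one `HSET`). Note that `lrange` includes its end index, as in Redis.

```python
class ToyRedis:
    """A hypothetical in-memory toy that mimics a few Redis commands from the
    slide. It is not a Redis client and skips type checks, expiry, etc."""

    def __init__(self):
        self.store = {}  # key -> str | dict | set | list

    # simple values
    def set(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

    # hashes
    def hset(self, key, field, value):
        self.store.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.store.get(key, {}).get(field)

    def hkeys(self, key):
        return list(self.store.get(key, {}))

    # sets
    def sadd(self, key, member):
        self.store.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.store.get(key, set()))

    def scard(self, key):
        return len(self.store.get(key, set()))

    # lists
    def rpush(self, key, value):
        self.store.setdefault(key, []).append(value)

    def lpush(self, key, value):
        self.store.setdefault(key, []).insert(0, value)

    def llen(self, key):
        return len(self.store.get(key, []))

    def lrange(self, key, start, stop):
        # stop is inclusive, as in Redis; this toy supports non-negative indices only
        return self.store.get(key, [])[start:stop + 1]

    def lindex(self, key, index):
        lst = self.store.get(key, [])
        return lst[index] if 0 <= index < len(lst) else None

    def lpop(self, key):
        return self.store[key].pop(0)

    def rpop(self, key):
        return self.store[key].pop()


r = ToyRedis()
r.set("x", "10")
r.hset("h", "y", "5")
r.hset("p:22", "name", "Alice")
r.hset("p:22", "age", "25")
for member in ("20", "Alice", "Alice"):
    r.sadd("s", member)
r.rpush("l", "a")
r.rpush("l", "b")
r.lpush("l", "c")

print(r.get("x"), r.hget("h", "y"), r.hkeys("p:22"))  # 10 5 ['name', 'age']
print(r.scard("s"), r.llen("l"), r.lrange("l", 1, 2))  # 2 3 ['a', 'b']
print(r.lindex("l", 2), r.lpop("l"), r.rpop("l"))      # b c b
```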


88

When to use it

• Use it:
  - All access to the database is via primary key
  - Storing session information (web session)
  - User or product profiles (single GET operation)
  - Shopping cart information (based on user id)

• Don't use it:
  - Relationships between different sets of data
  - Query by data (based on values)
  - Operations on multiple keys at a time

89

4 main data models

1. Key-value stores (e.g., Redis)
2. Column-family stores (e.g., Cassandra)
3. Document stores (e.g., MongoDB)
4. Graph databases (e.g., Neo4j)

90

Cassandra data model

2 Types of Column Stores

Standard RDB:
sid  name   address  year  faculty
861  Alice  Haifa    2     NULL
753  Amir   London   NULL  CS
955  Ahuva  NULL     2     IE

Column store (still SQL): each column stored separately
(id, sid): (1, 861), (2, 753), (3, 955)
(id, name): (1, Alice), (2, Amir), (3, Ahuva)
(id, address): (1, Haifa), (2, London)
(id, year): (1, 2), (3, 2)
(id, faculty): (2, CS), (3, IE)
Why? Efficiency (fetch only required columns), compression, sparse data for free

Column-family store (NoSQL):
[Figure labels: keyspace, column family, "column", "super column", timestamp for conflicts]
Column family 1:
1  sid:861  name:Alice  address:Haifa   ts:20
2  sid:753  name:Amir   address:London  ts:22
3  sid:955  name:Ahuva                  ts:32
Column family 2:
1  year:2                  ts:26
2  faculty:CS              ts:25  email:{prime:c@d ext:c@e}
3  year:2  faculty:IE      ts:32  email:{prime:a@b ext:a@c}
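Column-wise storage, as a minimal Python sketch using the slide's table: each column becomes its own {row id: value} map, and NULLs are simply not stored. The function name is illustrative; real column stores add compression and on-disk layouts that this ignores.

```python
# Rows of the "standard RDB" table; None stands for NULL
COLUMNS = ("sid", "name", "address", "year", "faculty")
ROWS = [
    (861, "Alice", "Haifa", 2, None),
    (753, "Amir", "London", None, "CS"),
    (955, "Ahuva", None, 2, "IE"),
]


def to_column_store(rows, columns):
    """Store each column separately as {row id: value}; NULLs are simply not
    stored, which is the "sparse data for free" point."""
    store = {col: {} for col in columns}
    for row_id, row in enumerate(rows, start=1):
        for col, value in zip(columns, row):
            if value is not None:
                store[col][row_id] = value
    return store


store = to_column_store(ROWS, COLUMNS)
print(store["year"])     # {1: 2, 3: 2}
print(store["faculty"])  # {2: 'CS', 3: 'IE'}

# A query reads only the columns it needs: here year and name
names_in_year_2 = [store["name"][rid] for rid, y in store["year"].items() if y == 2]
print(names_in_year_2)   # ['Alice', 'Ahuva']
```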

91

When to use it (e.g. Cassandra)

• Use it:
  - Event logging (multiple applications can write in different columns, with row key appname:timestamp)
  - CMS: store blog entries with tags, categories, links in different columns
  - Counters: e.g. visitors of a page

• Don't use it:
  - if you require ACID, consistency
  - if you change query patterns often (in an RDBMS, schema changes are costly; in Cassandra, query changes are, since they require changing the column family design)

92

4 main data models

1. Key-value stores (e.g., Redis)
2. Column-family stores (e.g., Cassandra)
3. Document stores (e.g., MongoDB)
4. Graph databases (e.g., Neo4j)

93

Document Stores

• Similar in nature to a key-value store, but the value is tree-structured as a document

• Motivation: avoid joins; ideally, all relevant joins are already encapsulated in the document structure

• A document is an atomic object that cannot be split across servers
  - But a document collection will be split

• Moreover, transaction atomicity is typically guaranteed within a single document

• Model generalizes column-family and key-value stores

94

Data Example: High-level

Document (~ record/row/tuple):
{ name: "Alice", age: 21, status: "A", groups: ["algorithms", "theory"] }

Collection (~ table):
{ name: "Alice", age: 21, status: "A", groups: ["algorithms", "theory"] }
{ name: "Bob", age: 18, status: "B", groups: ["database", "cooking"] }
{ name: "Charly", age: 22, status: "A", groups: ["database", "cars"] }
{ name: "Dorothee", age: 16, status: "A", groups: ["cars", "sports"] }

Source: Modified from https://docs.mongodb.com/v3.0/core/crud-introduction/

95

Data Example

Collection inventory:
{ item: "ABC2", details: { model: "14Q3", manufacturer: "M1 Corporation" }, stock: [ { size: "M", qty: 50 } ], category: "clothing" }
{ item: "MNO2", details: { model: "14Q3", manufacturer: "ABC Company" }, stock: [ { size: "S", qty: 5 }, { size: "M", qty: 5 }, { size: "L", qty: 1 } ], category: "clothing" }

Document insertion:
db.inventory.insert(
  { item: "ABC1", details: { model: "14Q3", manufacturer: "XYZ Company" }, stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ], category: "clothing" }
)

Source: Modified from https://docs.mongodb.com/v3.0/core/crud-introduction/

96

Example of a Simple Query

Find all orders with status "A" (customer id and price only)

Collection orders:
{ _id: "a", cust_id: "abc123", status: "A", price: 25, items: [ { sku: "mmm", qty: 5, price: 3 }, { sku: "nnn", qty: 5, price: 2 } ] }
{ _id: "b", cust_id: "abc124", status: "B", price: 12, items: [ { sku: "nnn", qty: 2, price: 2 }, { sku: "ppp", qty: 2, price: 4 } ] }

db.orders.find(
  { status: "A" },
  { cust_id: 1, price: 1, _id: 0 }
)

selection: { status: "A" }
projection: { cust_id: 1, price: 1, _id: 0 }

Result:
{ cust_id: "abc123", price: 25 }

In SQL it would look like this: SELECT cust_id, price FROM orders WHERE status = 'A'
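The selection and projection parts of `find` can be mimicked over plain Python dicts. A hypothetical sketch that handles only equality selections and inclusion projections; it is not the MongoDB driver API. As in MongoDB, `_id` is returned unless the projection excludes it.

```python
ORDERS = [
    {"_id": "a", "cust_id": "abc123", "status": "A", "price": 25,
     "items": [{"sku": "mmm", "qty": 5, "price": 3}, {"sku": "nnn", "qty": 5, "price": 2}]},
    {"_id": "b", "cust_id": "abc124", "status": "B", "price": 12,
     "items": [{"sku": "nnn", "qty": 2, "price": 2}, {"sku": "ppp", "qty": 2, "price": 4}]},
]


def find(collection, selection, projection):
    """Hypothetical helper mimicking the shape of find(selection, projection) for
    equality selections and inclusion projections; not the MongoDB driver API."""
    fields = [f for f, keep in projection.items() if keep and f != "_id"]
    include_id = bool(projection.get("_id", 1))  # _id is included unless excluded
    results = []
    for doc in collection:
        if all(doc.get(f) == v for f, v in selection.items()):   # selection
            out = {"_id": doc["_id"]} if include_id and "_id" in doc else {}
            out.update({f: doc[f] for f in fields if f in doc})  # projection
            results.append(out)
    return results


print(find(ORDERS, {"status": "A"}, {"cust_id": 1, "price": 1, "_id": 0}))
# [{'cust_id': 'abc123', 'price': 25}]
```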

97

When to use it

• Use it:
  - Event logging: different types of events across an enterprise
  - CMS: user comments, registration, profiles, web-facing documents
  - E-commerce: flexible schema for products, evolving data models

• Don't use it:
  - if you require atomic cross-document operations
  - queries against varying aggregate structures

98

4 main data models

1. Key-value stores (e.g., Redis)
2. Column-family stores (e.g., Cassandra)
3. Document stores (e.g., MongoDB)
4. Graph databases (e.g., Neo4j)

99

Interested in Research?

100

Questions on final exam & grading

… or anything else?