6/2/17


Introduction to Data Management

Lecture #17: SQL NoSQL (☺)

Instructor: Mike Carey, mjcarey@ics.uci.edu

Announcements

•  Homework info:
   – HW #7: Due tomorrow (6 PM).
   – HW #8 is the end ("NoSQL")!
      •  Due a week from Friday (6 PM).
      •  Late penalty: 10 pts/day (BUT JUST ONE DAY).
•  NoSQL lecture plans:
   – Today/Tuesday: NoSQL & Big Data (a la AsterixDB)
      •  Not in book: See paper linked to wiki syllabus!
      •  Also see docs on the Apache AsterixDB site.
   – Watch the Stanford online lecture material!
      •  Watch both of the JSON video lectures.
      •  Be sure to take the quiz at the end!


Our Plan for NoSQL + AsterixDB

•  The pre-relational era
•  The relational DB era
•  Beyond rows and columns?
   1.  The object-oriented DB era
   2.  The object-relational DB era
   3.  The XML DB era
   4.  The NoSQL DB era* (*watch Stanford material too...!)
•  Reflections, and then... AsterixDB!

The Birth of Today's DBMS Field

•  In the beginning was the Word, and the Word was with Codd, and the Word was Codd...
   – 1970 CACM paper: "A relational model of data for large shared data banks"
•  Many refer to this as the first generation of (real?) database management systems


This is a SQL/NoSQL History Talk

•  The pre-relational era
•  The relational DB era
•  Beyond rows and columns?
   1.  The object-oriented DB era
   2.  The object-relational DB era
   3.  The XML DB era
   4.  The NoSQL DB era
•  Reflections & challenges

The First Decade B.C.

•  The need for a data management library, or a database management system, had actually been well recognized
   – Hierarchical DB systems (e.g., IMS from IBM)
   – Network DB systems (most notably CODASYL)
•  These systems provided navigational APIs
   – Systems provided files, records, pointers, indexes
   – Programmers had to (carefully!) scan or search for records, follow parent/child structures or pointers, and maintain code when anything physical changed


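The First Decade B.C. (cont.)

Order(id, custName, custCity, total)
Item(ino, qty, price)
Product(sku, name, listPrice, size, power)

Links: Item-Order, Item-Product

[Figure: example instance. The Order record (123, Fred, LA, 25.97) is connected by Item-Order links to two Item records, (1, 2, 9.99) and (2, 1, 3.99). Each Item is connected by an Item-Product link to a Product record: (401, Garfield T-Shirt, 9.99, XL, -) and (544, USB Charger, 5.99, -, 115V).]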
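To make the navigational style concrete, here is a minimal Python sketch of the example instance above. It is an illustration only, not IMS or CODASYL code: the class names and the `first_item` / `next_item` / `product` pointer fields are assumptions chosen to mimic records chained by Item-Order and Item-Product links.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    sku: int
    name: str
    list_price: float


@dataclass
class Item:
    ino: int
    qty: int
    price: float
    product: Product                     # Item-Product link (a pointer)
    next_item: Optional["Item"] = None   # next member of the Item-Order chain


@dataclass
class Order:
    id: int
    cust_name: str
    cust_city: str
    total: float
    first_item: Optional[Item] = None    # head of the Item-Order chain


def product_names(order: Order) -> List[str]:
    """Follow the pointers by hand, as a navigational program had to."""
    names = []
    item = order.first_item
    while item is not None:
        names.append(item.product.name)
        item = item.next_item
    return names


shirt = Product(401, "Garfield T-Shirt", 9.99)
charger = Product(544, "USB Charger", 5.99)
item2 = Item(2, 1, 3.99, charger)
item1 = Item(1, 2, 9.99, shirt, next_item=item2)
order = Order(123, "Fred", "LA", 25.97, first_item=item1)

print(product_names(order))
```

If the physical layout changed (say, items stored in an array rather than a chain), `product_names` would have to be rewritten, which is the maintenance burden noted on the previous slide.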

Enter the Relational DB Era

•  Be sure to notice that
   – Everything's now (logical) rows and columns
   – The world is flat; columns are atomic (1NF)
   – Data is now connected via keys (foreign/primary)

Order(id, custName, custCity, total)
   123 | Fred | LA | 25.97

Item(order-id, ino, product-sku, qty, price)
   123 | 1 | 401 | 2 | 9.99
   123 | 2 | 544 | 1 | 3.99

Product(sku, name, listPrice, size, power)
   401 | Garfield T-Shirt | 9.99 | XL   | null
   544 | USB Charger      | 5.99 | null | 115V
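The same example can be tried out with Python's built-in `sqlite3` module. This is a sketch under a few assumptions: SQLite stands in for "a relational DBMS", the hyphenated column names are written with underscores, and `Order` is quoted because it is a reserved word in SQL. The query says what is wanted (the products on order 123) and leaves the access path to the system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (id INTEGER PRIMARY KEY, custName TEXT,
                          custCity TEXT, total REAL);
    CREATE TABLE Product (sku INTEGER PRIMARY KEY, name TEXT,
                          listPrice REAL, size TEXT, power TEXT);
    CREATE TABLE Item (
        order_id    INTEGER REFERENCES "Order"(id),   -- foreign key
        ino         INTEGER,
        product_sku INTEGER REFERENCES Product(sku),  -- foreign key
        qty         INTEGER,
        price       REAL,
        PRIMARY KEY (order_id, ino)
    );
    INSERT INTO "Order" VALUES (123, 'Fred', 'LA', 25.97);
    INSERT INTO Product VALUES (401, 'Garfield T-Shirt', 9.99, 'XL', NULL);
    INSERT INTO Product VALUES (544, 'USB Charger', 5.99, NULL, '115V');
    INSERT INTO Item VALUES (123, 1, 401, 2, 9.99);
    INSERT INTO Item VALUES (123, 2, 544, 1, 3.99);
""")

# Rows are connected by matching key values, not by stored pointers.
rows = conn.execute("""
    SELECT o.custName, i.ino, p.name, i.qty
    FROM "Order" o
    JOIN Item i    ON i.order_id = o.id
    JOIN Product p ON p.sku = i.product_sku
    WHERE o.id = ?
    ORDER BY i.ino
""", (123,)).fetchall()

for row in rows:
    print(row)
```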


As the Relational Era Unfolded

•  The Spartan simplicity of the relational data model made it possible to start tackling the opportunities and challenges of a logical data model
   – Declarative queries (Rel Alg/Calc, Quel, QBE, SQL, ...)
   – Transparent indexing (physical data independence)
   – Query optimization and execution
   – Views, constraints, referential integrity, security, ...
   – Scalable (shared-nothing) parallel processing
•  Today's multi-$B industry was slowly born
   – Commercial adoption took ~10-15 years
   – Parallel DB systems took ~5 more years

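Enter the Object-Oriented DB Era

•  Notice that:
   – Data model contains objects and pointers (OIDs)
   – The world is no longer flat: the Order and Item schemas now have set(Item) and Product in them, respectively

[Figure: the Order object (123, Fred, LA, 25.97, {•, •}) holds a set of pointers to two Item objects, (1, 2, 9.99, •) and (2, 1, 3.99, •). Each Item holds a pointer to a Product object: (401, Garfield T-Shirt, 9.99, XL, -) and (544, USB Charger, 5.99, -, 115V).]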


What OODBs Sought to Offer

•  Motivated largely by late 1980's CAx applications (e.g., mechanical CAD, VLSI CAD, software CAD, ...)
   – Rich schemas with inheritance, complex objects, object identity, references, ...
   – Methods ("behavior") as well as data in the DBMS
   – Tight bindings with (OO) programming languages
   – Fast navigation, some declarative querying
•  Ex: Gemstone, Ontos, Objectivity, Versant, Object Design, O2, also DASDBS (sort of)
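As a rough illustration of that feature list, the running order example can be written as plain Python classes with inheritance, object references, a nested collection, and a method. This is a sketch of the modeling style only, not the API of any OODB product; the `ClothingProduct` / `ElectricProduct` subclasses and the `subtotal` method are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    sku: int
    name: str
    list_price: float


@dataclass
class ClothingProduct(Product):      # inheritance in the schema
    size: str = ""


@dataclass
class ElectricProduct(Product):
    power: str = ""


@dataclass
class Item:
    ino: int
    qty: int
    price: float
    product: Product                 # object reference, not a foreign key

    def subtotal(self) -> float:     # "behavior" stored with the data
        return self.qty * self.price


@dataclass
class Order:
    id: int
    cust_name: str
    cust_city: str
    items: List[Item] = field(default_factory=list)   # set(Item) nested in Order


shirt = ClothingProduct(401, "Garfield T-Shirt", 9.99, size="XL")
charger = ElectricProduct(544, "USB Charger", 5.99, power="115V")
item1 = Item(1, 2, 9.99, shirt)
item2 = Item(2, 1, 3.99, charger)
order = Order(123, "Fred", "LA", items=[item1, item2])

# Fast navigation: follow references directly from the order.
kinds = [type(i.product).__name__ for i in order.items]
print(kinds)
```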

Why OODBs "Fell Flat"

•  Too soon for another (radical) DB technology
   – Also technically immature relative to RDBMSs
•  Tight PL bindings were a double-edged sword
   – Data shared, outlives programming languages
   – Bindings led to significant system heterogeneity
   – Also made schema evolution a major challenge
•  Systems "overfitted" in some dimensions
   – Inheritance, version management, ...
   – Focused on thick clients (e.g., CAD workstations)


Enter the Object-Relational DB Era

•  Be sure to notice:
   – "One size fits all!" (☺)
   – UDTs/UDFs, table hierarchies, references, ...
   – But the world got flatter again...
   (Timing lagged OODBs by just a few years)

Order(id, customer, total)
   123 | Fred, LA | 25.97

Item(order-id, ino, product-sku, qty, price)
   (123) | 1 | (401) | 2 | 9.99
   (123) | 2 | (544) | 1 | 3.99

Product(sku, name, listPrice)
   ClothingProduct(size) under Product
      401 | Garfield T-Shirt | 9.99 | XL
   ElectricProduct(power) under Product
      544 | USB Charger | 5.99 | 115V

What O-R DBs Sought to Offer

•  Motivated by newly emerging application opportunities (multimedia, spatial, text, ...)
   – User-defined functions (UDTs/UDFs) & aggregates
   – Data blades (UDTs/UDFs + indexing support)
   – OO goodies for tables: row types, references, ...
   – Nested tables (well, at least Oracle added these)
•  Back to a model where applications were loosely bound to the DBMS (e.g., ODBC/JDBC)
•  Ex: ADT-Ingres, Postgres, Starburst, UniSQL, Illustra, DB2, Oracle


Why O-R DBs "Fell Flat"

•  Significant differences across DB vendors
   – SQL standardization lagged somewhat
   – Didn't include details of UDT/UDF extensions
   – Tough to extend the innards (for indexing)
•  Application issues (and multiple platforms)
   – Least common denominator vs. coolest features
   – Tools (e.g., DB design tools, ORM layers, ...)
•  Also still probably a bit too much too soon
   – IT departments still rolling in RDBMSs and creating relational data warehouses

Then Came the XML DB Era

<Order id="123">
  <Customer>
    <custName>Fred</custName>
    <custCity>LA</custCity>
  </Customer>
  <total>25.97</total>
  <Items>
    <Item ino="1">
      <product-sku>401</product-sku>
      <qty>2</qty>
      <price>9.99</price>
    </Item>
    <Item ino="2">
      <product-sku>544</product-sku>
      <qty>1</qty>
      <price>3.99</price>
    </Item>
  </Items>
</Order>

<Product sku="401">
  <name>Garfield T-Shirt</name>
  <listPrice>9.99</listPrice>
  <size>XL</size>
</Product>
<Product sku="544">
  <name>USB Charger</name>
  <listPrice>5.99</listPrice>
  <power>115V</power>
</Product>

Note that:
- The world's less flat again
- We're now in the 2000's
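A short Python sketch, using only the standard library's `xml.etree.ElementTree`, shows what working with the nested Order document looks like. It is an illustration rather than XQuery: the path strings are ElementTree's limited XPath subset, and the variable names are assumptions. Note that `ino` is an attribute while `product-sku` is element content, so the two are read differently, and both come back as strings.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<Order id="123">
  <Customer><custName>Fred</custName><custCity>LA</custCity></Customer>
  <total>25.97</total>
  <Items>
    <Item ino="1"><product-sku>401</product-sku><qty>2</qty><price>9.99</price></Item>
    <Item ino="2"><product-sku>544</product-sku><qty>1</qty><price>3.99</price></Item>
  </Items>
</Order>
"""

order = ET.fromstring(ORDER_XML.strip())

# Nested data is reached by path, without joins.
city = order.findtext("Customer/custCity")

# Element content: text that the application must convert itself.
skus = [int(item.findtext("product-sku")) for item in order.findall("Items/Item")]

# Attribute value: read with get(), also a plain string.
inos = [item.get("ino") for item in order.iter("Item")]

print(city, skus, inos)
```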


What XML DBs Sought to Offer

•  One <flexible/> data model fits all (XML)
   – Origins in document markup (SGML)
   – Nested data
   – Schema variety/optionality
•  New declarative query language (XQuery)
   – Designed both for querying and transformation
   – Early standardization effort (W3C)
•  Two different DB-related use cases, in reality
   – Data storage: Lore (pre-XML), Natix, Timber, Ipedo, MarkLogic, BaseX; also DB2, Oracle, SQL Server
   – Data integration: Nimble Technology, BEA Liquid Data (from Enosys), BEA AquaLogic Data Services Platform

Why XML DBs "Fell Flat" Too

•  Document-centric origins (vs. data use cases) of XML Schema and XQuery made a mess of things
   – W3C XPATH legacy (☹)
   – Document identity, document order, ...
   – Attributes vs. elements, nulls, ...
   – Mixed content (overkill for non-document data)
•  Two other external trends also played a role
   – SOA and Web services came but then went
   – JSON (and RESTful services) appeared on the scene
•  Note: Likely still an important niche market...


Now the NoSQL DB Era?

•  Not from the DB world
   – Distributed systems folks
   – Also various startups
•  From caches → K/V use cases
   – Needed massive scale-out
   – OLTP (vs. parallel DB) apps
   – Simple, low-latency API
   – Need a key K, but want no schema for V
   – Record-level atomicity, replica consistency varies
•  In the context of this talk, NoSQL does not mean
   – Hadoop (or SQL on Hadoop)
   – Graph databases or graph analytics platforms
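The "key K, but no schema for V" idea can be sketched as a toy in-memory store. Everything here is hypothetical (the `TinyKVStore` class, its method names, the key naming scheme); it shows only the shape of the API and none of the scale-out, replication, or consistency machinery that real K/V stores provide.

```python
import json
from typing import Any, Dict, Optional


class TinyKVStore:
    """Toy single-process key-value store: a key is required, the value is opaque."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: Any) -> None:
        # The store serializes the value but never inspects its structure.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key: str) -> Optional[Any]:
        blob = self._data.get(key)
        return None if blob is None else json.loads(blob)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


store = TinyKVStore()
store.put("order:123", {"custName": "Fred", "total": 25.97})
# A differently shaped value goes into the same store without any schema change.
store.put("product:401", {"name": "Garfield T-Shirt", "size": "XL"})

print(store.get("order:123"))
```

Each `put`, `get`, or `delete` touches exactly one record, which is where record-level atomicity comes from; anything spanning several keys is left to the application.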

NoSQL Data (JSON-based)

Collection(Order)

{ "id": "123",
  "Customer": { "custName": "Fred", "custCity": "LA" },
  "total": 25.97,
  "Items": [
    { "product-sku": 401, "qty": 2, "price": 9.99 },
    { "product-sku": 544, "qty": 1, "price": 3.99 }
  ] }

Collection(Product)

{ "sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL" }
{ "sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V" }

Note that:
- The world's not flat, but it's less <messy/>
- We're now in the 2010's, timing-wise
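For comparison, here is a minimal Python sketch over the two collections, using only the standard `json` module. The variable names are assumptions. It shows the two properties the slide points to: nesting (the items live inside the order) and optional fields (`size` and `power` appear only where they apply), plus the fact that a cross-collection lookup is still the application's job.

```python
import json

order = json.loads("""
{ "id": "123",
  "Customer": { "custName": "Fred", "custCity": "LA" },
  "total": 25.97,
  "Items": [
    { "product-sku": 401, "qty": 2, "price": 9.99 },
    { "product-sku": 544, "qty": 1, "price": 3.99 }
  ] }
""")

products = [json.loads(line) for line in (
    '{ "sku": 401, "name": "Garfield T-Shirt", "listPrice": 9.99, "size": "XL" }',
    '{ "sku": 544, "name": "USB Charger", "listPrice": 5.99, "power": "115V" }',
)]

# Hand-written "join": index products by sku, then look each item up.
by_sku = {p["sku"]: p for p in products}
names = [by_sku[item["product-sku"]]["name"] for item in order["Items"]]

# Optional fields are simply absent, so read them with a default.
optional = [(p["sku"], p.get("size"), p.get("power")) for p in products]

print(names)
print(optional)
```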


Current NoSQL Trends

•  Popular examples: MongoDB, Couchbase
•  Coveting the benefits of many DB goodies
   – Secondary indexing and non-key access
   – Declarative queries
   – Aggregates and now (initially small) joins
•  Seem to be heading towards...
   – BDMS (think scalable, OLTP-aimed, parallel DBMS)
   – Declarative queries and query optimization, but applied to schema-less data
   – Return of (some, optional!) schema information

Our Example: Apache AsterixDB

http://asterixdb.apache.org/


Big Data / Web Warehousing

So what went on – and why?

What's going on right now?

What's going on…?

Also: Today's Big Data Tangle

[Figure; recoverable labels: (Pig), SQL]


AsterixDB: "One Size Fits a Bunch"

[Figure; labeled areas: Semistructured Data Management, Parallel Database Systems, Data-Intensive Computing]

BDMS Desiderata:
•  Flexible data model
•  Efficient runtime
•  Full query capability
•  Cost proportional to task at hand (!)
•  Designed for continuous data ingestion
•  Support today's "Big Data data types"
•  ...

Project Goals

•  Build a new Big Data Management System (BDMS)
   – Run on large commodity clusters
   – Handle mass quantities of semistructured data
   – Openly layered, for selective reuse by others
   – Share with the community via open source (ASF)
•  Conduct scalable information systems research, e.g.,
   – Large-scale query processing and workload management
   – Highly scalable storage and index management
   – Fuzzy matching, spatial data, date/time data (all in parallel)
   – Novel support for "fast data" (both in and out)
•  Train next generation of "Big Data" graduates


ASTERIX Data Model (ADM)

create dataverse TinySocial;
use dataverse TinySocial;

create type EmploymentType as open {
    organizationName: string,
    startDate: date,
    endDate: date?
};

create type MugshotUserType as {
    id: int32,
    alias: string,
    name: string,
    userSince: datetime,
    address: {
        street: string,
        city: string,
        state: string,
        zip: string,
        country: string
    },
    friendIds: {{ int32 }},
    employment: [EmploymentType]
};

create dataset MugshotUsers(MugshotUserType) primary key id;

Highlights include:
•  JSON++ based data model
•  Rich type support (spatial, temporal, …)
•  Records, lists, bags
•  Open vs. closed types

ASTERIX Data Model (ADM)

create dataverse TinySocial;
use dataverse TinySocial;

create type MugshotUserType as {
    id: int32,
    alias: string,
    name: string,
    user-since: datetime,
    address: { street: string, city: string, state: string, zip: string, country: string },
    friend-ids: {{ int32 }},
    employment: [EmploymentType]
}

Because the type is open, declaring only the primary key field also works:

create type MugshotUserType as {
    id: int32
};


ASTERIX Data Model (ADM)

create type MugshotMessageType as closed {
    messageId: int32,
    authorId: int32,
    timestamp: datetime,
    inResponseTo: int32?,
    senderLocation: point?,
    tags: {{ string }},
    message: string
};

create dataset MugshotMessages(MugshotMessageType) primary key messageId;
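The open vs. closed distinction in the type declarations on these slides can be sketched as a small conformance check. This is an illustration of the idea, not AsterixDB's implementation: the `check_record` function and the `Schema` encoding are hypothetical, dates are simplified to strings, and `jobTitle` is an invented extra field. A field marked `?` on the slides is optional; an open type tolerates undeclared fields, while a closed type rejects them.

```python
from typing import Any, Dict, List, Tuple

# field name -> (expected Python type, is the field optional?)
Schema = Dict[str, Tuple[type, bool]]


def check_record(record: Dict[str, Any], schema: Schema, is_open: bool) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, (py_type, optional) in schema.items():
        if name not in record:
            if not optional:
                problems.append(f"missing required field: {name}")
        elif not isinstance(record[name], py_type):
            problems.append(f"wrong type for field: {name}")
    if not is_open:
        # Closed types allow only the declared fields.
        for name in record:
            if name not in schema:
                problems.append(f"undeclared field in closed type: {name}")
    return problems


# Mirrors EmploymentType; dates are modeled as strings for simplicity.
employment_type: Schema = {
    "organizationName": (str, False),
    "startDate": (str, False),
    "endDate": (str, True),     # declared as date? on the slide
}

job = {"organizationName": "Codetechno", "startDate": "2006-08-06",
       "jobTitle": "Engineer"}  # jobTitle is a hypothetical undeclared field

print(check_record(job, employment_type, is_open=True))
print(check_record(job, employment_type, is_open=False))
```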

Ex: MugshotUsers Data

{ "id": 1, "alias": "Margarita", "name": "Margarita Stoddard",
  "address": { "street": "234 Thomas Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
  "userSince": datetime("2012-08-20T10:10:00"),
  "friendIds": {{ 2, 3, 6, 10 }},
  "employment": [ { "organizationName": "Codetechno", "startDate": date("2006-08-06") } ] },
{ "id": 2, "alias": "Isbel", "name": "Isbel Dull",
  "address": { "street": "345 James Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
  "userSince": datetime("2011-01-22T10:10:00"),
  "friendIds": {{ 1, 4 }},
  "employment": [ { "organizationName": "Hexviafind", "startDate": date("2010-04-27") } ] },
{ "id": 3, "alias": "Emory", "name": "Emory Unk",
  "address": { "street": "456 Jose Ave", "city": "San Hugo", "zip": "98765", "state": "CA", "country": "USA" },
  "userSince": datetime("2012-07-10T10:10:00"),
  "friendIds": {{ 1, 5, 8, 9 }},
  "employment": [ { "organizationName": "geomedia", "startDate": date("2010-06-17"), "endDate": date("2010-01-26") } ] }
...


Other DDL Features

create index msUserSinceIdx on MugshotUsers(userSince);
create index msTimestampIdx on MugshotMessages(timestamp);
create index msAuthorIdx on MugshotMessages(authorId) type btree;
create index msSenderLocIndex on MugshotMessages(senderLocation) type rtree;
create index msMessageIdx on MugshotMessages(message) type keyword;

// --------------------- and also ---------------------

create type AccessLogType as closed {
    ip: string,
    time: string,
    user: string,
    verb: string,
    `path`: string,
    stat: int32,
    size: int32
};

create external dataset AccessLog(AccessLogType) using localfs
    (("path"="{hostname}://{path}"),
     ("format"="delimited-text"),
     ("delimiter"="|"));

create feed mySocketFeed using socket_adaptor
    (("sockets"="{address}:{port}"),
     ("addressType"="IP"),
     ("type-name"="MugshotMessageType"),
     ("format"="adm"));

connect feed mySocketFeed to dataset MugshotMessages;

External data highlights:
•  Equal opportunity access
•  "Keep everything!"
•  Data ingestion, not streams

ASTERIX Queries (SQL++ or AQL)

•  Q1: List the user name and messages sent by those users who joined the Mugshot social network in a certain time window:

select user.name as uname,
       (select value msg.message
        from MugshotMessages msg
        where msg.authorId = user.id) as messages
from MugshotUsers user
where user.userSince >= datetime('2010-07-22T00:00:00')
  and user.userSince <= datetime('2012-07-29T23:59:59');

{ "uname": "Isbel Dull", "messages": [ "like samsung the plan is amazing", "like t-mobile its platform is mind-blowing" ] }
{ "uname": "Emory Unk", "messages": [ "love sprint its shortcut-menu is awesome:)", ... ] }
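To see what Q1 computes, the same logic can be spelled out by hand in Python over a few in-memory records. This is a sketch of the query's meaning, not of how AsterixDB evaluates it. The user records are the three shown on the data slide; the message records are assumptions (the texts come from the sample output, but the `authorId` values are guessed so that they line up with it).

```python
from datetime import datetime

users = [
    {"id": 1, "name": "Margarita Stoddard", "userSince": datetime(2012, 8, 20, 10, 10)},
    {"id": 2, "name": "Isbel Dull", "userSince": datetime(2011, 1, 22, 10, 10)},
    {"id": 3, "name": "Emory Unk", "userSince": datetime(2012, 7, 10, 10, 10)},
]

# Hypothetical messages: authorId values are assumed for illustration.
messages = [
    {"authorId": 2, "message": "like samsung the plan is amazing"},
    {"authorId": 2, "message": "like t-mobile its platform is mind-blowing"},
    {"authorId": 3, "message": "love sprint its shortcut-menu is awesome:)"},
]


def q1(users, messages, start, end):
    result = []
    for user in users:                                # from MugshotUsers user
        if start <= user["userSince"] <= end:         # where userSince in window
            result.append({
                "uname": user["name"],
                # correlated subquery: one nested list per qualifying user
                "messages": [m["message"] for m in messages
                             if m["authorId"] == user["id"]],
            })
    return result


out = q1(users, messages,
         datetime(2010, 7, 22, 0, 0, 0), datetime(2012, 7, 29, 23, 59, 59))
for row in out:
    print(row)
```

The declarative version leaves the choice of loops, indexes, and parallelism to the system; the hand-written version fixes one nested-loop plan.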


SQL++ (cont.)

•  Q2: Identify active users and group/count them by country:

with endTime as current_datetime(),
     startTime as endTime - duration("P30D")
select user.address.country as country, count(users) as activeUsers
from MugshotUsers user
where some logrec in AccessLog satisfies
          user.alias = logrec.user
      and datetime(logrec.time) >= startTime
      and datetime(logrec.time) <= endTime
group by user.address.country;

SQL++ highlights:
•  Lots of other features (see website!)
•  Spatial predicates and aggregation
•  Set-similarity ("fuzzy") matching

Updates and Transactions

•  Key-value store-like transactions (w/record level atomicity)
•  Insert, delete, and upsert ops; index-consistent
•  2PL concurrency
•  WAL no-steal, no-force with LSM shadowing

•  Q3: Add a new user to Mugshot.com:

insert into MugshotUsers (
  { "id": 11, "alias": "John", "name": "John Doe",
    "userSince": datetime("2012-08-20T10:10:00.000Z"),
    "address": { "street": "789 Jane St", "city": "San Harry", "state": "CA", "zip": "98767", "country": "USA" },
    "friendIds": {{ 5, 9, 11 }},
    "employment": [ { "organizationName": "Kongreen", "startDate": date("2009-08-11") } ] }
);


AsterixDB Cluster Overview

[Figure: AsterixDB cluster diagram; recoverable labels: Data Loads and Feeds, AQL queries and results, Data publishing, Cluster Controller, MD Node Controller, Node Controller, Node Controller, ...]

ASTERIX Software Stack

[Figure: layered software stack; recoverable labels: AQL, HiveQL, XQuery; AsterixDB, Hivesterix, Apache VXQuery; Hadoop M/R Job, Pregel Job, Hyracks Job; Algebricks Algebra Layer, M/R Layer, Pregelix; Hyracks Data-Parallel Platform]


A Peek at Performance

A Peek at Performance (cont.)

#AsterixDB


Example AsterixDB Use Cases

•  Potential use case areas include
   – Social data analytics
   – Cell phone event analytics
   – Behavioral science
   – Education
   – Public health
   – Power usage monitoring
   – Cluster management log analytics
   – ....

Current Status

•  4 year initial NSF project (250+ KLOC), started 2009
•  Now officially Apache AsterixDB!
   – Semistructured "NoSQL" style data model
   – Declarative parallel queries, inserts, deletes, …
   – Data storage/indexing (primary & secondary, LSM-based)
   – Internal and external datasets both supported
   – Rich set of data types (including text, time, location)
   – Fuzzy and spatial query processing
   – NoSQL-like transactions (for inserts/deletes)
   – Data feeds and indexes for external datasets
   – ....


For More Information

•  Asterix project UCI/UCR research home
   – http://asterix.ics.uci.edu/
•  Apache AsterixDB home
   – http://asterixdb.apache.org/
•  SQL++ Primer
   – http://asterixdb.apache.org/docs/0.9.1/index.html
•  Navigate from CS122a wiki (HW) to get and install it!
   – A few other resources and hints in the HW materials.

QUESTIONS...?