Transcript
Page 1: Delivering a 'Big Data Ready' minimum viable product

DELIVERING A 'BIG DATA READY' MVP

Gregory Chomatas

Dublin Google Developers Group - 2013 July 30th

Page 2: Delivering a 'Big Data Ready' minimum viable product

CHOMATAS GREGORY

SW Engineer

Entrepreneur

Betaconcept / Astroboa: Founder

Aquinetix: Co-founder / CTO

http://linkedin.com/in/gchomatas

t: @gchomatas

http://www.astroboa.org

Page 3: Delivering a 'Big Data Ready' minimum viable product
Page 4: Delivering a 'Big Data Ready' minimum viable product
Page 5: Delivering a 'Big Data Ready' minimum viable product
Page 6: Delivering a 'Big Data Ready' minimum viable product

7 YEARS AGO I REALIZED...

Page 7: Delivering a 'Big Data Ready' minimum viable product

TOO MUCH RDBMS SODA

Page 8: Delivering a 'Big Data Ready' minimum viable product

LOTS OF OBJECT-RELATIONAL MISMATCH

Page 9: Delivering a 'Big Data Ready' minimum viable product

DB IS NOT THE CENTER OF MY APPLICATION

Database Driven Design vs Domain Driven Design / Behaviour Driven Design

Page 10: Delivering a 'Big Data Ready' minimum viable product

AT THAT TIME NOT MANY ALTERNATIVES EXISTED

so we decided to roll our own datastore solution...

Page 11: Delivering a 'Big Data Ready' minimum viable product

ASTROBOA TO THE RESCUE

Hybrid Document-Graph Store focused on data semantics

Similar to Google Datastore & OrientDB

External 'app independent' Semantic Data Model
Model as you go
Security per Entity instance / property
Versioned Entities
Automated REST APIs encapsulating the data layer
Hyperlinked Resources
Polyglot Persistence (Experimental)*

* Not available in the public version

Page 12: Delivering a 'Big Data Ready' minimum viable product

THE 'BIGNESS' IN BIG DATA

Two main paths to the realization of 'BIGNESS'

Luckily both paths converge to common principles & tools that can manage BIG Complexity & BIG Volume

Page 13: Delivering a 'Big Data Ready' minimum viable product

BIG DATA ENLIGHTENMENT

Page 14: Delivering a 'Big Data Ready' minimum viable product

BIG 'DATA PROBLEMS' (COMPLEXITY)

single point of failure / resilience
cross data center
human fault tolerance
store / search unstructured or semi-structured data
flexible data modeling (e.g. traverse relationships)
data versioning
polyglot programming
multitenancy
share / data as a service
semantic web / multiple formats - endpoints

Flexible Options / Ease of operations

Page 15: Delivering a 'Big Data Ready' minimum viable product

'BIG DATA' PROBLEMS (VOLUME)

high volume
high velocity
real-time APIs / act in real time
data as others' service / dirty data from open sources
log collection / aggregation

LINEAR / HORIZONTAL SCALING

Page 16: Delivering a 'Big Data Ready' minimum viable product

I AM NOT A BIG DATA START-UP!

Start-up = Growth (5%-10%) / week
1000 writes per aquaculture farm per day
120 farms on public beta = 120000 writes/day
1st month: 176 farms = 176000 writes/day
6th month: 1181 farms = 1.2M writes/day
1st year: 17045 farms = 17M writes/day (200/sec)
2nd year: 2421143 farms = 2.4B writes/day (27777/sec)

ARE YOU SURE? A successful SaaS is a big data service
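The growth figures on this slide can be reproduced with a short compound-growth calculation. The sketch below assumes the upper end of the quoted range (10% per week), 4 weeks per month and 52 weeks per year. Those assumptions appear to match the farm counts on the slide, but they are not stated in the talk.

```python
WRITES_PER_FARM_PER_DAY = 1000  # from the slide
INITIAL_FARMS = 120             # farms on public beta, from the slide
WEEKLY_GROWTH = 0.10            # assumption: upper end of the 5%-10% range
SECONDS_PER_DAY = 24 * 60 * 60


def farms_after(weeks: int) -> float:
    """Farm count after compounding the weekly growth rate for `weeks` weeks."""
    return INITIAL_FARMS * (1 + WEEKLY_GROWTH) ** weeks


def writes_per_day(weeks: int) -> float:
    """Daily write volume implied by the farm count after `weeks` weeks."""
    return farms_after(weeks) * WRITES_PER_FARM_PER_DAY


# Assumption: a month is treated as 4 weeks and a year as 52 weeks.
milestones = [("beta", 0), ("1st month", 4), ("6th month", 24),
              ("1st year", 52), ("2nd year", 104)]

for label, weeks in milestones:
    daily = writes_per_day(weeks)
    print(f"{label:>9}: {farms_after(weeks):>12,.0f} farms, "
          f"{daily:>16,.0f} writes/day, "
          f"{daily / SECONDS_PER_DAY:>8,.0f} writes/sec")
```

The per-second rates on the slide are rounded from the daily totals, so the printed values differ slightly from the slide in the last digits.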

Page 17: Delivering a 'Big Data Ready' minimum viable product

IT'S JUST AN MVP - WE WILL ADD ALL THIS BIG DATA STUFF LATER

A Big Data architecture can be simpler than a traditional one
The right datastore can increase productivity
Keep it simple but do not compromise the architectural concepts
Balance between technical debt & technical equity
An enterprise business system will usually win on underlying technological innovation, robustness and enterprise readiness

"In business there is nothing more valuable than a technical advantage your competitors don't understand" - Paul Graham

Page 18: Delivering a 'Big Data Ready' minimum viable product

KEY BIG DATA ARCHITECTURE FEATURES

Distributed Storage

APPLICATION database vs INTEGRATION database
Mix several data models / polyglot persistence
External Data Schema / Common Data Structures
Data Store encapsulated by an API (Data Services)
Append only / save changes vs state (event sourcing)
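The last point, saving changes rather than state, can be illustrated with a minimal in-memory sketch. The `FeedEvent` and `EventLog` names and the feed-stock example are hypothetical and not taken from the talk. A real system would persist the log in a datastore.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FeedEvent:
    """One immutable fact: feed stock for a cage changed by delta_kg."""
    cage: str
    delta_kg: float  # positive = delivered, negative = fed out


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[FeedEvent] = []

    def append(self, event: FeedEvent) -> None:
        self._events.append(event)

    def replay(self) -> Dict[str, float]:
        """Derive the current state (stock per cage) by folding over all events."""
        stock: Dict[str, float] = {}
        for event in self._events:
            stock[event.cage] = stock.get(event.cage, 0.0) + event.delta_kg
        return stock


log = EventLog()
log.append(FeedEvent("cage-1", 50.0))
log.append(FeedEvent("cage-1", -12.5))
log.append(FeedEvent("cage-2", 30.0))
print(log.replay())
```

Because the current state is always derived from the log, a bug in the derivation can be fixed and the state rebuilt without losing data.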

Page 19: Delivering a 'Big Data Ready' minimum viable product

KEY BIG DATA ARCHITECTURE FEATURES

Distributed Computing

Asynchronous processing
Real Time Event Processing / Streaming
Simple decoupled services exposed through REST or RPC APIs (business services)
Thick web clients / mob. apps using the REST or Streaming APIs
Client-level multivariate data analysis & complex visualization

Page 20: Delivering a 'Big Data Ready' minimum viable product

THE LAMBDA ARCHITECTURE

by Nathan Marz and James Warren

store raw, immutable, perpetual data

query = function(all data)

combine batch & realtime stream processing to compute arbitrary functions on arbitrary data
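The idea of "query = function(all data)" can be sketched in a few lines. This toy version keeps everything in memory and uses a hypothetical event shape (farm id, quantity fed). In a real deployment the batch view would be recomputed periodically by a batch system and the realtime view maintained by a stream processor.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, float]  # (farm id, quantity fed in kg); hypothetical shape


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from scratch over the raw, immutable data."""
    view: Counter = Counter()
    for farm, quantity in master_dataset:
        view[farm] += quantity
    return view


def realtime_view(recent_events: Iterable[Event]) -> Counter:
    """Speed layer: the same function, applied only to events the batch run has not seen."""
    return batch_view(recent_events)


def query(farm: str, batch: Counter, realtime: Counter) -> float:
    """Merge both views so the answer covers all data."""
    return batch[farm] + realtime[farm]


master: List[Event] = [("farm-a", 12.5), ("farm-b", 8.0), ("farm-a", 10.0)]
recent: List[Event] = [("farm-a", 2.5)]

batch = batch_view(master)
realtime = realtime_view(recent)
print(query("farm-a", batch, realtime))
```

Once a batch run has absorbed the recent events, the realtime view can be discarded, so errors in the speed layer are only temporary.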

Page 21: Delivering a 'Big Data Ready' minimum viable product

THE LAMBDA ARCHITECTURE

Page 22: Delivering a 'Big Data Ready' minimum viable product

ULTIMATE DESIGN RULE

KEEP it SIMPLE

Page 23: Delivering a 'Big Data Ready' minimum viable product

THE CONVENTIONAL ARCHITECTURE

New datastore criteria:

Distributed (auto-shard, peer-to-peer)
Easy to change schema & queries
Simple to install, configure, operate (one component)
Minimize impedance mismatch
Boost productivity

Page 24: Delivering a 'Big Data Ready' minimum viable product

DIRECTLY STORE MY AGGREGATES

{
  "date": "2013-02-28",
  "allocated_worker": "swp4jhi4Tm6VxY1nueX2yw",
  "cage": "1GuuHWTaQc-kpPcRV5uBGA",
  "feed": "7IWmy2FATcS9Vh0RB1onXQ",
  "quantity_approved": 12.5,
  "farm": "__uBZUr3RWOqOSkszfbRLw",
  "species": "KDU-2LCjRRynby9HLifc3g",
  "batch": "i6MgxixnSCGwGWb0037wlQ",
  "execution": {
    "feeder": "swp4jhi4Tm6VxY1nueX2yw",
    "quantity_fed": 12.5,
    "species_position_start": "top",
    "species_position_end": "middle",
    "start": "2013-02-28T07:59:57.668Z",
    "end": "2013-02-28T08:00:03.216Z",
    "feeder_position_end": {
      "lat_lon": {"lat": 37.7066959, "lon": 23.16831896},
      "altitude": 40,
      "accuracy": 12
    }
  }
}

Page 25: Delivering a 'Big Data Ready' minimum viable product

THE CANDIDATES

Key-Value       Document          Column      Graph
Riak            MongoDB           Cassandra   Neo4J
Redis           CouchBase         HBase       InfiniteGraph
Pr. Voldemort   OrientDB          Hypertable  OrientDB
MemcacheDB      ElasticSearch     Accumulo    Titan
DynamoDB        Google Datastore  SimpleDB    Virtuoso

Page 26: Delivering a 'Big Data Ready' minimum viable product

MY COOL DATASTORE TIP

elasticsearch document store

No other NoSQL store comes close to the out of the box utility and usability of ElasticSearch

schemaless, multitenant, replicating & sharding document store that implements extensible & advanced search features (geospatial, faceting, filtering, etc.)

REST API to CREATE / UPDATE (partially) / DELETE / READ aggregates / entities

REST Search API with full text search out of the box

MULTI-TENANT friendly with REST API for creating / updating DBs & entity types

Dynamic / Semi-Dynamic / Fixed schema
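As a rough illustration of the REST API mentioned above, the sketch below builds (but does not send) the HTTP requests for creating, partially updating, reading, deleting and searching a document. The host, index name and type name are hypothetical. The `/{index}/{type}/{id}` paths follow the API style of ElasticSearch releases from around the time of the talk. Recent versions removed mapping types and use `/{index}/_doc/{id}` and `/{index}/_update/{id}` instead.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port
INDEX = "farm_demo"                 # hypothetical index, e.g. one per tenant
DOC_TYPE = "feeding"                # hypothetical entity type


def es_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build an HTTP request for the REST API without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(f"{BASE_URL}{path}", data=data, method=method,
                   headers={"Content-Type": "application/json"})


doc_path = f"/{INDEX}/{DOC_TYPE}/1"
calls = [
    # CREATE: store the whole aggregate under a chosen id
    es_request("PUT", doc_path, {"date": "2013-02-28", "quantity_approved": 12.5}),
    # UPDATE (partially): send only the fields that changed
    es_request("POST", f"{doc_path}/_update", {"doc": {"quantity_approved": 15.0}}),
    # READ
    es_request("GET", doc_path),
    # DELETE
    es_request("DELETE", doc_path),
    # SEARCH with a query string
    es_request("POST", f"/{INDEX}/{DOC_TYPE}/_search",
               {"query": {"query_string": {"query": "date:2013-02-28"}}}),
]

for call in calls:
    print(call.get_method(), call.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a running node. That step is left out so the sketch runs without a cluster.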

Page 27: Delivering a 'Big Data Ready' minimum viable product

ELASTICSEARCH POWER

index over 95GB/h/node

8-node cluster: sub-200ms response for complex searches on 10B+ records

(oracle OR mysql) AND replication
apple AND ip*d
john AND city:Dublin
species:"Sea Bream" AND execution.date:[20130701 TO 20130730]
taxi cub AND ("Dublin"^2 OR "Cork")

"facets": {
  "locations": {
    "terms": {"field": "city"}
  }
}

"terms": [
  {"term": "Dublin", "count": 130},
  {"term": "Cork", "count": 20},
  {"term": "Galway", "count": 1}
]

Page 28: Delivering a 'Big Data Ready' minimum viable product

FACETED BROWSING
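To make the idea behind faceted browsing concrete, the following sketch computes a terms facet over a handful of in-memory documents. It only mimics the shape of the terms facet response (term plus count, most frequent first). The documents are made up, and the search engine computes this over its index rather than in client code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def terms_facet(docs: Iterable[dict], field: str) -> List[Dict[str, object]]:
    """Count how many documents carry each value of `field`, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"term": term, "count": count} for term, count in counts.most_common()]


# Hypothetical documents; the last one has no "city" field and is ignored.
docs = ([{"city": "Dublin"}] * 3
        + [{"city": "Cork"}] * 2
        + [{"city": "Galway"}, {"name": "no city"}])

print(terms_facet(docs, "city"))
```

A faceted UI typically renders each term with its count as a clickable filter that narrows the current result set.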

Page 29: Delivering a 'Big Data Ready' minimum viable product

HISTOGRAMS / GEO DISTANCE

"facets": {
  "Feed_Histogram": {
    "date_histogram": {
      "key_field": "date",
      "value_field": "execution.quantity_fed",
      "interval": "month"
    }
  }
}

"filter": {
  "geo_distance_range": {
    "from": "200km",
    "to": "400km",
    "pin.location": {"lat": 40, "lon": -70}
  }
}

"filter": {
  "geo_polygon": {
    "person.location": {
      "points": [
        {"lat": 40, "lon": -70},
        {"lat": 30, "lon": -80},
        {"lat": 20, "lon": -90}
      ]
    }
  }
}

"filter": {
  "geo_distance": {
    "distance": "200km",
    "pin.location": {"lat": 40, "lon": -70}
  }
}

Page 30: Delivering a 'Big Data Ready' minimum viable product

RDBMS OUT - DOCUMENT STORE IN

Page 31: Delivering a 'Big Data Ready' minimum viable product

WHAT ABOUT MY RELATIONS

Page 32: Delivering a 'Big Data Ready' minimum viable product

LET'S GO POLYGLOT

Page 33: Delivering a 'Big Data Ready' minimum viable product

THE TITAN GRAPH DB

Distributed
Pluggable storage (Cassandra, HBase, BerkeleyDB)
Indexing with ElasticSearch & Lucene
Blueprints Interface
Gremlin Query Language
Rexster Server adds JSON-based REST interface

Page 34: Delivering a 'Big Data Ready' minimum viable product

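EASY GRAPH TRAVERSAL WITH GREMLIN

// calculate basic collaborative filtering for user 'Gregory'
m = [:]
// g.V(key, value) looks the start vertex up by an indexed property
g.V('name','Gregory').out('likes').in('likes').out('likes').groupCount(m)
// sort the counted items, most frequently liked first
m.sort{-it.value}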
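For readers unfamiliar with Gremlin, the same traversal can be approximated in plain Python over a small in-memory map of 'likes' edges. The people and items below are invented for illustration. Like the one-liner above, this sketch also counts the user's own likes, so a real recommender would filter those out afterwards.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical 'likes' edges: person -> items that person likes.
likes: Dict[str, Set[str]] = {
    "Gregory": {"riak", "titan"},
    "Anna": {"titan", "neo4j", "esper"},
    "Brian": {"riak", "titan", "neo4j"},
}


def recommend(user: str) -> List[Tuple[str, int]]:
    """Count items liked by everyone who shares at least one like with `user`."""
    counts: Counter = Counter()
    for item in likes[user]:                        # out('likes')
        for other_items in likes.values():          # in('likes'): everyone who likes the item
            if item in other_items:
                counts.update(other_items)          # out('likes') + groupCount
    return counts.most_common()                     # sort by count, descending


print(recommend("Gregory"))
```

Each path user -> item -> person -> item contributes one count, which is why items shared by many like-minded people rise to the top.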

Page 35: Delivering a 'Big Data Ready' minimum viable product

START ON A SINGLE MACHINE

Page 36: Delivering a 'Big Data Ready' minimum viable product

DATASTORE SELECTION TIPS (1)

Use polyglot persistence with multiple data models
Start with a Document Store as your system of record
Mix it with a key-value Store for keeping sessions, shopping cart, user prefs, counters, caching
Mix it with a Graph store to keep and traverse entity relationships
Use a Column Store as your system of record if you need performance rather than flexibility and you know your data model & queries well
Keep a relational db for queries on transient data (reporting on inter-aggregate relationships)

Page 37: Delivering a 'Big Data Ready' minimum viable product

DATASTORE SELECTION TIPS (2)

Prefer one-component stores rather than many moving parts
Choose a store that makes it easy to experiment with schema and query changes & supports easy data migrations
Prefer stores that can work with both dynamic & fixed schemas (there is always an implicit schema)
In early prototypes avoid Column stores as they have a high cost on schema and query changes

Page 38: Delivering a 'Big Data Ready' minimum viable product

DATASTORE SELECTION TIPS (3)

Choose stores that support auto-sharding
Prefer peer-to-peer replication rather than master-slave
Replication factor N=3 is a good standard choice
Consistency Adjustment Quorum: W > N/2, W + R > N
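The two quorum inequalities can be checked mechanically. In the sketch below, N is the replication factor, W the number of replicas that must acknowledge a write and R the number consulted on a read. The function name is made up for illustration.

```python
def satisfies_quorum(n: int, w: int, r: int) -> bool:
    """True when W > N/2 (writes reach a majority) and W + R > N
    (every read set overlaps every write set)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return 2 * w > n and w + r > n


N = 3  # the replication factor suggested on the slide
for w, r in [(2, 2), (3, 1), (2, 1), (1, 3), (1, 1)]:
    print(f"N={N} W={w} R={r} -> {satisfies_quorum(N, w, r)}")
```

With N=3, the common choice W=2, R=2 satisfies both conditions while still tolerating one unavailable replica for reads and for writes.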

Page 39: Delivering a 'Big Data Ready' minimum viable product

ALL THAT SAID...

APP CONTEXT is always the determining factor for selecting your store

as well as...

Safety / Stability
Productivity
Community
Performance
Tooling / Ease of operation

Page 40: Delivering a 'Big Data Ready' minimum viable product

DATA MODELING TIPS

Remember that you fit your model to the datastore and not Vice Versa (APPLICATION vs INTEGRATION DB)
Use a Schema
Build your aggregates or column families according to your use cases, i.e. DENORMALIZE per your query requirements
Aggregates form the boundaries for ACID operations (transactions)
Pre-compute Question Focused Datasets (materialized views) to provide data organized differently from their primary aggregates
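The last tip, pre-computing a question-focused dataset, might look like the following sketch. The feeding documents are simplified, hypothetical versions of the aggregate shown earlier, and the question being answered ("how much feed did each farm use per day?") is chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Simplified, hypothetical feeding aggregates (primary data).
feedings = [
    {"farm": "farm-a", "date": "2013-02-28", "execution": {"quantity_fed": 12.5}},
    {"farm": "farm-a", "date": "2013-02-28", "execution": {"quantity_fed": 7.5}},
    {"farm": "farm-a", "date": "2013-03-01", "execution": {"quantity_fed": 10.0}},
    {"farm": "farm-b", "date": "2013-02-28", "execution": {"quantity_fed": 4.0}},
]


def feed_per_farm_per_day(aggregates: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Materialize a view keyed by (farm, date), organized for one specific question."""
    view: Dict[Tuple[str, str], float] = defaultdict(float)
    for feeding in aggregates:
        key = (feeding["farm"], feeding["date"])
        view[key] += feeding["execution"]["quantity_fed"]
    return dict(view)


view = feed_per_farm_per_day(feedings)
print(view[("farm-a", "2013-02-28")])
```

The view is derived data: it can be dropped and rebuilt from the primary aggregates whenever the question changes.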

Page 41: Delivering a 'Big Data Ready' minimum viable product

ARE WE FINISHED YET? NOT QUITE!

Do something with our monolithic app

Page 42: Delivering a 'Big Data Ready' minimum viable product

SPLIT THE MONOLITHIC APPLICATION

Wrap datastores into DATA SERVICES
Create BUSINESS SERVICES on top of Data Services
Prefer RESTful APIs for services (ROA)
Use a Binary Serialization Framework to create RPC APIs if performance is a concern (ROA/SOA)
Move MVC* to fat mobile / web client apps that consume the APIs

JavaScript in the browser is one of the world's most widely distributed execution environments & Deployment is trivial!

Page 43: Delivering a 'Big Data Ready' minimum viable product

DECOUPLED SERVICES

FAT CLIENT

SINGLE PAGE APP

Page 44: Delivering a 'Big Data Ready' minimum viable product

API FRAMEWORK / DSL

class API < Grape::API
  version 'v1', :using => :header, :vendor => 'aquinetix.com'
  default_format :json
  content_type :json, "application/json"
  content_type :tsv, "text/tab-separated-values"
  formatter :tsv, Aquinetix::TsvFormatter
  content_type :kml, "text/xml"
  formatter :kml, Aquinetix::KmlFormatter
  mount CageAPI
  mount CageEventsAPI
  mount DeviceAPI
  mount FeedAPI
  mount FeedingAPI
  mount LossCountEventAPI
  mount OxygenSamplingEventAPI
  mount SigninAPI
  mount TemperatureSamplingEventAPI
  mount UserAPI
  add_swagger_documentation markdown: true, base_path: "http://..."
end

Page 45: Delivering a 'Big Data Ready' minimum viable product

API FRAMEWORK / DSL

class FeedingAPI < Grape::API
  resource :feedings do
    desc 'Create a new feeding'
    post do
      execute_farm_obj_create_request 'Feeding'
    end

    desc 'Perform a FULL or PARTIAL update of an existing feeding'
    params do
      requires :id, type: String, desc: "The id (UUID) of..."
      optional :fields, type: String, desc: "Which fields..."
    end
    put '/:id' do
      execute_farm_obj_update_request 'Feeding'
    end

    desc 'Get a feeding by its id (UUID)'
    params do
      requires :id, :type => String, :desc => "Feeding id."
    end
    get '/:id' do
      execute_farm_obj_instance_get_request 'Feeding'
    end
  end
end

Page 46: Delivering a 'Big Data Ready' minimum viable product

SWAGGER UI

Page 47: Delivering a 'Big Data Ready' minimum viable product

MVC* AT THE CLIENT

Mobile app with backbone.js & phonegap
Management / BI Console with AngularJS
Visualization with D3.js
Multivariate Dataset Analysis at the browser with crossfilter.js
App workflow & build with yeoman, grunt, bower

* MVP, MVVM, MVC, MVW

Page 48: Delivering a 'Big Data Ready' minimum viable product

ASYNCHRONOUS / REAL TIME PROCESSING & STREAMING API

RabbitMQ + RabbitMQ Web-Stomp Plugin at the server

SockJS, Stomp.js libs at the client

Real-time event stream processing with ESPER

Alternative message brokers:
node.js + zeromq
kestrel
pusher
kafka (>100k msg/sec)

Alternative Real-time stream processing: Storm

Page 49: Delivering a 'Big Data Ready' minimum viable product

USE CASES

count ratings, votes, click-throughs
block abusive crawlers
rate-limit apis
detect spamming attempts
track performance and trigger alerts
batch process logs

Page 50: Delivering a 'Big Data Ready' minimum viable product

SUBSCRIBE TO STOMP TOPICS FROM JS

ws = new SockJS('http://node1.aquinetix.com:15674/stomp')
@client = Stomp.over(ws)
@client.connect('aquinetix', 'password', ((x) => @on_connect(x)), @on_error, "/")

on_connect: (x) ->
  console.log "Connected to message broker"
  # forward every feeding message to the application's event bus
  @client.subscribe '/topic/feeding', (message) =>
    feeding = JSON.parse(message.body)
    Aq_Manager.events.trigger 'feeding_execution:arrived', feeding

  # same for worker position updates
  @client.subscribe '/topic/position', (message) =>
    position = JSON.parse(message.body)
    Aq_Manager.events.trigger 'worker_position:arrived', position

@client.send('/topic/feeding', {}, JSON.stringify(feeding_obj))
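Underneath the client library, STOMP is a simple text protocol: a command line, header lines, a blank line, the body and a terminating NUL byte. The sketch below builds such frames by hand to show roughly what travels over the WebSocket when subscribing and sending. The helper name and the subscription id are hypothetical, and header escaping and `content-length` are left out for brevity.

```python
import json
from typing import Dict


def stomp_frame(command: str, headers: Dict[str, str], body: str = "") -> str:
    """Serialize a STOMP frame: COMMAND, header lines, blank line, body, NUL."""
    header_lines = "".join(f"{key}:{value}\n" for key, value in headers.items())
    return f"{command}\n{header_lines}\n{body}\x00"


# Subscribe to the feeding topic ("sub-0" is an arbitrary subscription id).
subscribe_frame = stomp_frame("SUBSCRIBE", {"id": "sub-0", "destination": "/topic/feeding"})

# Publish a feeding event as a JSON body.
send_frame = stomp_frame(
    "SEND",
    {"destination": "/topic/feeding", "content-type": "application/json"},
    json.dumps({"quantity_fed": 12.5}),
)

print(repr(subscribe_frame))
print(repr(send_frame))
```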

Page 51: Delivering a 'Big Data Ready' minimum viable product

REAL TIME EVENT PROCESSING WITH ESPER

select count(*) as tps, max(retweetCount) as maxRetweets
from TwitterEvent.win:time_batch(1 sec)

select fraud.accountNumber as accntNum, fraud.warning as warn, withdraw.amount as amount,
       MAX(fraud.timestamp, withdraw.timestamp) as timestamp, 'withdrawlFraud' as desc
from FraudWarningEvent.win:time(30 min) as fraud,
     WithdrawalEvent.win:time(30 sec) as withdraw
where fraud.accountNumber = withdraw.accountNumber
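What the first statement computes can be approximated in Python: events are grouped into one-second batches and each batch is reduced to a count and a maximum. This is only a sketch of the idea over a finite list. The event shape is hypothetical, windows are aligned to whole seconds and empty windows are skipped, which may differ from how Esper schedules its batches.

```python
from typing import Dict, Iterable, List, Tuple

TweetEvent = Tuple[float, int]  # (timestamp in seconds, retweet count); hypothetical shape


def time_batch(events: Iterable[TweetEvent], window_sec: float = 1.0) -> List[Dict[str, int]]:
    """Group events into consecutive fixed-length windows and aggregate each window."""
    batches: Dict[int, List[int]] = {}
    for timestamp, retweets in events:
        window = int(timestamp // window_sec)
        batches.setdefault(window, []).append(retweets)
    return [
        {"window": window, "tps": len(values), "maxRetweets": max(values)}
        for window, values in sorted(batches.items())
    ]


events = [(0.1, 3), (0.7, 10), (1.2, 1), (3.5, 7), (3.9, 2)]
for row in time_batch(events):
    print(row)
```

A stream processor does the same incrementally on an unbounded stream and emits each result as soon as its window closes.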

Page 52: Delivering a 'Big Data Ready' minimum viable product

LOG ACTIVITY AND OPERATIONAL DATA

Today a critical part of the production features of websites

Logstash + ElasticSearch + Kibana 3

Page 53: Delivering a 'Big Data Ready' minimum viable product

WRAP UP

Should availability, robustness & scalability be added to your hypotheses & value proposition?

If YES then:

Adopt an architecture with decoupled and distributed components at early stages. Build your team around it & balance technical debt / equity to get:
Increased team productivity
Increased readiness and agility
Sustainability

Build your data models around your use cases rather than around your database, and experiment with a polyglot persistence strategy

Start with the technologies that are easiest to install, configure & operate.

Keep it SIMPLE & SUSTAINABLE

Page 54: Delivering a 'Big Data Ready' minimum viable product

LINKS / REFERENCES

http://www.rabbitmq.com/web-stomp.html

https://github.com/jmesnil/stomp-websocket/

Introduction to NoSQL - Martin Fowler, goto; conference

Martin Fowler at NoSQL Matters conference

Book on the Lambda Architecture

Talk on Lambda Architecture

William Pietri - Going the Distance: Building a Sustainable Startup

Don't Let the Minimum Win Over the Viable - Harvard Business Review

ElasticSearch Document DB & Search Engine

Cassandra Column DB

Titan Graph DB

Astroboa Semantic Document Store

Page 55: Delivering a 'Big Data Ready' minimum viable product

LINKS / REFERENCES

https://github.com/sockjs/sockjs-client

https://github.com/robey/kestrel

https://github.com/JustinTulloss/zeromq.node

http://kafka.apache.org/index.html

https://github.com/nathanmarz/storm

https://developers.helloreverb.com/swagger/

https://github.com/wordnik/swagger-ui