@SnowflakeDB #CloudAnalytics17 LONDON

Todd - London 2 - Bringing Your Data Together in the Cloud · Created Date: 6/13/2017 7:17:36 PM

Bringing Your Data Together in the Cloud
Todd Beauchene
Global Alliances Architect, Snowflake Computing

“Data! Data! Data! I can't make bricks without clay.” - Sherlock Holmes

Agenda

• Cloud Data Ecosystem
• Data Sources
• Methodologies
• Data Integration Solutions
• Conclusion

Cloud Data Ecosystem

[Diagram labels]
• Data Sources: Enterprise apps, Corporate, Web, Mobile, IoT
• Data Integration
• Data Warehouse
• Business Intelligence & Analytics

Data Sources

Data Sources

On-Premises
• Typically backed by a local transactional database
• All data lives within the firewall
• Customer has full access to all data and systems

Cloud
• Typically backed by a cloud database (e.g. RDS)
• Can run in customer VPC
• Typically offers fewer options than on-premises

SaaS
• Typically data is only available via API
• Outside of customer firewall or VPC
• Customer has very little control over handling of data

Real World Example: Consolidated Dashboard

Challenges
• Long-term project with high-level goals
• Diverse data sources
• Different refresh cycles
• Inconsistent results

Solutions
• Agile project with focused, short-term goals
• Dedicated schema in EDW
• Daily ETL process
• Data quality checks within ETL
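
As a rough illustration of what a data quality check inside an ETL step might look like, the sketch below validates a batch before it is loaded. The function name `check_batch`, the column names, and the rules (minimum row count, no missing required values) are assumptions made for this example; the slides do not say which checks were actually used.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def check_batch(rows: List[Row], required: List[str], min_rows: int = 1) -> List[str]:
    """Return a list of problems found; an empty list means the batch passed."""
    problems = []
    # Rule 1 (assumed): an unexpectedly small batch usually signals a failed extract.
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    # Rule 2 (assumed): required columns must be present and non-null in every row.
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing value for {col!r}")
    return problems


# Hypothetical batch with one bad row.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
]
problems = check_batch(rows, required=["order_id", "amount"])
print(problems)
```

A daily job could refuse to publish the batch to the dashboard schema whenever the returned list is non-empty, which is one way to avoid the inconsistent results listed under Challenges.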

Methodologies

Methodologies

Bulk Loading – Trunc and Load
• Runs at regular intervals
• Full data set loaded during each run and existing data is purged
• Least efficient option, but very simple to manage
• High data volumes every run
• More commonly used for dimension tables
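
A minimal sketch of the truncate-and-load pattern, using Python's built-in `sqlite3` module as a stand-in for the warehouse. The table `dim_region` and its columns are hypothetical. SQLite has no `TRUNCATE` statement, so an unqualified `DELETE` plays that role here; a real warehouse would typically offer a dedicated truncate command and a bulk loader.

```python
import sqlite3


def truncate_and_load(conn, rows):
    """Purge the target table and reload the full data set in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM dim_region")  # stands in for TRUNCATE
        conn.executemany("INSERT INTO dim_region (id, name) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_region (id INTEGER PRIMARY KEY, name TEXT)")

# First run loads two rows; the second run replaces them with the full new set.
truncate_and_load(conn, [(1, "EMEA"), (2, "APAC")])
truncate_and_load(conn, [(1, "EMEA"), (2, "APAC"), (3, "AMER")])

result = conn.execute("SELECT id, name FROM dim_region ORDER BY id").fetchall()
print(result)
```

Wrapping the purge and the reload in one transaction means readers never see an empty table between the two steps, at the cost of rewriting every row on every run.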

Daily Differentials
• Runs during nightly ETL window
• Requires change data capture to identify changed rows
• Generally consists of a series of steps where each depends on the previous steps
• Must include logic to handle slowly changing dimensions
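
The dependent steps and the slowly-changing-dimension logic can be sketched as follows. This assumes a Type 2 approach (keep history by expiring the old row and inserting a new one), which is only one of several ways to handle slowly changing dimensions and is not specified in the slides. The `changes` list stands in for the output of change data capture, and the table `dim_customer` and its columns are hypothetical.

```python
import sqlite3


def apply_changes(conn, changes, load_date):
    """Apply changed rows as Type 2 history: expire the current version, then insert."""
    with conn:
        for customer_id, city in changes:
            current = conn.execute(
                "SELECT city FROM dim_customer WHERE customer_id = ? AND is_current = 1",
                (customer_id,),
            ).fetchone()
            if current is not None and current[0] == city:
                continue  # attribute unchanged, keep the existing version
            # Step 1: expire the current version, if there is one.
            conn.execute(
                "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
                "WHERE customer_id = ? AND is_current = 1",
                (load_date, customer_id),
            )
            # Step 2: insert the new version; it depends on step 1 having run first.
            conn.execute(
                "INSERT INTO dim_customer "
                "(customer_id, city, valid_from, valid_to, is_current) "
                "VALUES (?, ?, ?, NULL, 1)",
                (customer_id, city, load_date),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER, city TEXT, "
    "valid_from TEXT, valid_to TEXT, is_current INTEGER)"
)
apply_changes(conn, [(1, "London"), (2, "Paris")], "2017-06-12")
apply_changes(conn, [(1, "Leeds"), (2, "Paris")], "2017-06-13")  # only customer 1 changed

history = conn.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_id, valid_from"
).fetchall()
print(history)
```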

Methodologies

Insert-only – Date-based
• Extracts data by date range to eliminate need for CDC
• Simplified processing
• Commonly used for fact tables
• Changes to data from previous periods require deletion of all data for the given range
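
To make the last point concrete, here is a small sketch of a date-range reload, again using `sqlite3` as a stand-in for the warehouse. The table `fact_sales`, its columns, and the ISO date strings are assumptions for illustration.

```python
import sqlite3


def reload_date_range(conn, rows, start, end):
    """Replace all fact rows dated within [start, end] with a fresh extract."""
    with conn:
        # Delete everything in the range first so corrected rows are not duplicated.
        conn.execute(
            "DELETE FROM fact_sales WHERE sale_date BETWEEN ? AND ?", (start, end)
        )
        conn.executemany(
            "INSERT INTO fact_sales (sale_date, amount) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, amount REAL)")

# Initial load of two days.
reload_date_range(
    conn, [("2017-06-12", 100.0), ("2017-06-13", 50.0)], "2017-06-12", "2017-06-13"
)
# A correction arrives for 13 June: purge that day and load it again.
reload_date_range(conn, [("2017-06-13", 75.0)], "2017-06-13", "2017-06-13")

facts = conn.execute(
    "SELECT sale_date, amount FROM fact_sales ORDER BY sale_date"
).fetchall()
print(facts)
```

No row-level change detection is needed because the whole range is treated as the unit of replacement.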

Database Replication
• Generally runs in near-real-time
• Requires a tool that is tightly integrated with the source database
• Schemas must match between source and destination

Methodologies

Batch Processing
• Generally used when data is being pushed from the source
• Batch frequency depends on the volume and velocity of the data
• Requires automated process to load batches into the data warehouse

Streaming
• Generally used for high volume data
• Event-based rather than row-based
• Often requires micro-batching of data for load into relational database
• Raw data must usually be transformed to support analytics
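
Micro-batching can be sketched as grouping an event stream into fixed-size chunks that are then loaded one chunk at a time. The helper `micro_batches` and the batch size are hypothetical; production systems usually also flush on a timer so that a slow stream does not hold events back indefinitely, which this sketch leaves out.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a stream of events into lists of at most `size` items."""
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller, batch


# Seven events grouped three at a time.
batches = list(micro_batches(range(7), size=3))
print(batches)
```

Each yielded list would then be written to a file or staged and loaded into the relational database in a single operation rather than row by row.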

Data Integration Solutions

Data Integration Solutions

Custom Code
• Flexible but complex
• Leverages in-database processing
• Challenging to manage and maintain

ETL
• Simplified data transformation with no code
• Built-in dependency and error handling
• Reduces data volumes within EDW

ELT
• Leverages benefits of ETL while shifting data processing to EDW
• Requires tight integration between Data Integration and EDW
• Raw and transformed data in one place
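
The ELT idea (load raw data first, then transform inside the warehouse) can be illustrated with a small sketch. Here `sqlite3` stands in for the EDW, and the tables `raw_orders` and `orders_by_country` are hypothetical. The transformation is plain SQL executed by the database engine rather than by the integration tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load step: land the data as-is, including messy values and text-typed amounts.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, country TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, " uk ", "10.50"), (2, "UK", "4.50"), (3, "fr", "7.00")],
)

# Transform step: clean and aggregate inside the database.
conn.execute(
    "CREATE TABLE orders_by_country AS "
    "SELECT UPPER(TRIM(country)) AS country, "
    "       SUM(CAST(amount AS REAL)) AS total "
    "FROM raw_orders GROUP BY UPPER(TRIM(country))"
)

totals = conn.execute(
    "SELECT country, total FROM orders_by_country ORDER BY country"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(totals, raw_count)
```

Both the raw table and the transformed table remain queryable afterwards, which is the "raw and transformed data in one place" property from the slide.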

Data Integration Solutions

On-Premises
• Customer owns hardware and software install/configuration
• Don’t have to deal with firewall to access local sources

Cloud
• Customer owns software install/configuration but not hardware
• Can run in customer VPC to provide direct access to data within VPC or behind firewall

SaaS
• Fully managed by service provider
• Configurable options vary by solution
• Must find secure ways to access data not stored inside firewall

Conclusion

Cloud Data Warehousing Best Practices
• Leverage the scalable compute layer to do the bulk of the data processing
• Isolate load and transform jobs from queries to prevent resource contention
• Eliminate physical data marts by leveraging a scalable data platform
• QA is key: make sure all changes made to data integration tasks are tested before they roll to production
• When migrating, it is important to convert one source at a time