© Hortonworks Inc. 2011–2017. All Rights Reserved
Enterprise Data Warehouse Optimization
Piet Loubser, VP Product and Solutions Marketing, Hortonworks
Dr Barry Devlin, Founder & Principal, 9sight Consulting
Copyright © 2017 9sight Consulting, All Rights Reserved
Dr Barry Devlin
Founder & Principal, 9sight Consulting
The EDW Lives On
The Beating Heart of the Data Lake
10 August 2017
Hortonworks Webinar
Dr. Barry Devlin
Founder and Principal, 9sight Consulting, www.9sight.com
Dr. Barry Devlin is a founder of the data warehousing industry, defining its first architecture in 1985. A foremost authority on business intelligence (BI), big data and beyond, he is respected worldwide as a visionary and thought-leader in the evolving industry. Barry has authored two ground-breaking books: the classic "Data Warehouse: From Architecture to Implementation" and “Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data” (http://bit.ly/BunI_Book) in 2013.
Barry has over 30 years of experience in the IT industry, previously with IBM, as a consultant, manager and distinguished engineer. As founder and principal of 9sight since 2008, Barry provides strategic consulting and thought-leadership to buyers and vendors of BI and Big Data solutions. He is an associate editor of TDWI's Journal of Business Intelligence, and a regular keynote speaker, teacher and writer on all aspects of information creation and use.
Barry operates worldwide from Cape Town, South Africa.
Email: [email protected]
Twitter: @BarryDevlin
Agenda
1. Past – from a warehouse to a lake
2. Present – a warehouse and a lake
3. Emerging – a warehouse by a lake
4. Conclusions
The data architecture since the mid-’80s
• Two layers within the Data Warehouse…
  – Enterprise data warehouse: reconciled data
  – Data marts: what the users need
• …fed from and separate to operational systems
  – Data to run the business
  – Created by the processes of the business
• All data created within the enterprise (or within partner ecosystem)
[Figure: data warehouse architecture. Labels: Data marts, Enterprise data warehouse, Metadata, Data warehouse, Operational systems]
“An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, (1988)
The drive toward the data lake since 2010
• Data warehouse architecture “old-fashioned”
  – Linked to (traditional) relational databases
  – Too structured, schema-on-write
  – Too slow / complex to build
  – Lacking support for big data
  – No link to Hadoop
• Data lake proposed as alternative
  – Cheaper, bigger and more flexible
  – Structure-agnostic, schema-on-read (late binding)
  – Supports all data types
  – Agile, flexible, rapid implementation
  – Driven by Hadoop ecosystem
  – Data reservoir – a better (?) architected data lake
[Figure: data warehouse compared with data lake and data reservoir. Image: Gartner via Bill Schmarzo, infocus.emc.com/william_schmarzo/data-lake-data-reservoir-data-dumpblah-blah-blah/ (2014)]
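The contrast between schema-on-write and schema-on-read (late binding) on this slide can be illustrated with a minimal sketch. The event records, the `SCHEMA` mapping and the function names below are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
import json

# Hypothetical raw events as they might land in a data lake (illustrative only).
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.99", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00"}',
    '{"customer_id": "n/a", "amount": "oops"}',
]

SCHEMA = {"customer_id": int, "amount": float}  # assumed target schema


def conform(record):
    """Cast a record to SCHEMA; raise ValueError if it does not fit."""
    try:
        return {name: cast(record[name]) for name, cast in SCHEMA.items()}
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"record rejected: {record}") from exc


def load_schema_on_write(lines):
    """Warehouse style: validate at load time, store only conforming rows."""
    stored, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            stored.append(conform(record))
        except ValueError:
            rejected.append(record)
    return stored, rejected


def query_schema_on_read(lines):
    """Lake style: keep the raw text, apply the schema only when reading."""
    for line in lines:
        try:
            yield conform(json.loads(line))
        except ValueError:
            continue  # unreadable under this schema; the raw line is still kept


stored, rejected = load_schema_on_write(RAW_EVENTS)
late_bound = list(query_schema_on_read(RAW_EVENTS))
```

In the schema-on-write path the third record never reaches storage, whereas in the schema-on-read path it stays in the raw store and could still be interpreted later under a different schema.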
Data lake architecture
[Figure: data lake architecture diagram. Source: www.capgemini.com/blog/capping-it-off/2014/08/you-have-to-manage-your-data-lake-the-fallacy-of-technology-being-magic]
From BI to Business unIntelligence
• People process information
• People: Rational thought and far beyond
  – People make all decisions!
• Process: Logic – predefined, emergent
  – Decision making is a process
• Information: Data, knowledge, meaning
  – Data / information is only the foundation
• Not business intelligence… Business unIntelligence
• Amazon: http://bit.ly/BunI_Book
• Or http://bit.ly/BunI-TP2 (25% discount with code “BIInsights25”)
[Figure: three linked elements labeled Information, Process, People]
Business unIntelligence – Information pillars
• One architecture for all types of information
  – Mix / match technology as needed
  – Relational, NoSQL, Hadoop, etc.
• Integration of sources and stores
  – Instantiation gathers inputs
  – Assimilation integrates stored info.
• Data flows as fast as needed and reconciled when necessary
  – No unnecessary storage or transformations
• Distinct data management / governance approaches as required
[Figure: information pillars diagram. Labels: Transactions, Events, Measures, Messages; Instantiation; Human-sourced (information); Machine-generated (data); Process-mediated (data); Context-setting (information); Assimilation; Transactional (data)]
Positioning of data lake and warehouse today
• Serve different purposes
  – Functional – run / manage the business
  – Illustrative – predict / influence the future
• Both required
  – Optimized for different strengths
  – Warehouse = accuracy and consistency
  – Lake = timeliness and rawness
• Links between environments
  – Better than copying everything into one (or both)
• Together – foundation for pervasive analytics
[Figure: data warehouse and data lake side by side, with user access to all data]
  – Data warehouse: Functional. Accurate, consistent data. Discarded if outdated. Legally binding, traceable process.
  – Data Lake: Illustrative. Timely, raw data. Stored forever. Creative, free-flowing process.
  – Inputs: Transactions (operational systems); Events, Measures, Messages
A warehouse by a lake (1) Preparation and enrichment
• Challenge: ETL (extract, transform and load) to data warehouse complex and computationally expensive
• Transform in:
  – Proprietary ETL server – with high licensing cost
  – Data warehouse server – with impact on analytic tasks
• Solution: Pump some or all data through the data lake
  – Reduced processing cost and/or impact on DW work
[Figure: transactions from operational systems and events, measures and messages flow through the data lake into the data warehouse; user access to all data]
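A minimal sketch of the offload pattern described on this slide: the cleansing and deduplication work runs outside the warehouse, which then only receives a plain bulk insert. An in-memory SQLite database stands in for the data warehouse, and the extract, table and function names are hypothetical; a real deployment would run the transform step on the lake platform.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from an operational system, landed in the lake as CSV text.
RAW_EXTRACT = """order_id,customer,amount,currency
1, alice ,10.50,usd
2,BOB,7.25,USD
2,BOB,7.25,USD
3,carol,,usd
"""


def transform_in_lake(raw_text):
    """Cleanse, deduplicate and standardize rows outside the warehouse."""
    seen, clean = set(), []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if not row["amount"]:
            continue  # drop incomplete rows before they reach the warehouse
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # drop duplicate extracts
        seen.add(order_id)
        clean.append((order_id, row["customer"].strip().title(),
                      float(row["amount"]), row["currency"].upper()))
    return clean


def load_into_warehouse(conn, rows):
    """The warehouse only performs a plain bulk insert of reconciled rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, "
                 "amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load_into_warehouse(warehouse, transform_in_lake(RAW_EXTRACT))
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

The design point is that the warehouse spends no cycles on parsing, validation or deduplication, which is the "reduced impact on DW work" the slide refers to.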
A warehouse by a lake (2) Archival
• Challenge: Storing seldom-used (cold) data in a data warehouse is an expensive waste of high-performance hardware
• Archiving to magnetic tape delays and complicates access to off-line data when needed
• Solution: archive to commodity servers and disks in data lake
  – Hadoop – no licensing costs
  – Faster access when needed – almost equal to DW
  – Same tools (SQL-based) for access as DW
[Figure: cold data moves from the data warehouse to the data lake; transactions from operational systems and events, measures and messages as inputs; user access to all data]
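The archival idea can be sketched as moving rows older than a cutoff date from the warehouse to a cheaper store that still answers the same SQL. Two in-memory SQLite databases stand in for the warehouse and the lake archive here; the `sales` table, the dates and the function name are assumptions made for illustration only.

```python
import sqlite3
from datetime import date


def archive_cold_rows(warehouse, archive, cutoff):
    """Move rows older than `cutoff` from the warehouse to the archive store."""
    archive.execute("CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, amount REAL)")
    cold = warehouse.execute(
        "SELECT sale_date, amount FROM sales WHERE sale_date < ?",
        (cutoff.isoformat(),),
    ).fetchall()
    archive.executemany("INSERT INTO sales VALUES (?, ?)", cold)
    warehouse.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff.isoformat(),))
    archive.commit()
    warehouse.commit()
    return len(cold)


warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
archive = sqlite3.connect(":memory:")    # stand-in for the archive in the lake
warehouse.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2015-03-01", 100.0), ("2016-11-15", 250.0), ("2017-07-30", 75.0)],
)

moved = archive_cold_rows(warehouse, archive, cutoff=date(2017, 1, 1))

# The same SQL text works against both stores.
QUERY = "SELECT COUNT(*), SUM(amount) FROM sales"
hot_summary = warehouse.execute(QUERY).fetchone()
cold_summary = archive.execute(QUERY).fetchone()
```

ISO-formatted date strings sort chronologically, which is why a plain string comparison is enough for the cutoff in this sketch.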
A warehouse by a lake (3) Access
• Challenge: Data increasingly resides on disparate platforms
  – Traditional business info in relational
  – Business people familiar with SQL
  – Social media, IoT on Hadoop / NoSQL / etc.
  – Copying back and forth is expensive
• Solution: Virtualize access to data on all platforms
  – SQL-based queries
  – Join data across platforms
[Figure: a virtual access layer spans the data warehouse and the data lake; transactions from operational systems and events, measures and messages as inputs; user access to all data]
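As a toy illustration of joining data across platforms with one SQL statement, the sketch below attaches a second, separate SQLite database to a connection and joins across the two. SQLite's `ATTACH DATABASE` is only a stand-in for a real data virtualization or federated query layer, and the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                   # "main" plays the relational warehouse
conn.execute("ATTACH DATABASE ':memory:' AS lake")   # a second, separate store plays the lake

conn.execute("CREATE TABLE main.customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE lake.device_events (customer_id INTEGER, event TEXT)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO lake.device_events VALUES (?, ?)",
                 [(1, "meter_read"), (1, "alert"), (2, "meter_read")])

# One SQL query joins both stores; no data is copied between them beforehand.
rows = conn.execute("""
    SELECT c.name, COUNT(e.event) AS events
    FROM main.customers AS c
    JOIN lake.device_events AS e ON e.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

The user writes ordinary SQL against both sources, which matches the slide's point that business people are already familiar with SQL.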
Conclusions
1. Enterprise data warehouse lives on
  – Focused on core business information
  – Traditional relational platforms still preferred
2. Data lake complements data warehouse
  – Focused on externally sourced data
  – Linked to data warehouse in multiple ways
3. Data lake can assist / offload data warehouse
  – Use commodity storage and processing power
  – Reduce costs and improve performance
Thank You
Dr Barry Devlin, Founder & Principal, 9sight Consulting
Piet Loubser, VP Product and Solutions Marketing, Hortonworks
The New Way of Business Is Fueled By Connected Data
Development
• Connected Customers, Vehicles, Devices
• Socially crowd-sourced requirements
• Digital design and analysis
• Digital prototypes and tests (simulations)
Manufacturing
• Connected Factories, Sensors, Devices
• Human-robotic interaction
• 3D-printing on demand
Distribution
• Connected Trucks, Inventory
• Location, traffic, weather-aware distribution
• Real-time inventory visibility
• Dynamic rerouting
Marketing/Sales
• Connected Customers, Devices
• Omni-channel demand sensing
• Real-Time Recommendations
Service
• Connected Assets
• Remote service monitoring & delivery
• Predictive maintenance
• OTA Updates
A Connected Data Strategy Connects Data Center and Cloud
[Figure: connected data architecture. Labels: Data Center, Cloud, Enterprise Data Lake, Security Data Lake, Data Flow & Stream Processing, Big Data Cloud Service, AWS IaaS, Azure IaaS]
Typical EDW Architecture
Used inefficiently, from $7,500 to $35,000 per TB [1] of data stored and processed
In a typical EDW:
• 50-70% of data is unused and/or cold
• 45-65% of CPU capacity is ETL/ELT
• 25-35% of CPU consumed by ETL is to load unused data
• 30-40% of CPU is consumed by only 5% of ETL workloads
• As little as 2.8% of the data is Hot [2]
[Figure: typical EDW architecture. Labels: Analytics (Data Marts, Business Analytics, Visualization & Dashboards); Data Systems (Systems of Record: RDBMS, ERP, CRM, Other)]
Source: Hortonworks Innovation and Strategy Team and Appfluent Analysis
[1] EY Analysis shows typical range from $10-15k/TB. Hortonworks experience shows a wide range observed in the field, from $35k/TB for massive, in-memory EDW appliances to $7.5k/TB for RDBMS based, home-grown EDW solutions.
[2] For example, for a client keeping a rolling 36-month window of data for reporting in an EDW, only 1 month of the 36 (2.8%) is new/hot.
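The 2.8% figure in the second footnote is simply 1 month out of a 36-month rolling window. The sketch below reproduces that arithmetic and, as an assumption-laden illustration, applies the slide's per-TB cost range and cold-data share to a hypothetical 100 TB warehouse; the 100 TB size and the function names are not from the source.

```python
def hot_fraction(hot_months, window_months):
    """Share of a rolling window that is new/hot."""
    return hot_months / window_months


def cold_storage_cost(total_tb, cold_pct, cost_per_tb):
    """Cost of keeping the cold share of the data in the EDW."""
    return total_tb * cold_pct * cost_per_tb / 100


# Footnote example: 1 hot month in a rolling 36-month window.
hot_pct = round(100 * hot_fraction(1, 36), 1)

# Hypothetical 100 TB warehouse (size is an assumption, not a source figure),
# using the slide's ranges: 50-70% cold data, $7,500-$35,000 per TB.
low_estimate = cold_storage_cost(100, 50, 7_500)
high_estimate = cold_storage_cost(100, 70, 35_000)

print(f"hot share: {hot_pct}%")
print(f"cold-data cost range: ${low_estimate:,.0f} to ${high_estimate:,.0f}")
```

Under these assumptions the cold portion alone would account for somewhere between $375,000 and $2,450,000 of EDW cost, which gives a sense of scale for the archiving argument.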
Hortonworks Connection: Services and Solutions for Your Success
[Figure: Hortonworks offerings diagram. Labels, in source order:]
• Data Services
• Hortonworks Solutions: Enterprise Data Warehouse Optimization; Cyber Security and Threat Management; Internet of Things and Streaming Analytics; Data Science Experience; Advanced SQL
• Data Center: Hortonworks Data Suite (HDF, HDP)
• Cloud: Hortonworks Data Cloud (AWS, HDInsight)
• Hortonworks Connection: Enablement Subscription; SmartSense™; Premier Operational Support; Educational Services; Professional Services; Community Connection
Enterprise Data Warehouse Optimization
• Dramatic Cost Reductions: Reduce cost of your EDW implementation by offloading ETL processes and archiving cold data
• Deploy Business Intelligence on Hadoop: Empower business users with powerful reporting, new applications, visualization tools, and artificial intelligence
• Support More Types of Unstructured Data: Index and search images, videos, text & sound files
EDW Plus Hadoop helps you optimize and reduce costs associated with your EDW
Archive Cold Data away from EDW
• Move cold or rarely used data to Hadoop as active archive
• Store more of your data longer, cheaper
Offload costly ETL process
• Free your EDW to perform high-value functions like analytics & reporting, not ETL
• Use Hadoop for advanced or massive-scale ETL/ELT
[Figure: EDW plus Hadoop architecture. Labels: Analytics (Data Marts, Business Analytics, Visualization & Dashboards); Data Systems (Systems of Record: RDBMS, ERP, CRM, Other); ELT; Enterprise Data Warehouse (Hot); Cold Data, Deeper Archive & New Sources; Data Science; OLAP on Hadoop]
EDW Optimization: ETL Offload
• The Problem:
  – EDWs consume between 50% and 90% of CPU just on ETL/ELT tasks.
  – These jobs interfere with more business-critical tasks like BI and advanced analytics.
• The Solution:
  – Hive and HDP deliver ETL that scales to petabytes.
  – Syncsort DMX-h for simple drag-and-drop ETL workflows.
  – Economical scale-out processing on commodity servers.
• The Result:
  – Better SLAs for mission-critical analytics.
  – Limit EDW expansion or retire old systems.
[Figure: EDW Optimization Solution. Labels: ETL/ELT, Data Mart, Data Landing & Deep Archive, Cube Mart, End User Applications, End Users and Apps]
EDW Optimization: Active Archive
• The Problem:
  – Increasing data volumes and cost pressure force data to be archived to tape.
  – Archived data not available for analytics, or must be retrieved at great expense.
• The Solution:
  – Adopting Hadoop delivers cost per terabyte on par with tape backup solutions.
  – Data in Hadoop can be analyzed by all major BI tools, allowing analytics on archive data.
• The Result:
  – Data always available for analytics.
  – Store years of data rather than months.
[Figure: EDW Optimization Solution. Labels: ETL/ELT, Data Mart, Data Landing & Deep Archive, Cube Mart, End User Applications, End Users and Apps]
EDW Optimization: Fast BI on Hadoop
• The Problem:
  – Proprietary EDW systems were adopted for Fast BI and deep slice-and-dice analytics, but EDW prices are unsustainably high.
• The Solution:
  – Interactive SQL is a reality on Hadoop today.
  – Partner Solutions (IBM BigSQL, Kyvos, Jethro) add powerful SQL and OLAP capabilities for deep drill-down at scale.
• The Result:
  – Query terabytes of data in seconds.
  – Connect your favorite BI tools like Tableau and Excel through SQL and MDX interfaces.
  – The EDW Optimization Solution is tailor-made to deliver Fast BI on Hadoop.
[Figure: EDW Optimization Solution. Labels: ETL/ELT, Data Mart, Data Landing & Deep Archive, Cube Mart, End User Applications, End Users and Apps]
Centrica Transforms Service For Utility Customers
• 3 Million Customers: can access “smart energy reports”
• ETL efficiency gains: from 11 hours to 45 minutes/job
• 300 GB/Day Ingest: rationalizes work of field engineers
• Decommissioned some EDWs: saving millions annually
SITUATION
• Data fragmentation hid business-wide patterns from analysts
• Existing infrastructure made loading data difficult & caused analytic bottlenecks
• Goal: reduce costs, streamline processes for a single view of customers
[Figure: use cases by category]
• DATA DISCOVERY: Smart Meter Data
• PREDICTIVE ANALYTICS: Engineer Schedule Optimization
• SINGLE VIEW: Customer Segment Analysis
• SINGLE VIEW: Product Cross-Sell
• PREDICTIVE ANALYTICS: Tailored Services
• SINGLE VIEW: Smart Meter Mobile App
• DATA ENRICHMENT: On-Site Data Capture
• ACTIVE ARCHIVE: EDW Offload
• ETL OFFLOAD: Streaming Ingest
“Focusing on innovation, learning to forget traditional legacy ways of working and approaching it in new ways creates unexpected behavioural changes, because people feel freer and they also feel valued.” Dajit Rehal, Senior Systems Director
EDW Plus Hadoop helps you land and enrich more data to respond faster to new business requests
Archive Cold Data away from EDW
• Move cold or rarely used data to Hadoop as active archive
• Store more of your data longer, cheaper
Offload costly ETL process
• Free your EDW to perform high-value functions like analytics & reporting, not ETL
• Use Hadoop for advanced or massive-scale ETL/ELT
Land & Enrich more data to create more value-add analytics
• Use Hadoop to ingest new data sources, such as web and machine data for new analytical context from unstructured and semi-structured sources
• Create an analytical sandbox for advanced data science
[Figure: EDW plus Hadoop architecture with new sources. Labels: Analytics (Data Marts, Business Analytics, Visualization & Dashboards); Data Systems (Systems of Record: RDBMS, ERP, CRM, Other); ELT; Enterprise Data Warehouse (Hot); Cold Data, Deeper Archive & New Sources; Data Science; OLAP on Hadoop; New Sources (Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured); Ingest, Stream, Events]
Prescient Harnesses Machine Learning for Traveler Safety Warnings
SITUATION
• Could only produce one assessment every 3-4 days
• Performs risk management
• Uses humans to identify false positives
• Needed efficient way to store raw data for analytics
Results
• 49,500 Data Sources: ingested by HDF into HDP
• 700% Productivity Improvement: for geospatial analysts
• 5 Petabytes of Data: stored in HDP connected EMC
• Hybrid Architecture: HDF connects data center to cloud
[Figure: use cases by category]
• ETL OFFLOAD: Sensor Data Ingest
• DATA DISCOVERY: Threat Assessments
• SINGLE VIEW: Global Threat Map
• PREDICTIVE ANALYTICS: Threat-Proximity Mobile Alerts
• ACTIVE ARCHIVE: Streaming Threat Archive
• DATA ENRICHMENT: Provenance Metadata
“We know that when we define a high-threat area in a given area of the world, that it is underpinned by very specific data sources. It’s data-driven, and we can point to those sources, if ever asked, and say, ‘Here’s why.’” Mike Bishop, Chief Systems Architect
Why Hortonworks?
• Powering All Data: Data-at-Rest, Data-in-Motion; Cloud, On-Premises; Structured, unstructured
• Powered By 100% Open Source: Rapid innovation; Dramatic cost reduction
• Enterprise Ready: Governance; Fine grained security; Lineage and data provenance
hortonworks.com/get-started/big-data-scorecard/
Forrester Wave: Big Data Warehouse, Q2 2017
Thank You