Building the Enterprise Data Lake with Cloudera & Cisco
Prepared by: Marilyn Tan, Country Manager Singapore; Xue Daming, Senior Systems Engineer


Page 1

© Cloudera, Inc. All rights reserved.

Page 2

Digital Transformation with Data

Page 3

DATA is Transforming the World!

DIGITAL DEMOCRACY & SECURITY

- Digital Sharing Economy: Open Data & Algorithms
- Enterprise-ready Open Source (e.g. Apache)
- Digital (distributed) Trust (esp. Blockchain)

RISE OF OPEN SOURCE

SMART APPs

- Connected Experience
- Ubiquitous computing: Everywhere & on every device (Voice, VR, AR, mobile, Wearables)

THE NEW UX

CONNECTED WORLD

INTERNET OF THINGS, INDUSTRY 4.0

- Smart Things/Devices
- New Use Cases
- New Architectures
- By 2020: 20.8b devices
- IoT: $1.7 trillion in 2020

MORE DATA

- New Analytics
- Machine Learning & AI
- New data sources
- Data Virtualization
- Data Science

DATA AS 4th PRODUCTION FACTOR

Page 4

The 9-year Cloudera journey…

2008: Cloudera founded by Mike Olson, Amr Awadallah & Jeff Hammerbacher, Christophe Bisciglia; joined by Doug Cutting (2009)
2009: CDH, first commercial Apache Hadoop distribution, & Cloudera Manager
2011: Cloudera reaches 100 production customers
2011: Cloudera University expands to 140 countries; support implements follow-the-sun model
2012: Cloudera Enterprise 4, the standard for Hadoop in the enterprise
2012: Cloudera Connect reaches 300 partners across SI, hardware, and software partners
2013: Cloudera expands beyond MR and HBase, introducing Impala, Solr and Spark
2014: Cloudera focuses on security and governance with Navigator 2, and cloud with Director
2014: Cloudera introduces the Enterprise Data Hub and Cloudera Enterprise 5
2015: Cloudera includes Kafka, Kudu and RecordService within Cloudera Enterprise
2016: Navigator Optimizer general availability; improved cloud coverage with AWS, Azure and GCP
2017…: Cloudera acquired Fast Forward Labs, announced PaaS Altus, Data Science Workbench, Shared Data Experience (SDX) and more to come!

Timeline labels: CDH / CM; Enterprise Data Hub; Cloudera Enterprise 4; ∀ Clouds; Altus; CDSW

Page 5

What Happens Next: A Decade of Hadoop

18 projects and beyond

Page 6

Gartner Analytics Ascendancy Model

Chart axes: Value vs. Difficulty, moving from Information to Optimization (Hindsight, Insight, Foresight)

- Descriptive Analytics: What happened?
- Diagnostic Analytics: Why did it happen?
- Predictive Analytics: What will happen?
- Prescriptive Analytics: What will make it happen?

Page 7

Cloudera & Cisco Enterprise Data Lake Innovation

Page 8

Cisco UCS Integrated Infrastructure with Cloudera for IoT

Diagram labels (layout not recoverable):
- Fog; Data Inject (CoAP/MQTT/XMPP); Data Aggregator; Data Processing; Kafka
- Speed Layer (Real-Time); Batch Layer (Batch); Serving Layer; Data Analytics
- Big Data Store: UCS C240/C3160
- Real-Time Data Store: UCS C220/C240
- Cisco UCS C240; C800/UCS Mini/UCS C240
- ISR 8x9 with 4G LTE and Dual 802.11n a/g/n (WiFi) Radios, managed by Cisco Fog Director

Cisco UCS at all layers, fully validated architectures with all major players
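
The speed, batch, and serving layers named on this slide follow the pattern commonly called a Lambda architecture. A minimal sketch of how a serving layer might merge the two views, assuming a simple per-device event count. The function names, the event format, and the device IDs are hypothetical and are not taken from the Cisco or Cloudera design:

```python
from collections import Counter

def build_batch_view(master_events):
    """Batch layer: recompute per-device event counts from the full history."""
    return Counter(event["device"] for event in master_events)

def build_speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["device"] for event in recent_events)

def serve_count(batch_view, speed_view, device):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view.get(device, 0) + speed_view.get(device, 0)

# Hypothetical sensor events; a real deployment would receive these from the
# ingest tier rather than from hard-coded lists.
master_events = [
    {"device": "escalator-1"},
    {"device": "escalator-1"},
    {"device": "elevator-7"},
]
recent_events = [{"device": "escalator-1"}, {"device": "carousel-3"}]

batch_view = build_batch_view(master_events)
speed_view = build_speed_view(recent_events)
escalator_total = serve_count(batch_view, speed_view, "escalator-1")
carousel_total = serve_count(batch_view, speed_view, "carousel-3")
print(escalator_total, carousel_total)
```

The batch view can be rebuilt from scratch at any time, while the speed view only covers the gap since the last rebuild, which is why the serving layer adds the two.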

Page 9

Fabric Centric Design

High Performance: 40 Gb/s Ethernet; 320 Gb/s per Chassis

Unified Fabric: Single Cable for Network, Storage, and Management Traffic

Easy to Scale: Single Point of Management; Add Cables for Bandwidth vs. Fabric Type

Diagram labels: UCS Manager; Management, Ethernet, Storage

Page 10

Management Simplicity

Centralized Management

Service Profiles for Servers
• Manage all servers centrally

Application Profiles for Network
• Manage all network centrally

Big Data: Management Consistency

Hundreds of Servers

Thousands of management points

Simplified Scalability

Easily scale your infrastructure from a few servers to thousands of servers with a fully Integrated Infrastructure

Diagram labels: UCS Service Profile; Cisco ACI Application Profile

Page 11

The enterprise platform for machine learning

PATTERN RECOGNITION, DETECTION, PREDICTION

500+ CUSTOMERS RUN ON

DRIVE CUSTOMER INSIGHTS
• Market segmentation
• Customer 360
• Next best offer
• Churn analysis & prevention

PROTECT BUSINESS
• Cybersecurity
• Fraud
• Anti-money laundering
• Risk modeling & assessment
• SPAM detection

CONNECT PRODUCTS & SERVICES (IoT)
• Predictive maintenance
• Genomics & personalized medicine
• Predicting and preventing disease
• Natural language

Page 12

Machine learning requires a complete stack.

Prepare
• Load external data
• Process structured data
• Process unstructured data
• Process streaming data
• Cleanse data
• Vectorize data
● Batch Processing
● Stream Processing
● Interactive SQL
● Search Tools
● Text/Image Processing

Analyze Data
• Diagnose/treat data issues
• Design experiments
• Partition data
• Engineer features
• Train and validate models
• Evaluate and assess models
● Analytic Languages
● ML Libraries

Deploy
• Publish to BI/Viz
• Deploy to batch scoring
• Deploy to real-time scoring
• Deploy to scoring API
• Manage models
• Monitor model performance
● BI/Viz
● Interactive SQL
● Batch Processing
● Stream Processing
● Operational DB

Administration, Governance and Security

Business Integration
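
The partition, train, and evaluate steps listed on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the data is synthetic, the nearest-centroid classifier is a stand-in chosen for brevity, and none of the function names come from Cloudera tooling.

```python
import random
from statistics import mean

def partition(rows, test_fraction=0.25, seed=0):
    """Partition data: shuffle deterministically, then split into train and test."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train(rows):
    """Train: compute one centroid (mean feature vector) per label."""
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {label: [mean(column) for column in zip(*vectors)]
            for label, vectors in grouped.items()}

def predict(model, features):
    """Score: return the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

def evaluate(model, rows):
    """Evaluate: fraction of held-out rows classified correctly."""
    correct = sum(predict(model, features) == label for features, label in rows)
    return correct / len(rows)

# Synthetic, hypothetical data: two well-separated clusters of readings.
data_rng = random.Random(42)
rows = [([data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)], "normal")
        for _ in range(40)]
rows += [([data_rng.uniform(9, 11), data_rng.uniform(9, 11)], "faulty")
         for _ in range(40)]

train_rows, test_rows = partition(rows)
model = train(train_rows)
holdout_accuracy = evaluate(model, test_rows)
print(f"hold-out accuracy: {holdout_accuracy:.2f}")
```

Keeping the evaluation rows out of training is the point of the "Partition data" step: accuracy measured on rows the model has already seen would overstate how well it generalizes.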

Page 13

A complete, integrated enterprise platform

Cloudera Enterprise Data Hub

Cloudera Distribution for Hadoop

Page 14

Cloudera Data Science Workbench

• Supports data science end-to-end
• Full access to data
• Secure self-service provisioning
• Containerized environments
• Supports Python, R, and Scala
• Automates: Workflow, Version control, Collaboration, Sharing

Page 15

CDSW Benefits

Data Scientists
• Web browser, no desktop footprint
• Use R, Python, or Scala
• Install any library or framework
• Isolated project environments
• Direct access to data in secure clusters
• Share insights with team
• Reproducible, collaborative research
• Automate and monitor data pipelines
• Built-in job scheduling

IT
• Support self-service data science
• Full platform security
• Kerberos authentication
• Run on-premises or in the cloud

Page 16

Deep learning in Cloudera with Apache Spark

Spark Packages
• Two packages: CaffeOnSpark, TensorFlowOnSpark
• Developed by Yahoo
• Python and Scala APIs
• All DL architectures
• Integrated pipeline
• Run on existing clusters
• Training and inference

DL4J
• Open source DL library
• Developed by Skymind
• Built on JVMs
• Supports CPUs and GPUs
• Java, Scala, Python APIs
• Training and inference
• Imports models from: TensorFlow, Caffe, Torch, Theano
• Runs on existing clusters

BigDL
• Deep learning framework
• Developed by Intel
• Supports CPUs only
• Leverages Intel MKL
• Scala, Python APIs
• Imports models from: TensorFlow, Caffe, Torch
• Runs on existing clusters

Page 17

New! Accelerated deep learning on-demand with GPUs

“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”

● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud

Diagram labels: Data Science Workbench (GPU, CPU): single-node training; CDH (CPU), CDH (CPU): distributed training, scoring

Multi-tenant GPU support on-premises or cloud

Page 18

Enterprise Data Lake Architecture

Page 19

Canonical Ingestion & Spark Streaming Analytics with Cisco Big Data Analytics Platform

• Integrate with Apache Spark Streaming for real-time analysis of data
• Write back to Kafka for further processing or to send to an application layer
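
The consume, analyze, and write-back loop described in these two bullets can be illustrated without a cluster. The sketch below is a stand-in only: in-memory `deque` objects play the role of Kafka topics, a plain generator plays the role of Spark Streaming micro-batches, and no Kafka or Spark API is used. The record fields and function names are hypothetical.

```python
from collections import deque
from statistics import mean

def micro_batches(topic, batch_size):
    """Drain the input 'topic' in fixed-size micro-batches."""
    while topic:
        size = min(batch_size, len(topic))
        yield [topic.popleft() for _ in range(size)]

def analyze(batch):
    """Real-time analysis step: average reading per sensor within one batch."""
    readings = {}
    for record in batch:
        readings.setdefault(record["sensor"], []).append(record["value"])
    return [{"sensor": sensor, "avg": mean(values), "count": len(values)}
            for sensor, values in sorted(readings.items())]

def run(in_topic, out_topic, batch_size=3):
    """Process every micro-batch and write the results to the output 'topic'."""
    for batch in micro_batches(in_topic, batch_size):
        out_topic.extend(analyze(batch))

# Hypothetical sensor readings standing in for messages on an input topic.
in_topic = deque([
    {"sensor": "s1", "value": 1.0},
    {"sensor": "s2", "value": 4.0},
    {"sensor": "s1", "value": 3.0},
    {"sensor": "s2", "value": 6.0},
    {"sensor": "s2", "value": 8.0},
    {"sensor": "s1", "value": 5.0},
])
out_topic = deque()
run(in_topic, out_topic)
for result in out_topic:
    print(result)
```

Writing results to a second topic, rather than straight into an application, lets further processing stages or an application layer consume them independently, which is the design choice the second bullet describes.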

Page 20

Proposed Architecture for Enterprise Data Platform

Diagram labels (layout not recoverable):
- Data Sources: Meteorological Data, Sensors Data, Unstructured files, Social, Video/Image, Historian/SCADA, Geospatial
- BW, HANA, BPC, any SAP NW
- Data Warehouse; Enterprise Data Warehouse
- Data Visualization; Advanced Analytics; AI Platform

Page 21

Big Data Blueprints: Cisco Validated Designs

Cisco® Validated Designs with Cloudera

Solution designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments.

What you get

• Industry-leading partnerships
• Tested and validated reference architectures to meet performance, capacity, and scale
• Joint engineering lab
• Extensive options for data management (Hadoop, MPP, and NoSQL) to meet your business needs
• Solution bundles optimized for cost of ownership and ease of ordering

Page 22

Our Customers’ Success Stories

Page 23

Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime

• OnCommand Connection is collecting telematics and geolocation data across the fleet
• Reduced maintenance costs to $0.03 per mile from $0.12-$0.15 per mile
• Centralizing data from 13 systems with varying frequency and semantic definitions
• Real-time visibility of 250,000+ trucks in order to improve uptime and vehicle performance
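
As a quick check on the maintenance figures quoted above, moving from $0.12-$0.15 per mile to $0.03 per mile is a reduction of roughly 75% to 80%. A short worked calculation (the helper name is illustrative only):

```python
def percent_reduction(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100

cost_after = 0.03          # dollars per mile, from the slide
cost_before_low = 0.12     # dollars per mile, lower bound from the slide
cost_before_high = 0.15    # dollars per mile, upper bound from the slide

reduction_low = percent_reduction(cost_before_low, cost_after)
reduction_high = percent_reduction(cost_before_high, cost_after)
print(f"Reduction: {reduction_low:.0f}% to {reduction_high:.0f}%")
```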

TRANSPORTATION » PREDICTIVE MAINTENANCE » IMPROVED SERVICE » DATA-DRIVEN PRODUCTS

DATA-DRIVEN PRODUCTS

CASE STUDY

Page 24

Using Sensors & IoT to Improve Passenger Safety and Airport Efficiency

Challenge:
• Improve traveler satisfaction and safety, by reducing downtime for critical operational machinery

Solution:
• Cloudera on Azure to capture, secure, and correlate sensor (IoT) data collected from escalators, elevators, and baggage carousels
• Provide necessary fixes to prevent unplanned downtime

Smart Buildings - Preventative Maintenance

DATA-DRIVEN PROCESS

CASE STUDY

DATA-DRIVEN PRODUCTS

TRAVEL & TRANSPORTATION » SMART BUILDINGS » PREDICTIVE MAINTENANCE » ADVANCED ANALYTICS

Page 25

Enabling the State of Kentucky to manage snow and ice events in real time

Challenge:
• Needed a more efficient approach to inclement weather road management

Solution:
• Real-time weather response system that incorporates real-time data from Waze, HERE, ESRI’s GeoEvent processor, and Automatic Vehicle Locations (providing sensor data from salt trucks).
• KYTC aggregates 15-20 million records every day and processes more than a million records per second.

Smart Cities

2016 Data Impact Award Winner: State of Kentucky Department of Transportation

CASE STUDY

Page 26

Data Protection & Governance

Security Compliance & Risk Mitigation

• Data in application silos; Limited Insights; Summarized
• Basic Security Controls: Authorization, Authentication, Comprehensive Auditing
• Data Security & Governance: Lineage, Visibility, Metadata Discovery, Encryption & Key Management
• Fully Compliance Ready: Audit-Ready & Protected. Audit Ready For: PCI, PII. Full encryption, key management, transparency, and enforcement for all data-at-rest and data-in-motion

0 - Data in Isolation
1 - Behavior and Transaction Fusion
2 - Expanded Data Surface Area
3 - EDH: Secure Data Vault

Page 27

Why Cloudera

The Platform for Next-Generation Analytics
Cloudera Enterprise delivers the capabilities required by the largest enterprises, spanning analytics, security, governance, and management. We make Hadoop fast, easy, and secure.

The Experience to Help You Succeed
No one knows Hadoop like Cloudera. As the first Hadoop company, Cloudera is the world’s leading contributor to and provider of enterprise Hadoop, with experience you can rely on to help you succeed.

Open Innovation
Our unique hybrid open source strategy enables us to lead the enterprise expansion of the Hadoop ecosystem, driving innovative new capabilities and open standards in the community.

Page 28

[email protected] | [email protected] | +6593682316