Building the Enterprise Data Lake with Cloudera & Cisco


© Cloudera, Inc. All rights reserved.

Building the Enterprise Data Lake with Cloudera & Cisco
Prepared by: Marilyn Tan, Country Manager Singapore; Xue Daming, Senior Systems Engineer


Digital Transformation with Data


DATA is Transforming the World!

DIGITAL DEMOCRACY & SECURITY

- Digital Sharing Economy: Open Data & Algorithms
- Enterprise-ready Open Source (e.g. Apache)
- Digital (distributed) Trust (esp. Blockchain)

RISE OF OPEN SOURCE

SMART APPs

- Connected Experience
- Ubiquitous computing: Everywhere & on every device (Voice, VR, AR, mobile, Wearables)

THE NEW UX

CONNECTED WORLD

INTERNET OF THINGS, INDUSTRY 4.0

- Smart Things/Devices
- New Use Cases
- New Architectures
- By 2020: 20.8 billion devices
- IoT: $1.7 trillion in 2020

MORE DATA

- New Analytics
- Machine Learning & AI
- New data sources
- Data Virtualization
- Data Science

DATA AS 4th PRODUCTION FACTOR


The 9-year Cloudera journey…

2008: CLOUDERA FOUNDED BY MIKE OLSON, AMR AWADALLAH & JEFF HAMMERBACHER, CHRISTOPHE BISCIGLIA; JOINED BY DOUG CUTTING (2009)

2009: CDH: FIRST COMMERCIAL APACHE HADOOP DISTRIBUTION & CLOUDERA MANAGER

2011: CLOUDERA REACHES 100 PRODUCTION CUSTOMERS

2011: CLOUDERA UNIVERSITY EXPANDS TO 140 COUNTRIES; SUPPORT IMPLEMENTS FOLLOW-THE-SUN MODEL

2012: CLOUDERA ENTERPRISE 4: THE STANDARD FOR HADOOP IN THE ENTERPRISE

2012: CLOUDERA CONNECT REACHES 300 PARTNERS ACROSS SI, HARDWARE, AND SOFTWARE PARTNERS

2013: CLOUDERA EXPANDS BEYOND MR AND HBASE, INTRODUCING IMPALA, SOLR AND SPARK

2014: CLOUDERA INTRODUCES THE ENTERPRISE DATA HUB AND CLOUDERA ENTERPRISE 5

2014: CLOUDERA FOCUSES ON SECURITY AND GOVERNANCE WITH NAVIGATOR 2, AND CLOUD WITH DIRECTOR

2015: CLOUDERA INCLUDES KAFKA, KUDU AND RECORDSERVICE WITHIN CLOUDERA ENTERPRISE

2016: NAVIGATOR OPTIMIZER GENERAL AVAILABILITY, IMPROVED CLOUD COVERAGE WITH AWS, AZURE AND GCP

2017…: CLOUDERA ACQUIRED FAST FORWARD LABS, ANNOUNCED PaaS ALTUS, DATA SCIENCE WORKBENCH, SHARED DATA EXPERIENCE (SDX) AND MORE TO COME!

Diagram labels: CDH / CM, ENTERPRISE DATA HUB, CLOUDERA ENTERPRISE 4, ∀ Clouds, Altus, CDSW


What Happens Next: A Decade of Hadoop

18 Projects and beyond


Gartner Analytics Ascendancy Model

[Figure: analytics maturity plotted by Value and Difficulty, moving from Information to Optimization and from Hindsight through Insight to Foresight]

Descriptive Analytics: What happened?

Diagnostic Analytics: Why did it happen?

Predictive Analytics: What will happen?

Prescriptive Analytics: What will make it happen?


Cloudera & Cisco Enterprise Data Lake Innovation


Cisco UCS Integrated Infrastructure with Cloudera for IoT

Big Data Store (UCS C240/C3160)

Fog

Kafka

Cisco UCS C240

Data Inject (CoAP/MQTT/XMPP)

Data Processing

Data Aggregator

Cisco UCS C240

C800 / UCS Mini / UCS C240

Real-Time Data Store (UCS C220/C240)

Batch

Real-Time

Speed Layer

Batch Layer

ISR 8x9 with 4G LTE and dual 802.11n a/g/n (WiFi) radios, managed by Cisco Fog Director

Serving Layer

Data Analytics

Cisco UCS at all layers, fully validated architectures with all major players
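
The Speed Layer, Batch Layer, and Serving Layer labels on this slide match the pattern commonly called a lambda architecture. A minimal in-memory sketch of how the three layers cooperate, assuming hypothetical sensor events with a `device` and a `value` field; the class and method names are illustrative only and are not Cisco or Cloudera APIs:

```python
from collections import defaultdict


class LambdaSketch:
    """Toy lambda architecture: a batch view plus a speed view, merged at query time."""

    def __init__(self):
        self.master_log = []                  # append-only record of every raw event
        self.batch_view = {}                  # totals recomputed from the full log
        self.speed_view = defaultdict(float)  # totals for events since the last batch run

    def ingest(self, event):
        """Speed layer: store the event and update the real-time view immediately."""
        self.master_log.append(event)
        self.speed_view[event["device"]] += event["value"]

    def run_batch(self):
        """Batch layer: recompute totals from the whole log, then reset the speed view."""
        totals = defaultdict(float)
        for event in self.master_log:
            totals[event["device"]] += event["value"]
        self.batch_view = dict(totals)
        self.speed_view.clear()

    def query(self, device):
        """Serving layer: batch result plus whatever arrived after the last batch run."""
        return self.batch_view.get(device, 0.0) + self.speed_view.get(device, 0.0)


pipeline = LambdaSketch()
pipeline.ingest({"device": "truck-1", "value": 2.0})
pipeline.ingest({"device": "truck-1", "value": 3.0})
pipeline.run_batch()                                   # batch view now holds 5.0
pipeline.ingest({"device": "truck-1", "value": 1.5})   # only the speed view sees this
print(pipeline.query("truck-1"))                       # 6.5
```

In a deployment like the one on the slide, the log would plausibly live in the big data store, the speed view in the real-time data store, and ingestion would arrive through Kafka; that mapping is an interpretation of the diagram, not something the slide states.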


Fabric-Centric Design

High Performance: 40 Gb/s Ethernet; 320 Gb/s per Chassis

Unified Fabric: Single Cable for Network, Storage, and Management Traffic

Easy to Scale: Single Point of Management; Add Cables for Bandwidth vs. Fabric Type

UCS Manager

Management

Ethernet

Storage


Management Simplicity

Centralized Management

Service Profiles for Servers

• Manage all servers centrally

Application Profiles for Network

• Manage the entire network centrally

Big Data: Management Consistency

Hundreds of Servers

Thousands of management points

Simplified Scalability

Easily scale your infrastructure from a few servers to thousands of servers with a fully Integrated Infrastructure

UCS Service Profile

Cisco ACI Application Profile


The enterprise platform for machine learning

PATTERN RECOGNITION, DETECTION, PREDICTION

500+ CUSTOMERS RUN ON

DRIVE CUSTOMER INSIGHTS
• Market segmentation
• Customer 360
• Next best offer
• Churn analysis & prevention

PROTECT BUSINESS
• Cybersecurity
• Fraud
• Anti-money laundering
• Risk modeling & assessment
• SPAM detection

CONNECT PRODUCTS & SERVICES (IoT)
• Predictive maintenance
• Genomics & personalized medicine
• Predicting and preventing disease

Natural language


Machine learning requires a complete stack.

Prepare
• Load external data
• Process structured data
• Process unstructured data
• Process streaming data
• Cleanse data
• Vectorize data

Analyze Data
• Diagnose/treat data issues
• Design experiments
• Partition data
• Engineer features
• Train and validate models
• Evaluate and assess models

Deploy
• Publish to BI/Viz
• Deploy to batch scoring
• Deploy to real-time scoring
• Deploy to scoring API
• Manage models
• Monitor model performance

Administration, Governance and Security

Business Integration

● Batch Processing ● Stream Processing ● Interactive SQL ● Search Tools ● Text/Image Processing

● Analytic Languages ● ML Libraries

● BI/Viz ● Interactive SQL ● Batch Processing ● Stream Processing ● Operational DB
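
The Prepare, Analyze, and Deploy stages listed on this slide can be illustrated end to end with a small, dependency-free sketch. It assumes hypothetical equipment records with `temp`, `vibration`, and `label` fields, and it substitutes a simple nearest-centroid classifier for whatever ML library a real project would use; none of the names below come from the deck.

```python
import random
from statistics import mean

FIELDS = ("temp", "vibration", "label")  # hypothetical record layout


def cleanse(rows):
    """Prepare: drop records that have a missing field."""
    return [r for r in rows if all(r.get(k) is not None for k in FIELDS)]


def vectorize(row):
    """Prepare: turn one record into a numeric feature vector."""
    return (float(row["temp"]), float(row["vibration"]))


def partition(rows, test_fraction=0.25, seed=0):
    """Analyze: shuffle, then split into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Analyze: fit a nearest-centroid model (one mean vector per label)."""
    by_label = {}
    for r in rows:
        by_label.setdefault(r["label"], []).append(vectorize(r))
    return {label: tuple(mean(col) for col in zip(*vecs))
            for label, vecs in by_label.items()}


def score(model, row):
    """Deploy: classify one record by its closest centroid."""
    x = vectorize(row)
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def evaluate(model, rows):
    """Analyze: accuracy on held-out records."""
    return sum(score(model, r) == r["label"] for r in rows) / len(rows)


# Synthetic, clearly separable data plus one dirty record.
raw = [{"temp": 60 + i, "vibration": 0.2, "label": "ok"} for i in range(8)]
raw += [{"temp": 95 + i, "vibration": 0.9, "label": "fail"} for i in range(8)]
raw.append({"temp": None, "vibration": 0.5, "label": "ok"})

train_rows, test_rows = partition(cleanse(raw))
model = train(train_rows)
print(evaluate(model, test_rows))                       # held-out accuracy
print(score(model, {"temp": 99, "vibration": 0.8}))     # per-request scoring
```

The same `score` function serves batch scoring (map it over a dataset) and real-time scoring (call it per request), which is the split the Deploy column describes.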


A complete, integrated enterprise platform

Cloudera Enterprise Data Hub

Cloudera Distribution for Hadoop


• Supports data science end-to-end
• Full access to data
• Secure self-service provisioning
• Containerized environments
• Supports Python, R, and Scala
• Automates:
  • Workflow
  • Version control
  • Collaboration
  • Sharing

Cloudera Data Science Workbench


CDSW Benefits

Data Scientists
• Web browser, no desktop footprint
• Use R, Python, or Scala
• Install any library or framework
• Isolated project environments
• Direct access to data in secure clusters
• Share insights with team
• Reproducible, collaborative research
• Automate and monitor data pipelines
• Built-in job scheduling

IT
• Support self-service data science
• Full platform security
• Kerberos authentication
• Run on-premises or in the cloud


Deep learning in Cloudera with Apache Spark

Spark Packages
• Two packages:
  • CaffeOnSpark
  • TensorFlowOnSpark
• Developed by Yahoo
• Python and Scala APIs
• All DL architectures
• Integrated pipeline
• Run on existing clusters
• Training and inference

DL4J
• Open source DL library
• Developed by Skymind
• Built on the JVM
• Supports CPUs and GPUs
• Java, Scala, Python APIs
• Training and inference
• Imports models from:
  • TensorFlow
  • Caffe
  • Torch
  • Theano
• Runs on existing clusters

BigDL
• Deep learning framework
• Developed by Intel
• Supports CPUs only
• Leverages Intel MKL
• Scala, Python APIs
• Imports models from:
  • TensorFlow
  • Caffe
  • Torch
• Runs on existing clusters


New! Accelerated deep learning on-demand with GPUs

“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”

● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud

[Diagram: Data Science Workbench (GPU, CPU) for single-node training; CDH (CPU) for distributed training and scoring]

Multi-tenant GPU support on-premises or cloud


Enterprise Data Lake Architecture


Canonical Ingestion & Spark Streaming Analytics with Cisco Big Data Analytics Platform

• Integrate with Apache Spark Streaming for real-time analysis of data
• Write back to Kafka for further processing or to send to an application layer
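
The read-analyze-write-back loop in these two bullets might look roughly like the sketch below. It is written against PySpark's Structured Streaming Kafka source and sink, which is an assumption: the slide says only "Spark Streaming" and does not name an API. The topic names, the JSON schema (`device_id`, `temperature`), the threshold, and the function names are all hypothetical. Running the stream requires PySpark plus the `spark-sql-kafka` connector package on the Spark classpath.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Reader options for consuming one Kafka topic as a stream."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def kafka_sink_options(bootstrap_servers, topic, checkpoint_dir):
    """Writer options for publishing results back to a Kafka topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        "checkpointLocation": checkpoint_dir,  # required by the Kafka sink
    }


def start_alert_stream(spark, servers, in_topic, out_topic, checkpoint_dir,
                       threshold=90.0):
    """Read JSON readings from Kafka, keep hot ones, write them back to Kafka."""
    # Imported lazily so the option helpers above work without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    schema = StructType([
        StructField("device_id", StringType()),      # hypothetical field
        StructField("temperature", DoubleType()),    # hypothetical field
    ])

    raw = (spark.readStream.format("kafka")
           .options(**kafka_source_options(servers, in_topic))
           .load())

    # Kafka delivers bytes; decode the value column and parse it as JSON.
    readings = (raw
                .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
                .select("r.*"))

    alerts = readings.where(F.col("temperature") > threshold)

    # The Kafka sink expects a string or binary "value" column (and optional "key").
    out = alerts.select(
        F.col("device_id").alias("key"),
        F.to_json(F.struct("device_id", "temperature")).alias("value"),
    )

    return (out.writeStream.format("kafka")
            .options(**kafka_sink_options(servers, out_topic, checkpoint_dir))
            .start())
```

With an existing `SparkSession`, a call such as `start_alert_stream(spark, "broker:9092", "sensor-readings", "sensor-alerts", "/tmp/checkpoints")` would return a streaming query handle; the broker address, topics, and path here are placeholders.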


Proposed Architecture for Enterprise Data Platform

Data Sources
• Meteorological Data
• Sensor Data
• Unstructured files
• Social
• Video/Image
• Historian/SCADA
• Geospatial
• BW, HANA, BPC, any SAP NW

Data Visualization, Advanced Analytics, AI Platform

Data Warehouse, Enterprise Data Warehouse


Big Data Blueprints: Cisco Validated Designs

Cisco® Validated Designs with Cloudera

Solution designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments.

What you get

• Industry-leading partnerships
• Tested and validated reference architectures to meet performance, capacity, and scale
• Joint engineering lab
• Extensive options for data management (Hadoop, MPP, and NoSQL) to meet your business needs
• Solution bundles optimized for cost of ownership and ease of ordering


Our Customers’ Success Stories


Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime

• OnCommand Connection is collecting telematics and geolocation data across the fleet
• Reduced maintenance costs to $0.03 per mile from $0.12-$0.15 per mile
• Centralizing data from 13 systems with varying frequency and semantic definitions
• Real-time visibility of 250,000+ trucks in order to improve uptime and vehicle performance
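
To put the per-mile maintenance figures above in relative terms, the reduction can be worked out directly. This is plain arithmetic on the quoted numbers, done in integer cents to avoid floating-point noise; the function name is illustrative.

```python
def reduction_pct(before_cents, after_cents):
    """Percentage drop when cost per mile falls from before_cents to after_cents."""
    return 100 * (before_cents - after_cents) / before_cents


low = reduction_pct(12, 3)    # from $0.12 per mile to $0.03 per mile
high = reduction_pct(15, 3)   # from $0.15 per mile to $0.03 per mile
print(f"Maintenance cost per mile fell by {low:.0f}% to {high:.0f}%")
```

On the quoted range, that is a 75% to 80% reduction in maintenance cost per mile.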

TRANSPORTATION » PREDICTIVE MAINTENANCE » IMPROVED SERVICE » DATA-DRIVEN PRODUCTS

DATA-DRIVEN PRODUCTS

CASE STUDY


Using Sensors & IoT to Improve Passenger Safety and Airport Efficiency

Challenge:
• Improve traveler satisfaction and safety by reducing downtime for critical operational machinery

Solution:
• Cloudera on Azure to capture, secure, and correlate sensor (IoT) data collected from escalators, elevators, and baggage carousels
• Provide necessary fixes to prevent unplanned downtime

Smart Buildings - Preventative Maintenance

DATA-DRIVEN PROCESS

CASE STUDY

DATA-DRIVEN PRODUCTS

TRAVEL & TRANSPORTATION » SMART BUILDINGS » PREDICTIVE MAINTENANCE » ADVANCED ANALYTICS


Enabling the State of Kentucky to manage snow and ice events in real time

Challenge:
• Needed a more efficient approach to inclement-weather road management

Solution:
• Real-time weather response system that incorporates real-time data from Waze, HERE, ESRI’s GeoEvent processor, and Automatic Vehicle Locations (providing sensor data from salt trucks).
• KYTC aggregates 15-20 million records every day and processes more than a million records per second.
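
The two KYTC throughput figures can be compared with a short calculation. It assumes the daily volume is spread evenly over 24 hours and reads "more than a million records per second" as a peak processing rate; both are interpretations, not statements from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(records_per_day):
    """Average records per second if the daily volume arrived evenly."""
    return records_per_day / SECONDS_PER_DAY


low = average_rate(15_000_000)    # about 174 records per second
high = average_rate(20_000_000)   # about 231 records per second
headroom = 1_000_000 / high       # quoted processing rate vs. average inflow

print(f"Average inflow: {low:.0f} to {high:.0f} records/s")
print(f"Processing rate is roughly {headroom:,.0f}x the higher average")
```

Under those assumptions the processing capacity exceeds the average inflow by more than three orders of magnitude, which leaves room for bursts during a weather event.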

Smart Cities

2016 Data Impact Award Winner: State of Kentucky Department of Transportation

CASE STUDY


Data Protection & Governance

Security Compliance & Risk Mitigation

0 – Data in Isolation
• Data in application silos
• Limited insights
• Summarized

1 – Behavior and Transaction Fusion
Basic Security Controls
• Authorization
• Authentication
• Comprehensive Auditing

2 – Expanded Data Surface Area
Data Security & Governance
• Lineage
• Visibility
• Metadata Discovery
• Encryption & Key Management

3 – EDH: Secure Data Vault
Fully Compliance Ready
• Audit-Ready & Protected
• Audit Ready For: PCI, PII
• Full encryption, key management, transparency, and enforcement for all data-at-rest and data-in-motion


Why Cloudera

The Platform for Next-Generation Analytics
Cloudera Enterprise delivers the capabilities required by the largest enterprises, spanning analytics, security, governance, and management. We make Hadoop fast, easy, and secure.

The Experience to Help You Succeed
No one knows Hadoop like Cloudera. As the first Hadoop company, Cloudera is the world’s leading contributor to and provider of enterprise Hadoop, with experience you can rely on to help you succeed.

Open Innovation
Our unique hybrid open source strategy enables us to lead the enterprise expansion of the Hadoop ecosystem, driving innovative new capabilities and open standards in the community.


Thank you

marilyn@cloudera.com | +65 9822 2338
daming@cloudera.com | +65 9368 2316
