Empower Data-Driven Organizations with HPE and Hadoop
Gilles Noisette – HPE EMEA Big Data CoE
04/13/2016



Page 2: Empower Data-Driven Organizations

Agenda

• A Data-driven world
• HPE Contribution to Spark
• HPE Innovations for Hadoop
• Enterprise-Grade SQL Analytics for Hadoop
• Data-centric Security for Hadoop
• HPE Data Discovery service to help you pull together these innovations

Page 3: Empower Data-Driven Organizations

Transform to a hybrid infrastructure

Enable workplace productivity

Protect your digital enterprise

Empower the data-driven organization

Page 4: Empower Data-Driven Organizations

Empower the data-driven organization: Harness 100% of your relevant data to empower people with actionable insights that drive superior business outcomes.

Page 5: Empower Data-Driven Organizations

Enterprise Spark at scale: HP Labs is helping make Apache Spark better

Page 7: Empower Data-Driven Organizations


HPE Contribution to Apache Spark: Martin Fink announcement

Hortonworks and HP Labs join forces to boost Spark
Hewlett Packard Labs is working with Hortonworks to enhance the efficiency and scale of memory for the enterprise and to dramatically improve memory utilization:

– Enhanced shuffle engine technologies: faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance

– Better memory utilization: improved performance and usage for broader scalability, which will help enable new large-scale use cases
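The shuffle bullets refer to the stage where Spark redistributes map output across reducers. The HP Labs engine changes are not detailed on this slide, so what follows is only a minimal, single-process Python sketch of a generic sort-based shuffle, assuming hash partitioning by key and key-sorted runs per partition. All function names are hypothetical; none of this is Spark's API.

```python
import heapq
import zlib
from collections import defaultdict

def partition_of(key: str, num_partitions: int) -> int:
    # Stable hash, so every map task routes a given key to the same reducer.
    # (Python's built-in hash() for str is salted per process, so use CRC32.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def map_side_shuffle(records, num_partitions):
    """Shuffle write for one map task: bucket (key, value) pairs by target
    partition, then sort each bucket by key so it forms an ordered run."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_of(key, num_partitions)].append((key, value))
    return {p: sorted(run, key=lambda kv: kv[0]) for p, run in buckets.items()}

def reduce_side_merge(map_outputs, partition):
    """Shuffle read for one reducer: fetch its sorted run from every map task
    and merge them into a single key-ordered stream."""
    runs = [output.get(partition, []) for output in map_outputs]
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))

# Two map tasks, two reduce partitions.
map1 = map_side_shuffle([("spark", 1), ("hadoop", 1), ("spark", 1)], 2)
map2 = map_side_shuffle([("hadoop", 1), ("vertica", 1)], 2)
for p in range(2):
    print(p, reduce_side_merge([map1, map2], p))
```

Because each map-side run is already sorted, the reducer only merges rather than re-sorting; faster sorting and leaner in-memory buffering would speed up exactly these two steps.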

“We're hoping to enable the Spark community to derive insight more rapidly, from much larger data sets, without having to change a single line of code.”
– Martin Fink, CTO and Director, HP Labs

Tested with customers in the financial services industry: from 3x to 15x performance increases.

Page 8: Empower Data-Driven Organizations

HPE Innovations for Hadoop: Optimized Infrastructure and Architecture


Page 9: Empower Data-Driven Organizations

HPE Servers and Architectures for Hadoop

Traditional
• Tried-and-true platform
• Corporate standard: “I buy DL380s”
• Small to large deployments (very often ~20 nodes)
• Linear growth of balanced workloads

Optimized
• Purpose-built for Big Data
• Mid-size to large deployments
• Single, resource-intensive workload
• Workload optimized
• Multi-temperature storage
• “Optimized traditional”
• Higher density, lower TCO

Converged
• MPP DBMS approach + open source
• Mid-size to large deployments
• Non-linear storage and compute/memory growth
• Multiple workloads, latency demands
• Isolate workload hot spots
• Scale compute and storage separately, elastically
• Innovative, TCO-driven approach

Figure: Symmetric architectures (“conventional wisdom”): ProLiant DL380 Gen9 (Traditional) and Apollo 4xxx, e.g. Apollo 4200 Gen9 and Apollo 4500 Gen9 (Optimized). Asymmetric architecture (“forward-thinking”): Moonshot & Apollo, e.g. Moonshot 1500 and Apollo 2000 System (Converged).

Page 10: Empower Data-Driven Organizations

HPE Reference Architecture(s) for Hadoop

• Scaling from 4 to thousands of HPE servers
• Sized to the customer’s workload and storage needs
• Impressive processor and storage density

A set of pre-tested hardware components
• Processor, drives, network, 1 TB/8 TB disk sizes, etc.

Breakthrough economics, density, simplicity

Flexible, pre-approved and optimized configurations

HPE Apollo 4000 example rack:
• 24 x HPE ProLiant Apollo 4530 worker nodes
• HPE 5900 10GbE and HPE 5930 10GbE (x 2) network switches
• 3 x DL360 Gen9 head nodes

Full-rack configurations:
• Apollo 4530: 3.5 PB raw storage, 900 TB Hadoop usable, 960 Xeon E5 cores
• DL380: 2.46 PB raw storage, 630 TB Hadoop usable, 756 Xeon E5 cores
• Apollo 4200: 4.6 PB raw storage, 1 PB Hadoop usable, 756 Xeon E5 cores
• Apollo 4510: 5.3 PB raw storage, 1.3 PB Hadoop usable, 320 Xeon E3 cores
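The gap between raw and Hadoop-usable capacity in these rack figures comes mostly from HDFS replication plus headroom. The slide does not state its sizing formula, so the sketch below assumes the HDFS default replication factor of 3 and a hypothetical 25% reserve for intermediate data, temporary files and growth. It lands in the same range as the stated figures but does not reproduce them exactly.

```python
def usable_tb(raw_pb: float, replication: int = 3, reserve: float = 0.25) -> float:
    """Estimate HDFS-usable capacity (TB) from raw rack capacity (PB).

    Assumptions, not taken from the slide: each block is stored `replication`
    times (HDFS default: 3), and a fraction `reserve` of what remains is kept
    free for intermediate data, temporary files and growth.
    """
    raw_tb = raw_pb * 1000  # decimal units, as drive capacities are quoted
    return raw_tb / replication * (1 - reserve)

# (raw PB, Hadoop-usable TB) pairs as stated for the four full-rack examples
stated = [(3.5, 900), (2.46, 630), (4.6, 1000), (5.3, 1300)]
for raw_pb, usable in stated:
    print(f"{raw_pb} PB raw -> ~{usable_tb(raw_pb):.0f} TB estimated, {usable} TB stated")
```

Under these assumptions the estimates come out at about 875, 615, 1150 and 1325 TB; the actual reserve used for each configuration is not given on the slide.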

Page 11: Empower Data-Driven Organizations

HPE Apollo 4200: Bringing Big Data storage server density to the enterprise
Used as a standard Hadoop worker node and as the BDRA asymmetric storage node

Storage density: 28 LFF data drives

Data center plug and play

Performance and efficiency
• Halve the number of servers
• Halve the number of network ports
• Halve the floor space needed
• Reduce the number of licenses/subscriptions needed

Highest storage density in a traditional 2U rack server: 224 TB per server, up to 4.6 PB per rack
Core-to-spindle ratio of 1: 28 cores (2 x 14) and 28 drive spindles

Enterprise bridge: fits traditional enterprise/SME rack-server data centers and lowers electrical power needs

Configuration flexibility: balanced capacity, performance and throughput, with flexible options for disks, CPUs, I/O and interconnects
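The “halve the number of servers” claim follows mostly from drive count per server. The rough check below assumes the same 8 TB LFF drives in both servers (224 TB / 28 drives = 8 TB on the Apollo 4200) and a 12-LFF-drive DL380 Gen9, as pictured earlier in the deck. Both assumptions, along with the 2 PB target, are illustrative rather than taken from this slide.

```python
import math

def servers_needed(target_raw_tb: float, drives_per_server: int, drive_tb: float) -> int:
    """Smallest number of servers whose combined raw capacity meets the target."""
    return math.ceil(target_raw_tb / (drives_per_server * drive_tb))

TARGET_RAW_TB = 2000   # hypothetical requirement: 2 PB raw
DRIVE_TB = 8           # 224 TB / 28 drives = 8 TB per drive on the Apollo 4200

dl380 = servers_needed(TARGET_RAW_TB, drives_per_server=12, drive_tb=DRIVE_TB)
apollo_4200 = servers_needed(TARGET_RAW_TB, drives_per_server=28, drive_tb=DRIVE_TB)
print(dl380, apollo_4200)   # 21 9
```

If each server needs the same number of network ports and per-node subscriptions, those counts fall by the same ratio as the server count.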

Page 12: Empower Data-Driven Organizations


Hadoop on HPE Moonshot: What would be a good server cartridge for Hadoop?

Processing: 8 Xeon cores, very efficient I/O

Memory: 128 GB

Storage: 2 TB M.2 SSD for data

Network: fast network (2 x 10GbE), low-latency chassis interconnect

Impala: SQL on Hadoop

45 x 128 GB = 5.6 TB RAM; 45 x 2 TB = 90 TB of fast data storage in 4U

45 servers per enclosure

Page 13: Empower Data-Driven Organizations

HPE Asymmetric Architecture for Hadoop
HPE Vertica SQL on Hadoop
Enterprise-Grade Hadoop


Page 14: Empower Data-Driven Organizations


HPE Big Data Reference Architecture: HPE Brings Enterprise Data Center Architecture to Hadoop

Traditional Hadoop Cluster Architecture
– Compute and storage are always co-located

– All servers are identical

– Data is partitioned across servers on direct attached storage

HPE Big Data Reference Architecture
– Separate, optimized compute and storage tiers connected by high-speed networking

– Standard Hadoop installed with storage components on the storage servers and applications on the compute servers

– Enabled and optimized by purpose-selected HPE Moonshot and Apollo servers and HPE/Hortonworks workload management software (contributed to the community)

Figure: Symmetric architecture, where every server holds applications and data files, versus asymmetric architecture, where compute servers hold applications and intermediate data and storage servers hold data files.
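One way to picture the difference is as a role-placement map. In this illustrative sketch, HDFS DataNodes hold the data files and YARN NodeManagers run the applications. The node names, counts and helper functions are hypothetical; actual BDRA deployments are defined in HPE's reference architecture documents, not here.

```python
from typing import Dict, List

def symmetric_layout(num_servers: int) -> Dict[str, List[str]]:
    # Every server is identical: it stores HDFS blocks on direct-attached disks
    # and runs YARN containers that read those blocks locally.
    return {f"worker{i}": ["DataNode", "NodeManager"] for i in range(num_servers)}

def asymmetric_layout(num_compute: int, num_storage: int) -> Dict[str, List[str]]:
    # Separate tiers, sized independently and joined by a fast network:
    # compute servers run applications (and keep their intermediate data),
    # storage servers hold the HDFS data files.
    layout = {f"compute{i}": ["NodeManager"] for i in range(num_compute)}
    layout.update({f"storage{i}": ["DataNode"] for i in range(num_storage)})
    return layout

def count_role(layout: Dict[str, List[str]], role: str) -> int:
    return sum(role in roles for roles in layout.values())

# Doubling compute in the asymmetric design leaves the storage tier unchanged,
# whereas in the symmetric design every added server brings both roles.
before = asymmetric_layout(num_compute=8, num_storage=4)
after = asymmetric_layout(num_compute=16, num_storage=4)
print(count_role(before, "NodeManager"), count_role(after, "NodeManager"))  # 8 16
print(count_role(after, "DataNode"))                                        # 4
print(count_role(symmetric_layout(16), "DataNode"))                         # 16
```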

Page 15: Empower Data-Driven Organizations


Benefits of HPE Big Data Reference Architecture for Hadoop: Delivering value to the business
• Data consolidation: hosting multiple workloads
• Maximum elasticity and workload isolation
• Balance and scale compute and storage independently
• Breakthrough density and TCO

Figure: a compute tier of HPE Moonshot or HPE Apollo servers and a storage tier of HPE Apollo 4xx0 servers, connected by a high-speed network.

Page 16: Empower Data-Driven Organizations

Advantages* of HPE Big Data Reference Architecture
Room to Grow: The same performance in half the space


HPE Big Data Reference Architecture compared with a traditional Big Data architecture:
• Hadoop performance: equivalent
• Density: more than 2x denser
• Network bandwidth: 40 Gbit versus 10 Gbit
• HDFS storage performance: 2x greater
• Power (watts): half the power

* Normalized on performance, based on TeraSort testing

Page 17: Empower Data-Driven Organizations

Independent scaling of compute and storage: Grow to match your workload and data sources


HPE Big Data Reference Architecture configurations compared with a traditional architecture, from hot (compute) to cold (storage):
• Hot (compute) configuration: 2.8x compute, 97% of the storage capacity, 4x the memory
• Intermediate configuration: 1.6x compute, 1.5x the storage capacity, 2.5x the memory
• Cold (storage) configuration: 90% of the compute, 2.1x the storage capacity, 1.5x the memory

Page 18: Empower Data-Driven Organizations

HPE Big Data Reference Architecture: Hadoop and its ecosystem take advantage of the BDRA

Figure: Hadoop ecosystem engines such as Impala connected through network switches (east-west networking over a high-speed network) to SSD-based, hard-disk-based and archive storage tiers.

Page 19: Empower Data-Driven Organizations

Enterprise Grade SQL Analytics for Hadoop

• Develop your own analytical applications with fully functional ANSI SQL

• Vertica inside: a powerful and proven SQL query engine

• Installs in the Hadoop cluster; supports Ambari; YARN-ready

• Enterprise-ready and stable, with full ANSI SQL capabilities and predictive analytics

Figure: HPE Vertica SQL on Hadoop running alongside other YARN apps on compute-optimized servers, over HDFS, ORC and Parquet data on storage-optimized servers (SQL on Hadoop).

Page 20: Empower Data-Driven Organizations

First commercially available columnar database

Native Advanced Analytics to deliver insight at the speed of business

Native Hadoop Integration

SaaS and AMI Cloud options

Support for new open source architectures, including Kafka and Spark.

Core Vertica SQL Engine, Advanced Analytics

Open ANSI SQL standards ++ R, Python, Java, Scala
Core is key

The same core Vertica engine delivers advanced analytics wherever your enterprise needs demand, today and tomorrow.

HP Vertica for SQL on Hadoop: native support for ORC and Parquet; supports all distributions; no helper node or single point of failure

HP Vertica Enterprise Edition: columnar storage and advanced compression; industry-leading performance and scalability
Vertica Community Edition: free up to 1 TB

Build a data-centric foundation: HPE Vertica Advanced Analytics Family, with enterprise-grade reliability and scalability

HP Vertica OnDemand: get up and running in under 1 hour; pay by the TB or by the query

HP Vertica AMI: hundreds of TB deployed; bring your own license to Amazon Web Services

Page 21: Empower Data-Driven Organizations

HPE Big Data Architecture long-term view: Evolve to support multiple compute and storage blocks

Low-cost nodes

SSD nodes, disk nodes, archive nodes: multi-temperature storage using HDFS tiering and object stores

GPU nodes, FPGA nodes, big-memory nodes: workload-optimized compute nodes to accelerate various big data software
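HDFS tiering is driven by storage policies attached to paths. HDFS ships built-in policies such as HOT (all replicas on DISK), WARM (one replica on DISK, the rest on ARCHIVE), COLD (all replicas on ARCHIVE), ONE_SSD and ALL_SSD. The command `hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>` applies one, and `hdfs mover` then migrates existing blocks. The sketch below picks a policy from a dataset's age. The age thresholds, the example path and the helper names are hypothetical, and the code only builds the command rather than running it.

```python
from typing import List

def choose_policy(days_since_last_access: int) -> str:
    """Map data 'temperature' to a built-in HDFS storage policy.

    The day thresholds are illustrative assumptions only.
    """
    if days_since_last_access < 1:
        return "ALL_SSD"  # hottest data: all replicas on SSD volumes
    if days_since_last_access < 30:
        return "HOT"      # all replicas on DISK volumes
    if days_since_last_access < 180:
        return "WARM"     # one replica on DISK, the others on ARCHIVE
    return "COLD"         # all replicas on ARCHIVE volumes

def set_policy_command(path: str, policy: str) -> List[str]:
    # argv for the standard HDFS CLI; run it (e.g. with subprocess.run) on a
    # cluster client, then run `hdfs mover` so existing blocks are migrated.
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]

print(set_policy_command("/data/clickstream/2015", choose_policy(400)))
```

A policy only takes effect if the DataNode volumes are tagged with the matching storage type, such as [SSD] or [ARCHIVE], in `dfs.datanode.data.dir`. In the tiered design above, that tagging is what maps each policy onto SSD, disk or archive nodes.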

Page 22: Empower Data-Driven Organizations

Data-centric security for Hadoop
Enterprise-Grade Hadoop


Page 23: Empower Data-Driven Organizations

HPE SecureData provides the missing data protection


Figure: Traditional IT infrastructure security (disk encryption, database encryption, SSL/TLS/firewalls, authentication management) protects individual layers of the data ecosystem (data and applications, databases, file systems, storage, middleware/network) against specific threats (malware, insiders, SQL injection, traffic interceptors, credential compromise), leaving security gaps between them. HPE SecureData data-centric security provides end-to-end protection, extending data security coverage across the whole ecosystem.

Page 24: Empower Data-Driven Organizations

HPE SecureData: Protecting sensitive and regulated data in Hadoop

– Stateless key management: no key database to store or manage; high performance, unlimited scalability

– Both encryption and tokenization technologies: customize the solution to meet exact requirements

– Broad platform support: on-premises, cloud and big data; structured and unstructured data; Hadoop, HPE Vertica, Linux, Windows, AWS, HPE NonStop, Teradata, IBM z/OS, etc.

– Quick time-to-value: complete end-to-end protection within a common platform; format preservation dramatically reduces implementation effort


HPE SecureData components: Management Console, Web Services API, Native APIs (C, Java, C#/.NET), Command Lines, Key Servers, File Processor
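“Stateless key management” means keys are derived on demand instead of being stored. The slide does not describe the mechanism HPE SecureData actually uses, so this is only a generic sketch of the idea using HMAC-SHA256 from the Python standard library. Any key server holding the same master secret re-derives the same key from an identity string, so there is no key database to back up or replicate. The class name and the identity format are hypothetical. A production design would use a vetted KDF such as HKDF, with the master secret protected, for example in an HSM.

```python
import hashlib
import hmac
import secrets

class StatelessKeyServer:
    """Derives keys from a master secret instead of storing them."""

    def __init__(self, master_secret: bytes):
        self._master = master_secret

    def derive_key(self, identity: str, length: int = 32) -> bytes:
        # Same identity -> same key, on any server holding the master secret.
        # HMAC-SHA256 acts as a PRF; one 32-byte output block is used here.
        if not 1 <= length <= 32:
            raise ValueError("this sketch derives at most 32 bytes")
        mac = hmac.new(self._master, identity.encode("utf-8"), hashlib.sha256)
        return mac.digest()[:length]

master = secrets.token_bytes(32)
server_a = StatelessKeyServer(master)
server_b = StatelessKeyServer(master)   # e.g. a second site; no key sync needed
k1 = server_a.derive_key("ssn-field/hr-cluster")
k2 = server_b.derive_key("ssn-field/hr-cluster")
print(k1 == k2)                                          # True
print(k1 == server_a.derive_key("cc-field/hr-cluster"))  # False
```

Scaling out then amounts to starting another key server with the same master secret, which matches the “no key database to store or manage” and “unlimited scalability” points above.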

Page 25: Empower Data-Driven Organizations


Field-level, format-preserving, reversible data de-identification
Customizable to granular requirements, addressed by encryption and tokenization

Version    Credit card           SSN/ID        Email                DOB
Original   1234 5678 8765 4321   934-72-2356   [email protected]   31-07-1966
Full       8736 5533 4678 9453   347-98-8309   [email protected]   20-05-1972
Partial    1234 5681 5310 4321   634-34-2356   [email protected]   20-05-1972
Obvious    1234 56AZ UYTZ 4321   AZS-UD-2356   [email protected]   20-05-1972

Techniques: FPE** and SST*
* Secure Stateless Tokenization (SST)
** Format-Preserving Encryption (FPE)
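The “Full” and “Partial” rows show what format preservation means. Digits stay digits, separators stay in place, and in the partial case the first six and last four card digits remain readable. The sketch below shows how a reversible, format-preserving cipher can achieve this. It is a toy, FF1-inspired Feistel network over decimal digits with an HMAC-SHA256 round function, written only for illustration. It is not NIST FF1 or FF3-1, it is not HPE SecureData's implementation, and it has not been analyzed for security, so do not use it on real data. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac
import string

ROUNDS = 10  # number of Feistel rounds (FF1 also uses 10; the rest differs)

def _round_value(key: bytes, rnd: int, half: str, m: int) -> int:
    # Pseudorandom round function: HMAC-SHA256 over the round number and one
    # half of the digit string, reduced to an m-digit decimal number.
    msg = f"{rnd}|{m}|{half}".encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 10 ** m

def _feistel(key: bytes, digits: str, decrypt: bool) -> str:
    # Unbalanced Feistel network over decimal digits; the half lengths u and v
    # alternate, so the output always has exactly len(digits) digits.
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    if not decrypt:
        for i in range(ROUNDS):
            m = u if i % 2 == 0 else v
            c = (int(a) + _round_value(key, i, b, m)) % 10 ** m
            a, b = b, str(c).zfill(m)
    else:
        for i in reversed(range(ROUNDS)):
            m = u if i % 2 == 0 else v
            c = (int(b) - _round_value(key, i, a, m)) % 10 ** m
            a, b = str(c).zfill(m), a
    return a + b

def _apply(key: bytes, text: str, keep_prefix: int, keep_suffix: int, decrypt: bool) -> str:
    # Only digits are transformed; separators such as spaces and dashes stay put,
    # and the first keep_prefix / last keep_suffix digits stay in the clear.
    positions = [i for i, ch in enumerate(text) if ch in string.digits]
    target = positions[keep_prefix:len(positions) - keep_suffix]
    if len(target) < 2:
        raise ValueError("need at least two digits to protect")
    out = list(text)
    middle = "".join(text[i] for i in target)
    for pos, ch in zip(target, _feistel(key, middle, decrypt)):
        out[pos] = ch
    return "".join(out)

def fpe_encrypt(key: bytes, text: str, keep_prefix: int = 0, keep_suffix: int = 0) -> str:
    return _apply(key, text, keep_prefix, keep_suffix, decrypt=False)

def fpe_decrypt(key: bytes, text: str, keep_prefix: int = 0, keep_suffix: int = 0) -> str:
    return _apply(key, text, keep_prefix, keep_suffix, decrypt=True)

key = b"demo key, not for production use"
card = "1234 5678 8765 4321"
full = fpe_encrypt(key, card)                                   # all 16 digits protected
partial = fpe_encrypt(key, card, keep_prefix=6, keep_suffix=4)  # like the "Partial" row
print(full, partial)
print(fpe_decrypt(key, partial, keep_prefix=6, keep_suffix=4))  # 1234 5678 8765 4321
```

Because the ciphertext has the same length and character classes as the original, it fits existing column types and validation rules. This is the property behind the “format preservation dramatically reduces implementation effort” bullet.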

Page 26: Empower Data-Driven Organizations

Data Discovery service: Discover the value of your data


Page 27: Empower Data-Driven Organizations

How to discover the value of your data
• Align business goals and challenges with the relevant data
• Evaluate your data and quickly test, learn, and iterate on ideas to discover value
• Create a strategic roadmap based on learnings

Key HPE solutions: Data Discovery; Data-Driven Transformation Planning

Business benefits: agile execution toward impactful projects; maximized alignment to value

Page 28: Empower Data-Driven Organizations

• To help you on your journey, the HPE Data Discovery Solution provides an end-to-end approach to realizing the value of your data

• Includes experienced consultants, proven processes, modern big data analytics platforms and infrastructure, and convenient delivery options

• Empowers you to realize:
  • A clear path to business insights and value
  • Rapid exploration and real-time access
  • Lower risk
  • Lower costs

Business value metrics:
• Improve business processes
• Enable better operational performance
• Understand customers better
• Increase market share, margin, and/or revenue

Figure: HPE Data Discovery Solution Framework, delivering business value through the Discovery Workshop, Discovery Experience, Discovery Lab and Discovery Production Implementation, using HPE Vertica, HPE IDOL, Hadoop and SAP HANA on HPE servers and storage, on premises or in the cloud.

Page 29: Empower Data-Driven Organizations

A rapid, low-risk, securely designed path to big data value, delivered as a service in the HPE Cloud or on client premises

Expertise: HPE data scientists, technology experts, industry SMEs

Big data platforms: HPE Haven, Hadoop, SAP HANA, etc.

Platform flexibility: on-premises or cloud-based delivery models

Guided process: proven processes to accelerate time-to-value

Use case library: industry and business function examples

Discovery Production Implementation: Operationalize and monetize the new insights by implementing them in your business processes

Discovery Workshop: A one- to two-day workshop to align business and IT, discuss opportunities and determine priorities

Discovery Experience: A private, secure, low-risk big data “test-drive” environment, both functional and technical

HPE Data Discovery Service

Big data infrastructure: HPE Moonshot, HPE Apollo, HPE 3PAR, HPE ProLiant

Data discovery lab: Rapid deployment of data discovery labs

Page 30: Empower Data-Driven Organizations

Summary


Page 31: Empower Data-Driven Organizations


HPE Solution for Hadoop

Big Data Analytics

RA

HPE Vertica SQL for Hadoop, SAP HANA, HPE IDOL

Hadoop Reference Architectures for MapR, Hortonworks & Cloudera

HPE Information Governance

Hadoop: HPE Apollo + Moonshot + ProLiant

HPE Analytics Consulting Services for Hadoop, HPE Integration Services

On-Premise and Hybrid Cloud deployment options

Flexible, Purpose-built Infrastructure

High-Performing Analytics Engines

Consulting & Implementation Services

Page 32: Empower Data-Driven Organizations

High performance computing

2x Hadoop performance or 50% less space

HPE Infrastructure Big Data Reference Architecture

Analyze at scale and speed

100% of your data, 10x to 1,000x faster

HPE Big Data platform, powered by Vertica and IDOL

Secure and govern

Protect and manage your data and reputation

HPE Security and Governance Solutions for Hadoop

Data management, data discovery and governance services

Build a Data-Centric Foundation: Hadoop for the Enterprise

Page 33: Empower Data-Driven Organizations

Why Hewlett Packard Enterprise? Enterprise Scale with Hadoop

Experience and expertise
• 3000+ global analytics and data management professionals
• Hundreds of data scientists

Solution leadership
• Proven analytics and compute platforms for all data, environments, and analytics
• Services to deliver value from discovery to achieving business outcomes

Market leadership
• Gartner Magic Quadrant leader for:
  – Enterprise Data Warehouse and Data Management Solutions for Analytics (2015)
  – eDiscovery (2015)

Flexible and open
• Solutions built on open standards, offering choice and flexibility
• Strong strategic alliances complementing HPE solutions

Page 34: Empower Data-Driven Organizations

THANK YOU
