© 2009 VMware Inc. All rights reserved Big Data’s Virtualization Journey Andrew Yu Sr. Director, Big Data R&D VMware

Big Data’s Virtualization Journey

Page 1: Big Data’s Virtualization Journey

© 2009 VMware Inc. All rights reserved

Big Data’s Virtualization Journey

Andrew Yu

Sr. Director, Big Data R&D

VMware

Page 2: Big Data’s Virtualization Journey

Big Data: Not Just for the Web Giants – Now the Intelligent Enterprise

Page 3: Big Data’s Virtualization Journey

Real-time analysis allows instant understanding of market dynamics.

Retailers can have intimate understanding of their customers' needs and use direct targeted marketing.

Market Segment Analysis

Personalized Customer Targeting

Page 4: Big Data’s Virtualization Journey

The Emerging Pattern of Big Data Systems: Retail Example

Real-Time Streams

Exa-scale Data Store

Parallel Data Processing

Real-Time Processing

Machine Learning

Data Science

Cloud Infrastructure

Analytics

Page 5: Big Data’s Virtualization Journey

A single GE jet engine produces 10 terabytes of data in one hour – 90 petabytes per year.

Enabling early detection of faults, common-mode failures, and product engineering feedback.

Post Mortem / Proactively Maintained / Connected Product
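
As a rough check of the jet engine figures above, the yearly volume can be recomputed from the hourly rate. This is only a sketch: it assumes the engine produces data continuously for every hour of the year and uses decimal units (1 PB = 1000 TB), neither of which the slide states.

```python
# Sanity check of the "10 TB per hour -> about 90 PB per year" figure.
# Assumptions (not stated on the slide): continuous operation all year,
# and decimal units where 1 PB = 1000 TB.
TB_PER_HOUR = 10
HOURS_PER_YEAR = 24 * 365
TB_PER_PB = 1000


def petabytes_per_year(tb_per_hour: float) -> float:
    """Convert an hourly data rate in TB into a yearly volume in PB."""
    return tb_per_hour * HOURS_PER_YEAR / TB_PER_PB


print(petabytes_per_year(TB_PER_HOUR))  # 87.6
```

Under those assumptions the result is 87.6 PB, which the slide rounds to 90 PB.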

Page 6: Big Data’s Virtualization Journey

Storage: Plan for Peta-scale Data Storage and Processing

[Chart: PB of data (log scale, 0.01 to 1000) by year, 2000 to 2015; series: Online Apps, Analytics]

Analytics Rapidly Outgrows Traditional Data Size by 100x

Page 7: Big Data’s Virtualization Journey

Cloud Infrastructure Supports Mixed Big Data Workloads

Machine Learning

Hadoop

Real-Time Analytics

Cloud Infrastructure

Management

Network/Security

Storage/Availability

Compute

Page 8: Big Data’s Virtualization Journey

Cloud Infrastructure Supports Multiple Tenants

Cloud Infrastructure

Management

Network/Security

Storage/Availability

Compute

Web User Analytics

Financial Analysis

Historical Customer Behavior

Page 9: Big Data’s Virtualization Journey

Software-defined Datacenter: Compute

The Core Values of Virtualization Apply to Big Data

1. Agility / Rapid deployment

2. Lower Capex

3. Isolation for resource control and security

4. Operational efficiency

Management

Network/Security

Storage/Availability

Compute

Page 10: Big Data’s Virtualization Journey

Strong Isolation between Workloads is Key

Hungry Workload 1

Reckless Workload 2

Nosy Workload 3

Cloud Infrastructure

Page 11: Big Data’s Virtualization Journey

Consolidation of workloads: Higher Utilization

Hadoop 1

Hadoop 2

HBase

• Without virtualization
  - Independent Hadoop clusters each have access to only a fraction of the total physical resources

• Consolidate and virtualize
  - Consolidated cluster has access to the entire pool of physical resources
  - For common use cases, reduce latency on priority jobs on the consolidated cluster
  - Multiple HDFS instances striped across all physical hosts
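
The consolidation argument above can be illustrated with a small sketch. The cluster sizes and load figures below are hypothetical, chosen only to show the effect; the `Cluster` type and both helper functions are illustrative names, not part of any VMware or Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    dedicated_hosts: int  # hosts owned when clusters are kept separate
    busy_hosts: int       # hosts this cluster's own jobs currently need


def hosts_available_separate(cluster: Cluster) -> int:
    """Without virtualization a cluster can only use its own hosts."""
    return cluster.dedicated_hosts


def hosts_available_consolidated(cluster: Cluster, clusters: list) -> int:
    """On a shared pool a priority job can use whatever the others leave idle."""
    total = sum(c.dedicated_hosts for c in clusters)
    used_by_others = sum(c.busy_hosts for c in clusters if c is not cluster)
    return total - used_by_others


# Hypothetical numbers: three 10-host clusters, two of them mostly idle.
clusters = [
    Cluster("Hadoop 1", dedicated_hosts=10, busy_hosts=10),
    Cluster("Hadoop 2", dedicated_hosts=10, busy_hosts=2),
    Cluster("HBase", dedicated_hosts=10, busy_hosts=3),
]
priority = clusters[0]

print(hosts_available_separate(priority))                # 10
print(hosts_available_consolidated(priority, clusters))  # 25
```

In this made-up scenario the priority workload can reach 25 hosts instead of 10, which is the mechanism behind the reduced latency on priority jobs.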

Page 12: Big Data’s Virtualization Journey

Big Data Mix of Workloads

Compute layer
• Hadoop: batch analysis
• HBase: real-time queries
• NoSQL: Cassandra, Mongo, etc.
• Big SQL: Impala, Pivotal HawQ
• Other: Spark, Shark, Solr, Platfora, etc.

Virtualization

Host Host Host Host Host Host Host

File System/Data Store

Page 13: Big Data’s Virtualization Journey

Software-defined Datacenter: Storage

Requirements of Next Generation Storage

1. 10x lower cost of storage

2. Handle explosive data growth

3. Support a variety of application types

4. Solve the privacy and security issues

Management

Network/Security

Storage/Availability

Compute

Page 14: Big Data’s Virtualization Journey

Software-defined Storage Enables Fundamental Economics

[Chart: cost per GB ($0 to $5.50) versus petabytes deployed (0.5 to 128); series: Traditional SAN/NAS; Scale-out NAS (Isilon, NTAP); Distributed Object Storage (HDFS, MAPR, CEPH)]

Page 15: Big Data’s Virtualization Journey

Big-Data using Local Disks

Host Host Host Host Host Host Host

Top of Rack Switch
• High Performance 10 GbE Switch per Rack

Servers with Local Disks
• 16-24 core server
• 12-24 SATA 2-4 TB disks
• 10 GbE adapter
• iSCSI/NFS for shared storage for vMotion, etc.

Page 16: Big Data’s Virtualization Journey

Big Data Storage

Elastic Compute

Scale-out Network Storage

• Hadoop Protocol
• Snapshots
• POSIX Apps
• Full NFS Access
• Replication
• Erasure Coding

Page 17: Big Data’s Virtualization Journey

Customer Success: Hadoop as a Service at FedEx

Scale-out Isilon Cluster
- Shared Data
- NAS + Hadoop

Elastic vSphere Cluster
- Mixed Workloads
- vSphere
- Existing Rack Mount Servers

Page 18: Big Data’s Virtualization Journey

Storage Configuration for Data/Compute Separation With Isilon

[Diagram labels: Virtualization Host; Hadoop Virtual Nodes 1 to 3 (Node 1: Job-tracker; Nodes 2 and 3: Task-tracker), each with an OS Image VMDK and Ext4 VMDKs; Temp; Shared storage (SAN/NAS); Isilon (NN, data node)]

Page 19: Big Data’s Virtualization Journey

Agile Big Data at FedEx

Security
• Trusted isolation
• Well-known, auditable platform

Agility
• Deploy in minutes
• Optimize for shift in workload characteristics

Elasticity
• Create true multi-tenancy
• Mixed workloads

Page 20: Big Data’s Virtualization Journey

Breakthrough Use Cases

Web Log Analysis: Initial exploration was around detection of mobile devices accessing the website. Analysis of 570 billion web server log entries took approximately 9 minutes to complete on a small cluster.

ZIP Code Analysis: Analysis of data to determine which ZIP codes are the highest source or destination for shipments.

Shipment Analysis: Analysis of shipment information to determine patterns that may delay a package.
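
Taking the Web Log Analysis figures above at face value, the implied processing rate can be worked out directly. This is only back-of-the-envelope arithmetic on the two numbers from the slide; the cluster size is not given, so no per-node rate is derived.

```python
# Implied throughput for the web log analysis use case.
# Both inputs come from the slide; "approximately 9 minutes" is treated
# as exactly 9 minutes for the purpose of this estimate.
LOG_ENTRIES = 570_000_000_000
ELAPSED_MINUTES = 9


def entries_per_second(entries: int, minutes: float) -> float:
    """Average number of log entries processed per second."""
    return entries / (minutes * 60)


rate = entries_per_second(LOG_ENTRIES, ELAPSED_MINUTES)
print(f"{rate:.3e} entries/second")
```

That works out to a little over one billion log entries per second across the cluster.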

Page 21: Big Data’s Virtualization Journey

Cloud Infrastructure is Ready for Big Data – Are you?

Cloud Infrastructure

Page 22: Big Data’s Virtualization Journey

Q&A