VIRT1351BU
#VMworld #VIRT1351BU
New Architectures for Virtualizing Spark and Big Data Workloads on vSphere
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
CONFIDENTIAL
Justin Murray, Mohan Potheri, Jonathan Flynn
Agenda
1 Introductions
2 Existing and new Approaches in the Big Data World
3 Traditional Deployment Reference Architectures
4 New Architectures – Changing the Paradigm
5 Proof of Concept: Testing in the VMware Solutions Lab
6 vSAN Optimizations
7 Conclusions
Why the Interest in Big Data?
• Enterprises want to get off existing costly data platforms
• Older data warehouse technology is not serving your needs
• Want to do queries and analytics against many different forms of data (structured, unstructured, streaming)
• Provide data access to our end customers
• Integrate systems that have been islands till now
– Single source of truth for the enterprise
• Exploit new application architectures for developer productivity
• Want to do data science, machine learning, deep learning
The Existing Hadoop Architecture
[Diagram: a Client submits a job to the ResourceManager (the master scheduler); the NameNode holds the master file system index. Worker Nodes 1, 2 and 3 each run a NodeManager and a DataNode holding an HDFS block (Blocks 1, 2 and 3). AppMaster 1, Container 2 and Container 3 run on the worker nodes.]
High Level View of Spark
The Spark Architecture – Standalone
[Diagram: a Driver submits a Job to Worker Nodes 1, 2 and 3. Six Executors are shown across the worker nodes, each running in its own JVM.]
The Spark Architecture (on YARN)
[Diagram: a Job is submitted to the ResourceManager; the NameNode serves HDFS metadata. Worker Nodes 1, 2 and 3 each run a NodeManager and a DataNode holding an HDFS block (Blocks 1, 2 and 3). The Spark Driver runs with AppMaster 1, and Spark Executors run in Container 2 and Container 3.]
Traditional Reference Architectures
Two Virtual Machines on a Host Server
[Diagram: a vSphere host server runs two virtual machines, Hadoop Node 1 and Hadoop Node 2. Each virtual machine runs a DataNode and a NodeManager over several Ext4 file systems, each on its own VMDK. The VMDKs are placed on local DAS disks/devices allocated to the virtual machine.]
Data/Compute Separation (with External Access to HDFS)
[Diagram: a virtualization host runs three Hadoop virtual nodes. Hadoop Virtual Node 1 runs the ResourceManager and the other nodes run NodeManagers. Each node has an OS image VMDK plus Ext4 file systems on VMDKs for temp data. HDFS requests go to an external Isilon system, whose nodes provide the NameNode (NN) and DataNode services.]
Concerns with HDFS (The Hadoop Distributed File System)
• Difficult to separate compute from data storage concerns
• Three-way block replication for each 256 MB (or 512 MB) data block
– At least triples the stored size of the input data, in exchange for data safety
• Re-balance of data when you add new data node processes
• Data must be ingested into HDFS from legacy systems (can be time consuming)
• Site-to-site replication not inherent
• NameNode process (which holds the central index of all files) can be sensitive to higher numbers of small files
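The replication and small-file concerns above can be made concrete with some simple arithmetic. The following is a minimal sketch, not part of the original deck: the function names are hypothetical, the 5 TB input is only an example figure, and it ignores the extra headroom real clusters need for temporary data.

```python
import math


def raw_capacity_tb(logical_tb: float, replication: int = 3) -> float:
    """Raw disk space consumed when every block is stored `replication` times."""
    return logical_tb * replication


def hdfs_block_count(file_size_mb: float, block_size_mb: int = 256) -> int:
    """Blocks the NameNode must track for one file; even a tiny file needs one."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


# 5 TB of input data occupies at least 15 TB of raw disk at replication factor 3.
print(raw_capacity_tb(5))

# A single 1 GB file is 4 blocks of 256 MB, while the same 1 GB stored as
# 1,024 files of 1 MB each is 1,024 blocks for the NameNode to index.
one_large_file = hdfs_block_count(1024)
many_small_files = sum(hdfs_block_count(1) for _ in range(1024))
print(one_large_file, many_small_files)
```

The second comparison illustrates why the NameNode is sensitive to large numbers of small files: the index grows with the number of files and blocks, not with the number of bytes stored.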
Developers and Data Scientists
• Work on their code or on their data analysis model
• Don’t need a multi-tenant cluster
• Don’t care about job scheduling for other users
• Want to scale out to see the effect on their work
• Want to use the latest tools and newer versions (Python, R, Scala, ML kits)
• Experiment with different data models, code, algorithms, data sets
• Training the analysis model is separated from testing it – interested in the time taken for each
• May not need the full Hadoop cluster set
New Architectures for Big Data
Legacy Big Data in VMware Virtualized Environments
• Dedicated clusters were recommended with many local disks for HDFS
• VMware shared storage is not truly scalable for HDFS purposes
• VMware HA, vMotion and DRS are not available with the recommended configuration
• Many of the benefits of VMware virtualization are underutilized
• Storage and Compute are still linked and cannot scale independently
Key Trends in Big Data Infrastructure
• Decoupling of Compute and Storage Clusters
• Dynamic Scaling of compute nodes used for analysis from dozens to hundreds
• SPARK and other newer Big Data platforms can work with regular filesystems
• Newer platforms store and process data in memory
• New platforms can leverage Distributed Filesystems that can use local or shared storage
• Need for High Availability & Fault Tolerance for master components
Separate Compute from Storage
• You will see this separation in Amazon deployments with S3 used for storage and EC2 instances for compute
• Achieving the same effect with similar distributed file systems
– Separate compute virtual machines from storage VMs
– Storage is scaled independently of compute
An HDFS Replacement Is Needed: The Next-Generation Distributed File System
• What candidates present themselves?
– S3, Ceph, Gluster, etc.
• GlusterFS used in POC:
– Mature Solution
– Native GlusterFS filesystem for Linux
– Layers on top of any traditional storage
– Truly distributed and resilient file system
– Supports many common client protocols
GlusterFS
• GlusterFS is a scale-out distributed file system that can support thousands of clients
• The file system can run on DAS or shared storage
• Fault-tolerant distributed file system
• Provides multiprotocol support
– Native
– NFS
– CIFS
– HDFS
– S3
– FTP
• https://www.slideshare.net/shubhendutripathi040980/glusterfs-hadoop
HDFS vs Ceph vs Gluster IOZONE Performance Comparison
http://iopscience.iop.org/article/10.1088/1742-6596/513/4/042014/pdf
SPARK with GlusterFS POC Architecture on Pure FC SAN
[Diagram: four VMware vSphere hosts run one Spark Master, eight Spark Workers and three Gluster Nodes. The Gluster Nodes form the GlusterFS layer, which sits on Pure M50 storage attached over Fibre Channel.]
SPARK with GlusterFS POC Architecture on Virtual SAN
[Diagram: four VMware vSphere + vSAN hosts run one Spark Master, eight Spark Workers and three Gluster Nodes. The Gluster Nodes form the GlusterFS layer, which sits on the clustered vSAN datastore.]
TPC-DS on SPARK on GlusterFS
TPC-DS with Spark-SQL and Apache SPARK
• IBM has helped integrate the TPC-DS benchmark (v2) into the spark-sql-perf test kit
• The 99 queries were generated using the TPC-DS query generator and are based on the 100-GB scale factor.
• The spark-sql-perf test kit can be used to evaluate and compare the infrastructure for its performance.
• We leveraged a subset of TPC-DS queries to evaluate our POC and Solution
Test Setup
• SPARK Nodes:
– 1 Master and 8 Worker Nodes with 16 vCPU and 128 GB of memory each
– 3 Node GlusterFS cluster with 2 TB shared Filesystem mount across all SPARK nodes
• Storage: (Two Use Cases)
1. GlusterFS backed by Pure Storage LUNs (16 Gb/s FC fabric with Pure M50 array)
2. GlusterFS backed by vSAN (Western Digital NVMe cache, high-capacity flash for persistence)
• TPC-DS Data Sets
– 5 TB
• Queries
– Interactive TPC-DS Queries Set (q19, q42, q52, q55, q63, q68, q73 & q98)
Apache SPARK Web Console
SPARK Job Details
TPC-DS Test Results (5 TB Data Set)
[Chart: Query Time Comparison between FC SAN and vSAN. Query times for q19, q42, q52, q55, q63, q68, q73 and q98, with one series for Pure (FC SAN) and one for vSAN; the vertical axis runs from 0 to 3.]
Section-Conclusion
• Modern Big Data platforms like SPARK are mostly memory resident
• GlusterFS provides a high performance distributed filesystem for SPARK and newer big data workloads
• GlusterFS supports a wide range of protocols that make it the ideal storage platform for data lakes
• Layering GlusterFS on top of shared storage or VSAN helps leverage all the vSphere platform features
• Dedicated HW with local storage is no longer required for modern big data applications.
• TPC-DS testing showed similar performance for SPARK-SQL on VSAN and FC.
vSAN Optimization
Hardware Configuration
All-Flash vSAN
• (4) Node Dell™ R730XD
– (2) E5-2699V4 – 22-core 2.2GHz
– 1TB Memory
– (4) 10 Gb/s Ethernet connections
– PERC H730 Mini
– SD card system drive
– vSphere 6.5 Update 1
• vSAN disk configuration
– (2) Disk groups per node
• (1) 1.6TB* Ultrastar SN100 cache drive
• (2) 3.84TB Optimus MAX capacity drive
* 1TB=1,000GB, 1GB=1,000,000,000 bytes. Actual usable capacity less.
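The raw flash totals implied by this configuration follow directly from the drive counts. The sketch below is an illustration rather than a figure from the deck: the constant and function names are hypothetical, and it computes raw totals only. Usable capacity will be lower and depends on the vSAN storage policy and overheads, which the slide does not state.

```python
# Values taken from the hardware configuration slide.
NODES = 4
DISK_GROUPS_PER_NODE = 2
CACHE_DRIVES_PER_GROUP = 1       # 1.6 TB Ultrastar SN100
CAPACITY_DRIVES_PER_GROUP = 2    # 3.84 TB Optimus MAX
CACHE_DRIVE_TB = 1.6
CAPACITY_DRIVE_TB = 3.84


def cluster_total_tb(drives_per_group: int, drive_tb: float) -> float:
    """Sum one drive type across every disk group on every node."""
    return NODES * DISK_GROUPS_PER_NODE * drives_per_group * drive_tb


raw_capacity_tb = cluster_total_tb(CAPACITY_DRIVES_PER_GROUP, CAPACITY_DRIVE_TB)
cache_tb = cluster_total_tb(CACHE_DRIVES_PER_GROUP, CACHE_DRIVE_TB)

print(f"raw capacity tier: {raw_capacity_tb:.2f} TB")
print(f"cache tier: {cache_tb:.2f} TB")
```

With eight disk groups in total, this gives 61.44 TB of raw capacity-tier flash and 12.8 TB of cache-tier flash, using the decimal terabytes defined in the footnote.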
vSAN Disk Group Configuration
vSAN - Network
Dual vSAN VMkernel Adapters
• These are not necessarily for redundancy (like an “Air-Gap” network with redundant physical interfaces routed to multiple VMKs) but for performance, so that traffic is pulled from two physical interfaces at once.
[Diagram: a virtual switch with two port groups.]
vSAN VMK Configuration
vSAN Port Group Uplink Maps
• The vDS contained 4 uplinks
– 2 dedicated to normal operation
– 2 dedicated to vSAN communication
• vDS-Comp01-Private
– Active Uplink: dvUplink3
– Standby Uplink: dvUplink4
• vDS-Comp01-Private2
– Active Uplink: dvUplink4
– Standby Uplink: dvUplink3
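The mirrored active/standby ordering of the two vSAN port groups can be modeled in a few lines. This is only a sketch of the failover logic described above. The dictionary layout and the function name are hypothetical and do not correspond to any vSphere API.

```python
from typing import FrozenSet, Optional

# Uplink maps for the two vSAN port groups, as listed on the slide.
PORT_GROUPS = {
    "vDS-Comp01-Private":  {"active": "dvUplink3", "standby": "dvUplink4"},
    "vDS-Comp01-Private2": {"active": "dvUplink4", "standby": "dvUplink3"},
}


def uplink_in_use(port_group: str,
                  failed: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return the uplink carrying traffic: active first, then standby."""
    config = PORT_GROUPS[port_group]
    for uplink in (config["active"], config["standby"]):
        if uplink not in failed:
            return uplink
    return None  # both uplinks are down


# Normal operation: each port group drives a different physical uplink.
normal = {name: uplink_in_use(name) for name in PORT_GROUPS}
print(normal)

# If dvUplink3 fails, both port groups fall back to dvUplink4.
degraded = {name: uplink_in_use(name, frozenset({"dvUplink3"})) for name in PORT_GROUPS}
print(degraded)
```

In normal operation the two port groups use different uplinks, which is where the performance benefit comes from. The standby entries mean that a single uplink failure reduces bandwidth rather than cutting off vSAN traffic.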
HCIBench – Results – Network
[Chart: 100% Read IOPS and Latency, vSAN 6.6.1. Horizontal axis: block size (4K, 8K, 32K, 64K). Left axis: IOPS, 0 to 700,000. Right axis: latency in ms, 0 to 4. Series: Baseline, Multiple vSAN VMK, 1500 MTU, 10Gb Ethernet, and 10Gb Eth Multiple vSAN VMK, each with a matching latency series.]
What Have We Seen so Far?
• We can use a file system other than HDFS for big data
• With the right storage, we can use the vMotion/DRS/HA/FT features of vSphere
• VSAN can provide the storage underpinning big data (particularly for newer workloads)
• A number of different workloads were exercised on this new architecture
– Analytical queries, batch jobs and machine learning
• Testing is still in progress on all the above – more to come
Conclusions
• New architectures for big data are emerging beyond the existing documented ones
• Spark changes the profile of I/O and persistence for the newer applications
• This lends itself well to virtualization and separation of compute from data
• Traditional values in vSphere can be used in a big data context
• We would like to explore how these new architectural ideas will fit in your environment
BACKUP SLIDES – NOT FOR PRESENTATION
Placeholder: Key Requirements for Big Data Architecture
• Performance
• Scaling
– to dozens or hundreds of nodes (VMs)
• Robustness – distributed file system, no one process is a single point of failure
• High Availability
• Fault Tolerance
• Capable of handling new workloads with new compute demands
Placeholder: Key Requirements for Big Data Architecture
• Can we use a distributed file system that is not HDFS?
• Use a lighter weight framework than full Hadoop – e.g. Spark?
• Can we keep as much data in memory as possible and avoid I/O? Avoid spills
• Are shared file systems like VSAN useful?
• How to achieve the performance requirements without losing functionality?
One Test Workload: Introduction to Machine Learning
What Is Machine Learning?
• Machine learning algorithms try to make predictions based on training data that is given to a mathematical model (e.g. a linear regression algorithm)
• Training finds the minimum of the difference between the model’s predictions and the already known outcomes (it minimizes the loss or objective function)
[Diagram: samples from history form the (big) training data, which is used for training and testing several candidate mathematical models. A new sample of transaction data is then given to the model, which produces a classification or prediction.]
Example: Machine Learning Model for “A Customer Applies for Credit”
• Training data contains many features that have each been given a numeric value (e.g. zip code = 99)
• Several models are run against the training data and the best one is chosen (minimal loss or error)
• One kind of outcome is a binary classification (a good credit application or a bad one)
[Diagram: the (big) training data feeds several candidate mathematical models. A new application for credit is given to the model, which produces a classification or prediction.]
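Training Data
Acct Number | Txn ID | Txn Location Code | Age | Home Zip Code | Balance | Annual Salary | Passed Valid Check | Model’s Estimate as Valid | Error (Loss)
1234 | 45 | 94312 | 21 | 94304 | 100 | 80 | Y | N | 1
5678 | 89 | UK | 31 | 12116 | 5000 | 110 | N | Y | 1
9012 | 150 | 12126 | 61 | 31024 | 1400 | 50 | Y | Y | 0
Each row is an example x_i; each column is a feature (feature variable). The Model’s Estimate as Valid and Error (Loss) columns are computed/learned; the remaining columns are knowns.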
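The Error (Loss) column in the training data is consistent with a simple 0/1 loss: 0 when the model's estimate matches the known outcome and 1 when it does not. The deck does not name the loss function, so treat this as an assumption. The sketch below reproduces that column from the three example rows, using hypothetical field names.

```python
# Known outcome and model estimate for the three example rows in the table.
rows = [
    {"acct": 1234, "passed_valid_check": "Y", "model_estimate": "N"},
    {"acct": 5678, "passed_valid_check": "N", "model_estimate": "Y"},
    {"acct": 9012, "passed_valid_check": "Y", "model_estimate": "Y"},
]


def zero_one_loss(known: str, estimate: str) -> int:
    """0 if the estimate agrees with the known outcome, otherwise 1."""
    return 0 if known == estimate else 1


# Per-example loss, matching the Error (Loss) column.
losses = [zero_one_loss(r["passed_valid_check"], r["model_estimate"]) for r in rows]

# Average loss over the examples; training tries to drive this down.
mean_loss = sum(losses) / len(losses)

print(losses)
print(round(mean_loss, 3))
```

Comparing candidate models then amounts to computing this average for each one and keeping the model with the smallest value.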
Test Data Should Always Be Separated from Training Data
GOLDEN RULE: Don’t TEST on your TRAINING DATA
[Diagram: the example table from the Training Data slide is shown again, with its rows partitioned into Training Data and Test Data.]
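One common way to follow the golden rule is to shuffle the examples once and hold out a fixed fraction for testing. The sketch below is illustrative only: the function name, the 25% test fraction and the fixed seed are assumptions, not values from the deck.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(examples: Sequence[T],
                     test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


examples = list(range(100))  # stand-ins for rows of the example table
train, test = train_test_split(examples)

# No example appears in both sets, and nothing is lost in the split.
overlap = set(train) & set(test)
print(len(train), len(test), len(overlap))
```

Because the model never sees the held-out rows during training, its error on them is a fairer estimate of how it will behave on new applications.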
Example: A Linear Classifier
f(x_i, W, b) = W x_i + b
x_i: example data
W: weights
b: bias
Source: Stanford University class cs231n
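The score function of the linear classifier is a matrix-vector product plus a bias, giving one score per class. The sketch below evaluates it with plain Python lists. The weights, bias and input values are made up for illustration, and the function names are hypothetical.

```python
from typing import List


def linear_scores(W: List[List[float]], x: List[float], b: List[float]) -> List[float]:
    """Compute f(x, W, b) = W x + b, one score per row of W (one per class)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def predict(W: List[List[float]], x: List[float], b: List[float]) -> int:
    """Return the index of the class with the highest score."""
    scores = linear_scores(W, x, b)
    return max(range(len(scores)), key=scores.__getitem__)


# Two classes, two features; all numbers here are illustrative only.
W = [[0.5, -1.0],
     [-0.5, 1.0]]
b = [0.0, 0.25]
x = [2.0, 3.0]

scores = linear_scores(W, x, b)
predicted_class = predict(W, x, b)
print(scores, predicted_class)
```

Training adjusts W and b so that, across the training examples, the highest score lands on the known outcome as often as possible.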
Deployment Platform for Machine Learning
• Spark is the runtime platform for the models and for ingestion of the training data
• Different machine learning algorithms are available from the MLlib library that comes with Spark
• The application and data are distributed out to many nodes (virtual machines)
[Diagram: the (big) training data feeds several candidate mathematical models, each running on Spark. A new application for credit is given to the model, which produces a classification or prediction.]