Vahid Fereydouny, Sr Product Line Marketing Manager
Sachin Sundar, Product Line Marketing Manager
STO3192BU
#VMworld #STO3192BU
Deploying Big Data on HCI Powered by vSAN
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1 vSAN overview and use cases
2 Big Data overview and use cases
3 Big Data on vSAN Performance Assessment
4 MongoDB on vSAN
5 Splunk on vSAN
6 Summary and Q&A
vSAN Overview & Use Cases
© 2017 VMware Inc. All rights reserved. Confidential – Not for Distribution
HCI Powered by vSAN Overview
• 3-Tiered Architecture (built on proprietary hardware): Virtualization, Compute, Storage Networking, Storage
• Hyper-Converged Infrastructure (built on industry-standard hardware): Virtualization, Compute, Storage, Networking, Management
Supporting a Broad Variety of Use Cases
vSAN use cases:
• Business Critical Apps
• Virtual Desktops (VDI)
• DR / DA
• Cloud Native Apps
• Databases (SQL/Oracle)
• ROBO
• Management Clusters
• Containers
vSAN Is Used for Mixed Workloads
What applications do you run on vSAN today?
• VDI: VMware Horizon, Citrix XenDesktop (chart values: 21%, 10%)
• Microsoft Applications: Microsoft SharePoint, Microsoft Exchange Server (chart values: 15%, 26%)
• New Use Cases: NoSQL databases (Cassandra, etc.), Hadoop and other big data applications (chart values: 9%, 3%)
• Databases: Microsoft SQL Server, MySQL Databases, Oracle Databases, SAP (chart values: 67%, 38%, 18%, 7%)
Source: TechValidate survey of 316 users of VMware vSAN
Growing trend for customers to deploy their big data on vSAN
Tiered All-Flash and Hybrid Options Provide Choice
Virtual SAN tiers: Caching and Data Persistence
All-Flash
• 100K IOPS per host, plus sub-millisecond latency
• Caching tier (SSD, PCIe, NVMe): writes cached first, reads from capacity tier
• Capacity tier (flash devices): reads primarily from capacity tier
Hybrid
• 40K IOPS per host
• Caching tier (SSD, PCIe, NVMe): read and write cache
• Capacity tier: SAS / NL-SAS / SATA
HCI Deployments Provide Choice
Dell EMC VxRail Appliances: All-in-one HCI Appliance
• Dell EMC best-in-breed data protection
• Rapid time-to-value with multiple configurations
• Single, proactive vendor support for software and hardware
VMware vSAN ReadyNodes: Fully Customizable HCI
• Choice of 5x more server vendors
• Software and support flexibility
• Backup agnostic to minimize change
Big Data overview and use cases
Big Data Analytics – Simplified Landscape
• Analytics & Visualization
• Data Preparation
• Big Data Platform
• Big Data Infrastructure: Compute, Network, Storage
Examples of Big Data Use Cases
[Diagram labels: Edge/On-Premises; Cloud or On-Premises; Device/Customer 1 … Device/Customer N; Data Collection; Analytics & Machine Learning; Support; Customer; Sales/Marketing]
Some of the Major Customer Challenges in Adoption of Big Data Are Related to Infrastructure
• Security, scalability, manageability, and performance top the list of infrastructure pain points
[Chart: Top 10 Pain Points for Hadoop Workloads (n = 43). Series: Top 1 Pain Point, Top 7 Pain Points. Categories: Security; Data movement across platforms; Better analytic tools; Performance; Easy to scale in/out; Easy to manage; Better reporting tools. Values as extracted: 12, 4, 1, 4, 1, 4, 2 and 17, 19, 21, 16, 16, 12, 14]
Source: VMware internal focus groups; 43 responses from different companies, based on 6 focus groups in Europe and the U.S.
Big Data Implementations in Silos Are Adding Major Cost and Complexity
[Diagram labels: On-Premises (physical servers, several separate on-prem deployments); Public Cloud (Amazon, Google Cloud Platform, Other)]
Our Long Term Vision is to Break Those Silos
[Diagram labels: Public Cloud; Storage & Availability; Hyper-Converged Software]
Major Benefits of Virtualizing Big Data
• Simplified Management: centralized data center management; apply virtualization best practices
• Efficiency: resource pooling; server and cluster consolidation
• Agility: infrastructure on demand; sharing of physical resources, not dedicated clusters
• Performance: equal to or better than native Hadoop; no significant overhead
Big Data on vSAN + Intel Hardware
The Existing Hadoop Architecture
• Client: submits the job
• Master scheduler: ResourceManager
• Master file system index: NameNode
• Workers: Worker Node 1, Worker Node 2, and Worker Node 3, each running a NodeManager and a DataNode
• The workers host AppMaster 1, Container 2, and Container 3, and store HDFS Block 1, HDFS Block 2, and HDFS Block 3
Hadoop – in Virtual Machines
• Master roles: ResourceManager, NameNode
• Job: the input file is divided into Split 1, Split 2, and Split 3 (64MB each)
• Workers: Worker Node 1, Worker Node 2, and Worker Node 3, each running a NodeManager and a DataNode
• The workers host AppMaster 1, Container 2, and Container 3, and store Block 1, Block 2, and Block 3 (64MB each)
Hadoop Deployment – Virtualized on vSphere + vSAN
Stack, top to bottom: workloads (SQL, In-Memory, Map-Reduce, NoSQL, Stream, Search, Custom); HDFS+YARN; vSphere + vSAN; vSAN Datastore
• Security: Native in vSphere and vSAN
• Scalability: Scale-up/scale-out architecture, mixing of workloads in the same cluster
• Manageability: Standardization, fine-grained policy management
• Performance:
✓ vSAN Architecture
✓ All-Flash Hardware
Why Big Data on All Flash vs HDDs?
• CapEx savings: significant reduction in SSD prices; cost/GB is on par with HDDs
• Superior performance: up to 13x better transaction throughput; up to 200x better read performance; write IOPS up to 95x better
• Superior operating experience: fewer drives; no moving parts and smaller form factors; up to 5x better MB/Watt
• Innovation: standardized interface for PCI Express® SSDs; capacity: 3D NAND; ultra-high performance: 3D XPoint™
Test Server Configuration
Intel® Server Board S2600WTTR System
• CPU: 2S Intel® Xeon® E5-2690 v4, 14 cores/28 threads (35M cache, 2.60 GHz)
• Memory: 256 GB (16x 16GB) DDR4 2133 MHz RAM
• Network: 2x Intel® Ethernet Server Adapter X520-DA2 SFP+ (2 x 10 Gbps), dual port, PCIe v2.0 (5.0GT/s), x8 lane
• Boot: Intel® DC S3610 Series
• Cache: 4x Intel® DC P3700 800GB
• Capacity: 12x Intel® DC S3610 1.6TB
• Storage controllers: 2x LSI 3008-8i 12Gbps RAID controllers
Intel® Server Products for Cloud – vSAN* Ready Nodes
http://www.intelserveredge.com/intel-cloud-block-vsan/
vSphere and vSAN Host Configuration
Host
• ESXi 6.0.0 Update 2
• Switch: Arista 7150S-64 SFP+ (10GbE)
Networking
• vSwitch 2: 2x10GbE (bonded), VM Network
• vSwitch1: 2x10GbE (bonded), vSAN Network
• vSwitch 1: 1GbE, Mgmt Network (Mgmt. vLAN, vCenter Server Appliance)
vSAN
• Max 4 disk groups per host
• Each disk group: 1 x Intel® DC P3700 800GB (caching tier) and 3 x Intel® DC S3610 1.6TB (capacity tier)
• Number of Disk Stripes: 3
• Number of Failures to Tolerate: 0
Virtual machines
• 4 VMs per host
• CentOS 6.7 x86_64
• 14 vCPU, 48GB vRAM
• OS: 1x40GB LSI Logic (Thin Provision)
• Data: 4x400GB LSI Logic
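The host layout above can be sanity-checked with simple arithmetic. The following is a minimal Python sketch, assuming the "14 cores/28 threads" figure from the test server slide is per socket and that all four disk groups are populated; the class and function names are illustrative and do not come from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    # Values taken from the test server and host configuration slides.
    sockets: int = 2
    threads_per_socket: int = 28       # assumption: 14 cores/28 threads is per socket
    ram_gb: int = 256
    disk_groups: int = 4
    capacity_drives_per_group: int = 3
    capacity_drive_tb: float = 1.6
    cache_drive_gb: int = 800


@dataclass(frozen=True)
class VmSpec:
    count: int = 4
    vcpus: int = 14
    vram_gb: int = 48


def summarize(host: HostSpec, vm: VmSpec) -> dict:
    """Return per-host raw storage and the VM resource allocation."""
    logical_cpus = host.sockets * host.threads_per_socket
    vcpus = vm.count * vm.vcpus
    vram = vm.count * vm.vram_gb
    raw_tb = host.disk_groups * host.capacity_drives_per_group * host.capacity_drive_tb
    return {
        "raw_capacity_tb": round(raw_tb, 2),
        "cache_gb": host.disk_groups * host.cache_drive_gb,
        "vcpus_allocated": vcpus,
        "logical_cpus": logical_cpus,
        "vram_allocated_gb": vram,
        # True when the VMs do not oversubscribe CPU threads or RAM.
        "fits": vcpus <= logical_cpus and vram <= host.ram_gb,
    }


if __name__ == "__main__":
    for key, value in summarize(HostSpec(), VmSpec()).items():
        print(f"{key}: {value}")
```

Under these assumptions the four VMs use every logical CPU without oversubscription and leave part of the host RAM for the hypervisor.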
Hadoop Configuration
Cloudera Hadoop - 5.7.0
• CDH Management VM: AP, ES, HM, SM, S; 40GB (thin) boot volume; CDH volumes 1-4, 400GB each
• Name Node VM: NN, G, S, JHS, RM; 40GB (thin) boot volume; NN volumes 1-4, 400GB each
• Data Node 1 VM: SNN, NM, S, DN; 40GB (thin) boot volume; data volumes 1-4, 400GB each
• Data Nodes 2-27 VMs: NM, DN; 40GB (thin) boot volume; data volumes 1-4, 400GB each
• Network: vSwitch 2 - 2x10GbE (bonded) VM Network
Legend
• Cloud Mgmt Service (AP – Alert Publisher, ES – Event Server, HM – Host Monitor, SM – Service Monitor)
• HDFS Services (NN – Name Node, SNN – Secondary Name Node, DN – Data Node)
• YARN Services (G – Gateway, RM – Resource Manager, JHS – Job History Server, NM – Node Manager)
• Zookeeper Services (S – Server)
Hadoop Configuration Details
CDH Tuning
Parameter Value
dfs.blocksize 256 MiB
dfs.replication 3
dfs.client.use.datanode.hostname TRUE
mapreduce.task.io.sort.mb 400 MiB
yarn.scheduler.minimum-allocation-mb 2 GiB
mapreduce.map.memory.mb 2.1 GiB
mapreduce.reduce.memory.mb 2.1 GiB
mapreduce.map.cpu.vcores 1
mapreduce.reduce.cpu.vcores 1
mapreduce.job.heap.memory-mb.ratio 0.8
Test Scenarios
• Linux tuning (see backup for details)
• CDH tuning (table above)
• Tera Benchmark Suite (1TB+)
• Different vSAN configs: disk groups, FTT, host affinity, etc.
Goals
• Identify optimal vSAN system configuration for analytics workloads
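To see how the memory settings in the tuning table relate to each other, here is a minimal Python sketch. It assumes that the task JVM heap is derived as container memory multiplied by mapreduce.job.heap.memory-mb.ratio when no explicit heap size is set; the helper names are hypothetical.

```python
MIB_PER_GIB = 1024


def heap_mib(container_gib: float, ratio: float) -> int:
    """Approximate JVM heap (MiB) for a task container of the given size."""
    return int(container_gib * MIB_PER_GIB * ratio)


def sort_buffer_fits(sort_mib: int, container_gib: float, ratio: float) -> bool:
    """Check that the map-side sort buffer is smaller than the derived heap."""
    return sort_mib < heap_mib(container_gib, ratio)


if __name__ == "__main__":
    # Values from the CDH tuning table: 2.1 GiB containers, ratio 0.8, 400 MiB sort buffer.
    print("heap per task (MiB):", heap_mib(2.1, 0.8))
    print("sort buffer fits in heap:", sort_buffer_fits(400, 2.1, 0.8))
```

With the table's values the 400 MiB sort buffer sits well inside the derived heap, which is the relationship this kind of check is meant to confirm.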
Workloads - MapReduce
TeraSort Suite
– Most popular Hadoop test, supplied with distribution, exercises CPU, memory, disk, network
– TeraGen – generates specified number of 100 byte records – 1, 3, and 5 TB used in tests
– TeraSort – sorts TeraGen output
– TeraValidate – validates TeraSort output is in sorted order
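Because TeraGen takes a row count rather than a data size, the 1, 3, and 5 TB data sets translate into row counts as sketched below. This minimal Python example assumes decimal terabytes (10^12 bytes); the jar name and output path are placeholders, not values from the deck.

```python
RECORD_BYTES = 100  # TeraGen writes fixed 100-byte records


def teragen_rows(terabytes: int, bytes_per_tb: int = 10**12) -> int:
    """Number of 100-byte rows needed for a data set of the given size."""
    return terabytes * bytes_per_tb // RECORD_BYTES


def teragen_command(terabytes: int, jar: str, out_dir: str) -> str:
    """Build a TeraGen command line; jar and out_dir are placeholders."""
    return f"hadoop jar {jar} teragen {teragen_rows(terabytes)} {out_dir}"


if __name__ == "__main__":
    for size_tb in (1, 3, 5):
        print(teragen_command(size_tb, "hadoop-mapreduce-examples.jar", f"/teragen-{size_tb}tb"))
```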
Performance Results - Disk Groups (DGs)
Having more disk groups spreads I/O and gives faster response times
[Chart: vSAN FTT=1, Hadoop dfs.rep=2]
Performance Results – Host Affinity (Tech Preview)
Host affinity avoids placing replicas on the same host and provides faster response times
[Chart: 4 DGs, vSAN FTT=0, Hadoop dfs.rep=3]
Best Practices on Running Hadoop on vSAN
• 4 VMDKs per VM help spread I/O across SCSI controllers for uniform I/O distribution
• HDFS and vSAN both provide fault tolerance, through replication and erasure coding
• vSAN host affinity option (Tech Preview) provides data locality and best performance
• Using multiple disk groups helps I/O performance
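Since both HDFS and vSAN can replicate data, the two settings multiply. The sketch below is a minimal Python illustration, assuming vSAN uses RAID-1 mirroring so that FTT=n keeps n+1 copies of each object; the function name is hypothetical.

```python
def physical_copies(hdfs_replication: int, vsan_ftt: int) -> int:
    """Total physical copies of each HDFS block when vSAN mirrors (RAID-1)."""
    if hdfs_replication < 1 or vsan_ftt < 0:
        raise ValueError("replication must be >= 1 and FTT must be >= 0")
    return hdfs_replication * (vsan_ftt + 1)


if __name__ == "__main__":
    # The two combinations named on the performance result slides.
    for rep, ftt in ((2, 1), (3, 0)):
        print(f"dfs.replication={rep}, FTT={ftt}: {physical_copies(rep, ftt)} copies")
```

Under this assumption, dfs.replication=2 with FTT=1 stores four copies of each block, while dfs.replication=3 with FTT=0 stores three.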
MongoDB on vSAN
MongoDB: Accommodates Large Volumes of Rapidly Changing Structured, Semi-structured and Unstructured Data
Overview
• MongoDB is an open source document database
• NoSQL DB
• Built on an architecture of collections and documents
• Documents comprise sets of key-value pairs and are the basic unit of data in MongoDB
• Collection: A grouping of MongoDB documents
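The document and collection concepts above can be pictured with plain Python data structures. This is only an illustrative sketch with made-up field names and values; it does not use a MongoDB driver or reflect any data from the tests.

```python
# A document is a set of key-value pairs; values may be nested or lists.
doc_a = {"_id": 1, "device": "sensor-17", "site": "lab", "reading": {"temp_c": 21.5}}
doc_b = {"_id": 2, "device": "sensor-42", "site": "lab", "tags": ["edge"]}
doc_c = {"_id": 3, "device": "sensor-17", "site": "plant"}

# A collection is a grouping of documents; they need not share the same fields.
readings = [doc_a, doc_b, doc_c]


def find(collection, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    print(find(readings, device="sensor-17"))
    print(find(readings, site="lab"))
```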
Solution Overview and Test Environment
Test Environment
• vSAN 6.6 and vSphere 6.5.0d
• CentOS 7.3
• MongoDB 3.4 (community version)
• Yahoo Cloud Serving Benchmark (YCSB) 0.12.0
MongoDB Definitions
• ConfigDB: the configuration database for the MongoDB cluster’s internal use
• Mongos: “MongoDB Shard” (routing service)
• Mongod: primary process for handling data requests, managing data access, and performing background management operations
Solution Configuration: Hardware and Software
Hardware
Component | Specification
Server | SuperMicro SSG-2027R-AR24NV
CPU cores | 2 sockets, 10 cores 3.0GHz with hyper-threading enabled
RAM | 512GB DDR4 RDIMM
Network adapter | 2 x Intel 10 Gigabit X540-AT2, + I350 1Gb Ethernet
Storage adapter | 2 x 12Gbps SAS PCI-Express
Disks | Cache SSD: 2 x 3,000GB NVMe drive; capacity SSD: 8 x 400GB SATA drive
Component | CPU Cores | Memory | OS Disk | Data Disk
ConfigDB (3 instances) | 8 | 32GB | 32GB | 200GB
Mongos | 8 | 64GB | 32GB | None
Mongod | 8 | 64GB | 32GB | 200GB
Mongos: The routing service in MongoDB
Mongod: The daemon for data processing
Baseline: 100 million records (128 GB)
Performance Analysis
The rule of thumb is that the aggregated CPU cores and memory should not exceed the physical resources.
YCSB Workload Types for Performance Evaluation
YCSB: NoSQL DB performance assessment tool (open source)
• Workload A (update-heavy workload): 50% read, 50% update
• Workload B (read-mostly workload): 95% read, 5% update
Performance Testing: Evaluate the Impact of Different Client Threads – Optimal to be 128
[Charts: Workload A, Workload B]
Using 128 client threads gives maximum throughput while keeping latency lower than at higher thread counts.
Performance Testing: Evaluate the Impact of Different YCSB Operation Count – Performance at Steady State ~25M
[Charts: Workload A, Workload B]
We used an operation count of 25M across our tests for consistency. Change this based on your requirements.
Performance Testing: Parameter Settings in the Baseline Testing
Parameter | Value
Mongod data server number | 4
Mongod data disk size | 200GB
Mongod CPU cores number | 8
Mongod memory size | 64GB
Mongos CPU cores number | 8
Mongos memory size | 64GB
Enable Mongod replica set? | No
vSAN stripe width setting | 1
vSAN FTT | 1
vSAN object checksum | Disabled
YCSB client threads | 128
Database entry size | 100 million
Operation count | 25 million
MongoDB durability ‘w’ option | 1 (true)
MongoDB durability ‘j’ option | 1 (true)
Performance baseline without any optimization
• Workload A: 28,529 ops/sec, with average read latency of 0.64ms and average update latency of 8.3ms.
• Workload B: 119,554 ops/sec, with average read latency of 0.81ms and average update latency of 5.7ms.
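The baseline settings map naturally onto YCSB core workload properties. The following minimal Python sketch renders them as property-file text; it assumes the standard YCSB property names recordcount, operationcount, threadcount, readproportion, and updateproportion, and leaves out connection settings, which the deck does not list.

```python
# Baseline values from the parameter table above.
BASELINE = {
    "recordcount": 100_000_000,
    "operationcount": 25_000_000,
    "threadcount": 128,
}

# Read/update mixes for the two workloads used in the tests.
WORKLOADS = {
    "A": {"readproportion": 0.5, "updateproportion": 0.5},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
}


def render_properties(workload: str) -> str:
    """Return key=value lines for the named workload, sorted by key."""
    props = {**BASELINE, **WORKLOADS[workload]}
    return "\n".join(f"{key}={props[key]}" for key in sorted(props))


if __name__ == "__main__":
    for name in ("A", "B"):
        print(f"# Workload {name}")
        print(render_properties(name))
```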
Performance Testing: Evaluate the Impact of Different Virtual CPU Cores and Memory Configurations
[Charts: Workload A, Workload B]
Increasing memory/CPU could have a major impact on performance.
Performance Testing: Evaluate the Impact of Different Database Size in Terms of Entries
[Charts: Workload A, Workload B]
For larger databases, increase memory/CPU to avoid performance penalties. (This result is based on 8 CPU cores and 64GB memory for the MongoDB servers.)
Performance Testing: Evaluate the Impact of MongoDB Replica Set Setting
[Charts: Workload A, Workload B]
rs=1 means no application replication and rs=3 means application replication. Turning off MongoDB’s replica set provides better performance and lower latency.
Performance Testing: Evaluate the Impact of Different vSAN Object Stripe Width
[Charts: Workload A, Workload B]
Increasing stripe width could improve performance, but the impact is modest. Recommended only when performance is low.
Performance Testing: Evaluate the Impact of Different vSAN RAID Levels
[Charts: Workload A, Workload B]
RAID 5 could save storage space but would also lead to lower throughput and higher latency. Users should consider the tradeoff between storage space and performance.
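The space side of this tradeoff can be estimated with a short calculation. This minimal Python sketch assumes FTT=1 throughout, RAID-1 mirroring at 2x raw space, and RAID-5 erasure coding in a 3+1 layout at roughly 1.33x; the 200GB figure is the Mongod data disk size from the baseline table.

```python
from fractions import Fraction

# Raw-to-usable space multipliers at FTT=1 (assumed layouts, see lead-in).
OVERHEAD = {
    "RAID-1": Fraction(2),      # two full copies
    "RAID-5": Fraction(4, 3),   # three data segments plus one parity segment
}


def raw_gb(usable_gb: int, raid: str) -> float:
    """Raw vSAN capacity consumed by a fully written disk of usable_gb."""
    return float(usable_gb * OVERHEAD[raid])


if __name__ == "__main__":
    for level in OVERHEAD:
        print(f"{level}: {raw_gb(200, level):.1f} GB raw for a 200 GB data disk")
```

The estimate covers capacity only; the throughput and latency cost of RAID 5 has to come from measurements such as those on this slide.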
Best Practices on Running MongoDB on vSAN
• Before the deployment: size the environment properly
• Use MongoDB shards
• Optionally turn off MongoDB’s replica set and leverage vSphere HA instead
• Appropriate CPU and memory sizing is essential
• Choose the data durability option that balances performance against availability
• Try a larger vSAN stripe width if performance is low
• Follow the MongoDB best practices
Apache Cassandra on vSAN
• DataStax delivers Apache Cassandra in a database platform purpose-built for the performance and availability demands of Web, Mobile, and IoT applications
• Solution Overview: DataStax Enterprise on vSAN at HERE
Splunk on vSAN
Splunk Enterprise on vSAN
➢ vSAN - Storage for hot/warm/cold buckets
➢ Hardware
• 1 VxRail cluster
• All-flash SSD storage
➢ Application layer
• Dedicated VM for Splunk
Splunk Enterprise on vSAN and Isilon
➢ vSAN + Isilon
• vSAN for Splunk hot/warm buckets
• Isilon for Splunk cold buckets (Isilon X410)
➢ Hardware
• 1 VxRail cluster
• All-flash SSD storage
• 1 Isilon cluster
Splunk-validated Sizing Configurations
Metric | Departmental deployment | Small enterprise distributed deployment | Medium enterprise distributed deployment | Medium enterprise indexer cluster deployment
Data volume | 50GB/day | 500GB/day | 1 TB/day | 1 TB/day
Retention | 90-day | 90-day | 90-day | 7-day retention for hot/warm; configurable retention for cold buckets on Isilon
Concurrent users | Less than 8 | Less than 64 | More than 64 | More than 64
Departmental Deployment Validated Sizing Configuration
One VxRail node for up to 50 GB/day with 90-day retention
Hardware Configuration
VxRail Model | Specification | # of Nodes | Storage Required
VxRail E460F | 40 x 2.2GHz cores; 384GB (24 x 16GB) RAM; 1 Disk Group with 800GB Cache SSD; 5.235TB (3 x 1.92TB SSD) raw capacity | 1 | 3.3 TB** (includes space for OS and 20% reserved for free space)
Deployment Configuration
Instance Role | Qty | Physical Cores/vCPUs | Memory | OS Storage | Indexer Storage
Single Instance | 1 | 32/64 | 256GB | 300GB | 3TB
Small Enterprise Distributed Deployment
Four VxRail nodes for up to 500 GB/day (distributed) or up to 250 GB/day (clustered) with 90-day retention
Hardware Configuration
VxRail Model | Specification | # of Nodes | Storage Required
VxRail E460F (per node) | 40 x 2.2GHz cores; 512GB (16 x 32GB) RAM; 2 Disk Groups, each with 800GB Cache SSD; 20.94 TB (6 x 3.84TB SSD) raw capacity | 4 |
VxRail Cluster | 83.8 TB raw capacity; 40.3 TB effective usable capacity | | 27.8 TB (includes space for OS and 20% reserved for free space)
Deployment Configuration
Instance Role | Qty | Physical Cores/vCPUs | Memory | OS Storage | Indexer Storage
Search Head | 1 | 32/64 | 256GB | 300GB | 0
Indexer | 2 | 32/64 | 256GB | 300GB | 13.9TB
Admin Server | 1 | 32/64 | 256GB | 150GB | 0
Medium Enterprise Distributed Deployment Validated Sizing Configuration
Seven VxRail nodes for up to 1 TB/day (distributed) with 90-day retention
Hardware Configuration
VxRail Model | Specification | # of Nodes | Storage Required
VxRail E460F (per node) | 40 x 2.2GHz cores; 384GB (24 x 16GB) RAM; 2 Disk Groups, each with 800GB Cache SSD; 20.94 TB (6 x 3.84TB SSD) raw capacity | 7 |
VxRail Cluster | 146.6 TB raw capacity; 70.7 TB effective usable capacity | | 56.4 TB (includes space for OS and 20% reserved for free space)
Deployment Configuration
Instance Role | Qty | Physical Cores/vCPUs | Memory | OS Storage | Indexer Storage
Search Head | 1 | 32/64 | 256GB | 300GB | 0
Indexer | 5 | 32/64 | 256GB | 300GB | 10.8TB
Admin Server | 1 | 32/64 | 256GB | 150GB | 0
Medium Enterprise Indexer Cluster Deployment
Seven VxRail nodes with Isilon for up to 1 TB/day (clustered) with 7-day retention for hot/warm buckets and configurable retention for cold buckets
Hardware Configuration
VxRail Model | Specification | # of Nodes | Storage Required
VxRail E460F (per node) | 40 x 2.2GHz cores; 384GB (24 x 16GB) RAM; 1 Disk Group with 800GB Cache SSD; 5.235TB (3 x 1.92TB SSD) raw capacity | 7 |
VxRail Cluster | 36.6 TB raw capacity; 16.1 TB effective usable capacity | | 12.5 TB (includes space for OS and 20% reserved for free space)
Deployment Configuration
Instance Role | Qty | Physical Cores/vCPUs | Memory | OS Storage | Indexer Storage
Search Head | 1 | 32/64 | 256GB | 300GB | 0
Indexer | 5 | 32/64 | 256GB | 300GB | 2.1TB
Admin Server | 1 | 32/64 | 256GB | 150GB | 0
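The cluster raw capacity figures in the sizing tables follow from the per-node figures. The minimal Python sketch below reproduces them, assuming cluster raw capacity is simply node count times per-node raw capacity rounded to one decimal; it does not attempt to derive the effective usable capacity, whose formula the deck does not give.

```python
def cluster_raw_tb(nodes: int, per_node_raw_tb: float) -> float:
    """Cluster raw capacity in TB, rounded to one decimal as on the slides."""
    return round(nodes * per_node_raw_tb, 1)


if __name__ == "__main__":
    # (deployment, node count, per-node raw capacity in TB) from the tables.
    configs = [
        ("Small enterprise distributed", 4, 20.94),
        ("Medium enterprise distributed", 7, 20.94),
        ("Medium enterprise indexer cluster", 7, 5.235),
    ]
    for name, nodes, per_node in configs:
        print(f"{name}: {cluster_raw_tb(nodes, per_node)} TB raw")
```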
Performance Testing: Linear Scalability of vSAN Cluster Size for Various Workloads
[Charts: All Read Workload (4KB), Sequential R/W Workload (256KB)]
The charts show throughput and latency scalability results for the all-read and sequential read/write workloads. The results show that vSAN scales close to linearly.
Performance Testing: Linear Scalability of vSAN Cluster Size for Mixed Workloads
[Charts: Mixed R/W Workload (4KB), Mixed R/W Workload (32KB)]
The charts show latency and throughput scalability results for the mixed workloads. The results show that vSAN scales close to linearly.
Best Practices for Running Splunk on vSAN
• For performance, use vSAN; if you need to store a large amount of cold data, use vSAN + Isilon
• Increase the number of disk stripes as needed for highest performance
• Run a proactive disk rebalance after adding new nodes
Summary and Call to Action
• Consolidation and simplified management are driving the need to deploy Big Data workloads on vSAN
• All-Flash vSAN Ready Nodes deliver cost-efficient performance, reliability, and hassle-free manageability
• Consider evaluating “All Flash” vSAN for Big Data workloads today!
Additional Resources
• White Paper - Intel and VMware White Paper: New Era of Hyper-Converged Big Data Using Hadoop with All-Flash VMware vSAN
• Using Splunk Enterprise With VxRail Appliance and Isilon for Analysis of Machine Data
• Blogs - Big Data on All-Flash vSAN? Of Course!
• Storage Hub - MongoDB on VMware vSAN
Big Data Sessions
Tuesday, 29th August 2017
• [VIRT1997BU] Machine Learning and Deep Learning on VMware vSphere: GPUs are Invading the Software-Defined Data Center: 5:30-6:30pm
• [VIRT2274GU] Group Discussion on Virtualizing Big Data and Machine Learning: 5:30-6:30pm
Wednesday, 30th August 2017
• [LDT2800PU] Harnessing the Power of Data in a Virtual World: 9:30-10:30am
• [MTE4789VIRT] Meet the Experts Session: 11:15am-12:00pm, Table 7
• This is an opportunity to meet with VMware’s big data people in a small group context. Booking your time slot ahead of the meeting is advised.
Thursday, 31st August 2017
• [VIRT1445BU] Extreme Performance Series: Fast Virtualized Hadoop and Spark on All-Flash Disks: 10:30-11:30am
3 Easy Ways to Learn More about vSAN
Hands-On Lab: Test drive vSAN for free today!
• Live at VMworld
• Practical learning of vSAN, VxRail and more
• 24x7 availability online, for free!
New vSAN Tools
• vSAN Sizer
• vSAN Assessment
Storage Hub Technical Library
• StorageHub.vmware.com
• Reference architectures, off-line demos and more
• Easy search function
• And more!
Nerd Out With These Key vSAN Activities at VMworld
#HitRefresh on your current data center and discover the possibilities!
Become a vSAN Specialist: earn VMware digital badges to showcase your skills
• New 2017 vSAN Specialist Badge
• Education & Certification Lounge: VM Village
• Certification Exam Center: Jasmine EFG, Level 3
Practice with Hands-on-Labs: learn from self-paced and expert-led hands-on labs
• vSAN Getting Started Workshop (expert led)
• VxRail Getting Started (self paced)
• Self-paced lab available online 24x7
Visit SDDC Assessment Lounge: discover how to assess if your IT is a good fit for HCI
• Four Seasons Willow Room/2nd floor
• Open from 11am – 5pm Sun, Mon, and Tue
• Learn more at Assessing & Sizing in STO1500BU
SDDC ASSESSMENT LOUNGE
Four Seasons Hotel, Desert Willow Room
HOURS
• Sunday, August 27th: 1:00pm – 5:00pm
• Monday, August 28th: 11:00am – 5:00pm
• Tuesday, August 29th: 11:00am – 5:00pm
• vSphere Optimization Assessments (VOA)
• Hybrid Cloud Assessment
• Virtual Network Assessment
• vSAN Assessment
Backup
TEST TOOL
• Open source benchmark tool
• Emulates the disk or network I/O load
• Works for both single and clustered systems
• Can be used to measure:
– Performance of disk and network controllers
– Bandwidth and latency capabilities of buses
– Network throughput to attached drives
– Shared bus performance
– System-level hard drive performance
– System-level network performance
TEST SCENARIO
• Iometer Workload Profile
Workload Profile | I/O Size | Read/Write Ratio | Random/Sequential Ratio | Outstanding I/O
All Read | 4KB | 100% Read | 100% Random | 16
Mixed Read/Write | 4KB | 70% Read, 30% Write | 100% Random | 4
Mixed Read/Write | 32KB | 70% Read, 30% Write | 100% Random | 2
Sequential Read | 256KB | 100% Read | 100% Sequential | 8
Sequential Write | 256KB | 100% Write | 100% Sequential | 8
• Iometer VM Configuration
vCPU | RAM | VMDK
4 | 4GB | 10 * 9GB eager-zeroed-thick VMDK
• Performance Metrics
Metrics | Unit
IOPS | I/O per second
Latency | Millisecond
Throughput | Megabyte per second
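The IOPS and throughput metrics above are linked through the I/O size of each workload profile. This minimal Python sketch shows the conversion, assuming 1 MB = 1024 KB; the IOPS values in the example are made-up inputs, not measured results from the deck.

```python
def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Convert an IOPS figure to throughput in MB/s for a fixed I/O size."""
    return iops * io_size_kb / 1024


if __name__ == "__main__":
    # Hypothetical IOPS inputs paired with I/O sizes from the workload profiles.
    examples = [("All Read", 10_000, 4), ("Mixed Read/Write", 5_000, 32), ("Sequential Read", 1_000, 256)]
    for name, iops, size_kb in examples:
        print(f"{name}: {iops} IOPS at {size_kb}KB = {throughput_mb_s(iops, size_kb):.1f} MB/s")
```

The same IOPS figure therefore means very different bandwidth for the 4KB and 256KB profiles, which is why the deck reports both metrics.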
HARDWARE & SOFTWARE RESOURCES
Hardware Configuration
Hardware Components | Details
VxRail E460F | 2 Intel® Xeon® Processors E5-2698 v4 @ 2.20 GHz per node; 384 GB (24 x 16 GB) or 512 GB (16 x 32 GB) RAM; 800 GB cache SSD per disk group (1 or 2 disk groups); 5.235TB (3 x 1.92TB) or 20.94TB (6 x 3.84TB SSD) capacity per node**; 2 x 10 GbE SFP+ per node
Switch | Fabric interconnect
Isilon X410 | 2 Intel® Xeon® Processors 2.0 GHz per node; 128 GB RAM per node; 3.2 TB SSD storage; 64 TB HDD storage; 2 x 10 GbE SFP+ per node; 2 x 1 GbE per node
HARDWARE & SOFTWARE RESOURCES
Software Configuration
Software Components | Details
Splunk Enterprise | 6.5.0
Splunk Universal Forwarder | 6.5.0
Red Hat Linux 64-bit | 6.7
VMware vSphere Enterprise | 6.0 U2
VMware vCenter Server | 6.0 U2
VMware Virtual SAN Enterprise | 6.2
VMware vRealize Log Insight | 3.3.1
VxRail Manager | 4.0
OneFS | 8.0.0.3
Q&A
Intel Platforms: Tick-Tock Development Model
Processor | Step | Process | Change | Microarchitecture Codename | Platform | PCH
Nehalem | Tock | 45nm | New microarchitecture | Nehalem | Thurley | Tylersburg
Westmere | Tick | 32nm | New process technology | Nehalem | Thurley | Tylersburg
Sandy Bridge | Tock | 32nm | New microarchitecture | Sandy Bridge | Romley | Patsburg
Ivy Bridge | Tick | 22nm | New process technology | Sandy Bridge | Romley | Patsburg
Haswell | Tock | 22nm | New microarchitecture | Haswell | Grantley (today) | Wellsburg
Broadwell | Tick | 14nm | New process technology | Haswell | Grantley (today) | Wellsburg
Xeon E5 v4 socket compatible with v3 series
BIOS Setup
Profiles
– CPU Power and Performance Policy: Performance
– Workload Configuration: Balanced
– Memory RAS Configuration: Maximum Performance
– Fan Profile: Performance
Enabled
– Hyper-Threading
– NUMA Optimized
– Enhanced Intel SpeedStep® Tech
– Intel® Turbo Boost Technology
– Uncore Frequency Scaling
– Performance P-Limit
Disabled
– Cluster on Die
– Early Snoop
– CPU C States
– Energy Efficient Turbo
Test Setup (Linux OS)
/etc/sysctl.conf
vm.swappiness=10
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 250000
/etc/security/limits.conf
* soft nofile 65536
* hard nofile 1048576
* soft nproc 65536
* hard nproc unlimited
* hard memlock unlimited
CPU Profile
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done
Huge Page
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Network
ifconfig <eth> mtu 9000
ifconfig <eth> txqueuelen 1000
Performance Results – Replication HDFS vs. vSAN (Tech Preview)
HDFS replication with host affinity delivers optimal performance
[Chart: vSAN 4DGs, HDFS replication with host affinity]
*Other names and brands may be claimed as the property of others.
Shared Nothing Architecture
Typical Applications
• NoSQL: MongoDB, Cassandra, Couchbase
• Big Data: Apache Hadoop, Apache Spark
• Others
MongoDB Failure Testing
Failure Testing
Test Overview
From the perspective of MongoDB’s replica set setting, the test is divided into two parts:
• MongoDB’s replica set enabled: there are three virtual machines in a MongoDB replica set, so we use ‘rs=3’ as shorthand.
• MongoDB’s replica set disabled: there is only one virtual machine in a MongoDB replica set, so we use ‘rs=1’ as shorthand.
From the perspective of failure, we tested two types of failure:
• A physical host failure, which powers off all the running virtual machines residing on the host. When a host fails, VMware vSphere High Availability restarts the impacted virtual machines on another host. This is what makes it feasible to set ‘rs=1’ while keeping service downtime low.
• A physical disk failure in a vSAN datastore, which causes a vSAN object to enter a degraded state. With the storage policy set to FTT=1, the object can still survive and serve I/O. Thus, from the virtual machines’ perspective, there is no interruption of service.
Failure Testing
Failure Testing Result
Replica Set Configuration | Failure Type | Service Interruption Time | Recovery Method
rs=1 | Host Failure | Around 120 seconds | vSphere HA restarted the failed virtual machines.
rs=1 | Disk Failure | No interruption | vSAN rebuilt the failed components.
rs=3 | Host Failure | Around 10 seconds | 1. MongoDB’s replica set failed over from the primary node to the secondary node. 2. vSphere HA restarted the failed virtual machines.
rs=3 | Disk Failure | No interruption | vSAN rebuilt the failed components.