White Paper

EMC VMAX ALL FLASH WITH MONGODB MongoDB Performance Benchmarking with VMAX All Flash

EMC Solutions

Abstract

This white paper provides performance benchmarking results when deploying a MongoDB environment in an EMC® VMAX All Flash storage array. The paper details how MongoDB benefits from the advanced technical features of EMC VMAX All Flash systems.

April 2016

Copyright


Copyright © 2016 EMC Corporation. All rights reserved. Published in the USA.

Published April 2016

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, FAST, PowerPath, SnapVX, SRDF, Symmetrix, TimeFinder, Unisphere, VMAX, VMAX3, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.


Part Number H15005

Contents


Executive Summary

Technology Overview

Solution Overview

Testing environment

Benefits of using VMAX and MongoDB

Conclusion

Executive Summary


Business case

As Big Data trends continue to evolve, customers must adopt next-generation technologies for their operational databases. Databases and storage systems must handle larger datasets with different formats and sources generated by web, mobile, social, and cloud applications. These data trends are challenging the capabilities of traditional relational databases, which were never designed to address massive data growth, performance, scale, and real-time data modeling. Customers are looking for newer and more agile ways to conduct real-time analytics and make critical business decisions.

MongoDB offers the best of traditional databases, as well as the flexibility, scale, and performance required by today’s applications. MongoDB is one of the fastest growing NoSQL databases, with more than 10 million downloads and more than 2,000 customers, and is implemented by over one-third of Fortune 100 companies.

MongoDB is deployed in large enterprise environments and key vertical markets where performance, scale, and availability are critical requirements. As traditional IT organizations adopt these newer technologies, they want to continue to use the trusted shared storage infrastructure that they have relied on for years, with enterprise features such as high availability, advanced replication technologies, multi-tenancy, and security.

Solution overview

This solution shows how customers can use their existing storage area network (SAN) infrastructure with the trusted EMC VMAX enterprise data services platform for MongoDB. VMAX is an EMC flagship Tier 1 storage platform with industry-leading performance, scale, and density, and is implemented by more than 94 percent of the Fortune 50 companies. Customers can solve end-to-end operational challenges associated with direct-attached storage (DAS) by consolidating MongoDB with their existing mission-critical applications on VMAX. They can then take advantage of simplified scale-out architecture, high resiliency, guaranteed service level agreements (SLAs), and compelling total cost of ownership (TCO) savings.

Recommendations

With traditional commodity/DAS architectures, resources are added in a completely linear fashion. Homogeneous servers are added with exact increments of CPU, memory, and storage. Because applications do not consume resources linearly, resources can become stranded. For instance, adding nodes purely for storage capacity results in underutilized CPU or memory. By scaling storage independently, you can make better use of compute resources and potentially save on the data center footprint, hardware costs, and software licenses.

Document purpose

This document provides a MongoDB 3.2 performance benchmarking reference for implementation on EMC VMAX All Flash storage arrays.

Audience

This document is intended for use by pre-sales personnel, sales engineers, and customers who want to understand the benefits of implementing a MongoDB environment on an EMC VMAX All Flash storage array.

We value your feedback!

EMC and the authors of this document welcome your feedback on the solution and the solution documentation. Contact [email protected] with your comments.

Authors: Harry Tu, Kecheng Bi, Praneetha Manthravadi, Kathleen McCarthy.

Technology Overview


VMAX All Flash

EMC VMAX All Flash is an enterprise data services platform that is well suited to solve the CIO challenge of embracing a modernized flash-centric data center and hybrid cloud, while simultaneously simplifying, automating, and consolidating IT operations. VMAX is the industry’s leading Tier 1 highly resilient, scalable, and agile platform with a complete set of rich software data services.

VMAX All Flash 450 and 850 arrays use the latest 3D NAND flash technology to consolidate high-demand transaction processing workloads and deliver consistent response times below 0.5 ms. VMAX All Flash systems come in appliance-like packaging that is easy to configure, deploy, and manage. Figure 1 shows the arrays.

Figure 1. VMAX All Flash models and scaling

For enterprises that require petabyte-level scale, VMAX All Flash is purpose-built to manage high-demand, heavy-transaction workloads easily while storing petabytes of vital data. The VMAX All Flash hardware design features the turbo-charged Dynamic Virtual Matrix architecture that enables extreme speed and consistent sub-millisecond response time.

VMAX delivers millions of IOPS at massive scale using up to 384 cores. VMAX uses advanced multi-core/multi-threading algorithms and a flash-optimized design to meet strict SLAs for high-demand OLTP, virtualized applications, and high-growth databases.

VMAX architecture is trusted for always-on availability with advanced fault isolation, robust data integrity checking, and proven non-disruptive hardware and software upgrades. Along with six-nines availability for 24x7 forever operations, VMAX uses SRDF® software, the gold standard for multi-site remote replication. Also, with EMC TimeFinder® SnapVX™ technology, users can create hundreds of snapshots for each workload to optimize decision support, application testing, and business analytics.

The arrays are built for easy management, extreme performance, and massive scalability in a small footprint that provides compelling TCO compared to DAS architectures.

HYPERMAX OS

VMAX All Flash uses the industry’s first open storage and hypervisor converged operating system, HYPERMAX OS, which combines industry-leading high availability, I/O management, quality of service, data integrity validation, storage tiering, and data security with an open application platform. HYPERMAX OS features the first real-time, non-disruptive storage hypervisor that manages and protects embedded data services by extending VMAX high availability to services that traditionally run external to the array. It also provides direct access to hardware resources to maximize performance and can be upgraded without disruption.

MongoDB

MongoDB Enterprise edition is a document-oriented database that is designed for a broad array of modern applications. It is used by organizations of all sizes to power mission-critical operational applications where low latency, high throughput, and continuous availability are critical requirements of the system. MongoDB incorporates the innovations of a NoSQL database (scalability, performance, and data model flexibility) while maintaining the foundation of strong consistency, secondary indexes, and a rich query language that developers expect from traditional relational databases.

MongoDB is built for scalability, performance, and high availability, scaling from single-server deployments to large, complex, multi-site architectures.

Replica sets and sharding are two types of MongoDB clusters. MongoDB uses its native replication to maintain multiple copies of data within a replica set. A replica set is a group of MongoDB instances that maintain the same dataset. Replica sets help prevent downtime by detecting failures and automatically initiating failover, as shown in Figure 2.

Figure 2. MongoDB replication example

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
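The failover behavior described above can be illustrated with a small, self-contained Python sketch. This is a toy model only: the class names `Member` and `ToyReplicaSet` are hypothetical, writes are copied to every healthy member at once, and the election simply promotes the first healthy member when a majority is still healthy. MongoDB's actual replication and election protocol is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One hypothetical replica set member holding a copy of the data."""
    name: str
    healthy: bool = True
    data: list = field(default_factory=list)


class ToyReplicaSet:
    """Toy model of a replica set: one primary, the rest secondaries."""

    def __init__(self, names):
        self.members = [Member(n) for n in names]
        self.primary = self.members[0]

    def write(self, doc):
        # Writes need a primary; in this toy model they are copied to
        # every healthy member immediately.
        if self.primary is None:
            raise RuntimeError("no primary available")
        for member in self.members:
            if member.healthy:
                member.data.append(doc)

    def fail(self, name):
        # Mark a member as failed and trigger an election if it was the primary.
        for member in self.members:
            if member.name == name:
                member.healthy = False
        if self.primary is not None and not self.primary.healthy:
            self._elect()

    def _elect(self):
        # Assumption: a new primary can be chosen only while a strict
        # majority of all members is still healthy.
        healthy = [m for m in self.members if m.healthy]
        if len(healthy) * 2 > len(self.members):
            self.primary = healthy[0]
        else:
            self.primary = None


rs = ToyReplicaSet(["node-a", "node-b", "node-c"])
rs.write({"order": 1})
rs.fail("node-a")            # the primary is lost, node-b takes over
rs.write({"order": 2})       # writes continue after failover
print(rs.primary.name)
```

With three members, the set in this sketch tolerates the loss of one server and keeps accepting writes, which matches the single-server fault tolerance that the text describes.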

MongoDB scales horizontally using sharding. Sharding splits data into ranges and distributes those ranges as shards across multiple servers, which keeps the data evenly distributed. Each shard is an independent database, and collectively, the shards make up a single, logical database, as illustrated in Figure 3.

Figure 3. MongoDB sharding
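To make the idea of range-based sharding concrete, the following Python sketch routes a numeric shard key to one of three shards. It is an illustration under stated assumptions: the class `RangeRouter`, the split points, and the shard names are hypothetical, and the sketch omits everything a real sharded cluster does to split and rebalance ranges.

```python
import bisect
from collections import Counter


class RangeRouter:
    """Toy router: maps a shard key to a shard by key range."""

    def __init__(self, split_points, shard_names):
        # N split points define N + 1 contiguous ranges, one per shard.
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # bisect_right makes each range inclusive of its lower bound:
        # keys below 1000 go to the first shard, 1000..1999 to the second, etc.
        index = bisect.bisect_right(self.split_points, key)
        return self.shard_names[index]


router = RangeRouter([1000, 2000], ["shard0", "shard1", "shard2"])

# Route 3000 evenly spread keys and count how many land on each shard.
placement = Counter(router.shard_for(key) for key in range(3000))
print(placement)
```

Because the keys in this example are spread evenly over the key space, each shard receives the same number of documents. With skewed keys, the split points would need to be chosen differently to keep the distribution even.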

Solution Overview


MongoDB can use locally attached storage to hold its datasets, which is a simple way to meet the requirements of a general-purpose system.

The testing that is described in this white paper uses the EMC VMAX All Flash array to support a sharded MongoDB environment that can meet enterprise demands for performance, scalability, and data replication. This testing demonstrates VMAX All Flash features with SnapVX, VMware vSphere® Virtual Volumes™, and Data at Rest Encryption (D@RE). We measured the impact on MongoDB performance when these features are enabled. This performance benchmarking uses MongoDB databases running Yahoo! Cloud Serving Benchmark (YCSB) random I/O workloads. YCSB is an open source, extensible workload generator that is commonly used to compare performance for a set of desired workloads.

Testing environment


Figure 4 shows the hardware configuration that was used in the test environment.

Figure 4. Hardware configuration of the MongoDB testing environment

Table 1 describes the server configuration of the test environment.

Table 1. Server configuration

| Component | Description |
| --- | --- |
| Server | Cisco Blade: UCSB-B200-M3; Memory: 512 GB each; CPU: Intel Xeon CPU E5-2670 0 @ 2.60 GHz; HBA: Cisco VIC FCoE HBA |
| Multipath | EMC PowerPath®/VE 6.0 SP1 for VMware vSphere |
| Connectivity | Cisco MDS 9706 (8 Gb FC) |
| Array | VMAX All Flash |

Note: PowerPath was used in this case. Customers can choose either PowerPath or native multipathing; both work effectively with the underlying storage array.

Table 2 describes the software configuration of the test environment.

Table 2. Software configuration

| Component | Description |
| --- | --- |
| OS | RHEL 7.2 |
| ESXi | 6.0 U1b |
| vCenter | 6.0 U1b |
| Multipath | PowerPath/VE 6.0.1 |
| MongoDB | Enterprise 3.2 |
| YCSB | 0.6.0 |

Benefits of using VMAX and MongoDB


The VMAX All Flash storage array and MongoDB together offer an extraordinary choice for customers who want these benefits:

• Simplified storage management

• Mixed workload consolidation

• Ease of creation and restart of MongoDB copies

• Advanced data replication and high availability

• Consistent performance (refer to Benchmarking MongoDB performance)

• Ability to scale storage independently from compute resources to allow better use of resources, such as CPU and memory

• VMware vSphere Virtual Volumes

• Data at Rest Encryption (D@RE)

Simplified storage management

EMC Unisphere® for VMAX is an intuitive management interface that helps IT managers be more productive by dramatically reducing the time that is required to provision, manage, and monitor VMAX All Flash storage assets. Unisphere 360 software aggregates and monitors up to 200 VMAX All Flash arrays across a single data center.

The following steps show how to create a LUN and assign it to a host through Unisphere:

1. Log in to Unisphere and create the hosts and port groups, as shown in Figure 5.

Figure 5. Configuring hosts and port groups before provisioning storage

2. Run the Provision Storage wizard to provision storage to hosts, as shown in Figure 6.

Figure 6. Provisioning storage for a host

Figure 7 shows the Unisphere for VMAX dashboard that is used to monitor the storage system status.

Figure 7. Unisphere for VMAX performance dashboard

Advanced data replication

SnapVX delivers instant point-in-time replicas of host devices that can be used to create gold copies, to test patches, for backup and recovery, for data warehouse refreshes, or for any other process that requires parallel access to, or preservation of, primary storage devices.

SnapVX creates snapshots by storing changed tracks (deltas) directly in the Storage Resource Pool (SRP) of the source device. With SnapVX, you do not need to specify a target device and source/target pairs when you create a snapshot, but you can create links from the snapshot to one or more target devices. If there are multiple snapshots and the application must find a particular point-in-time copy for host access, you can link and relink until the correct snapshot is located. In HYPERMAX OS arrays, SnapVX supports up to 256 snapshots per source device (including any emulation mode snapshots). See Figure 8.

Figure 8. SnapVX snapshot

Because SnapVX snapshots are always consistent, they make it easy to create restartable copies of MongoDB.
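The general idea of a targetless, delta-based snapshot can be sketched in a few lines of Python. This is a generic illustration of point-in-time views that share unchanged tracks with the source, not a description of how SnapVX is implemented inside the array. The class `ToyVolume` and its methods are hypothetical names, and the sketch preserves the old contents of a track the first time it is overwritten after a snapshot is taken.

```python
class ToyVolume:
    """Toy source device with targetless point-in-time snapshots."""

    def __init__(self, tracks):
        self.tracks = dict(tracks)   # track number -> current data
        self.snapshots = {}          # snapshot name -> preserved tracks

    def snapshot(self, name):
        # Creating a snapshot copies no data and needs no target device.
        self.snapshots[name] = {}

    def write(self, track, data):
        # Before overwriting, keep the old contents for every snapshot
        # that has not yet preserved this track.
        old = self.tracks.get(track)
        for preserved in self.snapshots.values():
            preserved.setdefault(track, old)
        self.tracks[track] = data

    def read_snapshot(self, name, track):
        # A snapshot view is its preserved tracks plus, for tracks that
        # never changed, the data still held by the source.
        preserved = self.snapshots[name]
        if track in preserved:
            return preserved[track]
        return self.tracks.get(track)


vol = ToyVolume({0: "a", 1: "b"})
vol.snapshot("monday")
vol.write(0, "A")
vol.snapshot("tuesday")
vol.write(0, "AA")
vol.write(1, "B")

print(vol.read_snapshot("monday", 0), vol.read_snapshot("tuesday", 0))
```

Each snapshot in the sketch only consumes space for tracks that changed after it was taken, which is why many point-in-time copies of a MongoDB volume can coexist cheaply in a model like this.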

Benchmarking MongoDB performance

To generate useful comparison metrics, we used three of the standard YCSB random I/O workloads (A, B, and C), which are predefined within YCSB to simulate common I/O patterns in NoSQL database environments.

Note: For more information about YCSB, refer to How to benchmark MongoDB with YCSB and How to run YCSB on MongoDB.

SnapVX provides low-impact snapshots for VMAX LUNs. We verified this by using a stand-alone MongoDB instance with YCSB profile Workload A. Figure 9 shows the results of running a workload on a MongoDB environment both with and without SnapVX.

[Figure 9 charts: "Workload A - Throughput Comparison" (throughput in ops/sec, Base versus SnapVX) and "Workload A - Latency Comparison" (average read and update latency in us, Base versus SnapVX), each over one hour of data at 10-second intervals.]

Figure 9. MongoDB low impact snapshots for VMAX LUN

Testing environment configuration

The testing environment is composed of 16 virtual servers built on VMware ESXi servers:

• A MongoDB sharded cluster with three shards, each backed by a three-member replica set

• One MongoDB configuration server

• Five YCSB client servers to perform stress testing with Mongo instances on each YCSB client

Figure 10 shows the testing environment.

Figure 10. Test environment configuration

Workload A test case: Update heavy workload

This test case simulates an update-heavy workload with a mix of 50 percent read operations and 50 percent write operations on a 1 TB dataset. Records were selected by using a Zipfian distribution. An example of a real-world workload that mirrors this testing scenario is an application that tracks the activity of users at eCommerce sites and then personalizes digital advertisements based on their activity.

Figure 11 illustrates the Workload A test case results. The test results show an average throughput of 65129 operations per second, with an average read latency of 0.71 ms and an average update latency of 0.74 ms.

[Figure 11 chart: "Sharding - Workload A", showing throughput (ops/sec), average read latency, and average update latency (us) over one hour of data at 10-second intervals.]

Figure 11. Workload A test results

Workload B test case: Read-mostly workload

This test case is read-intensive with minimal writes. It is set up as a mix of 95 percent read operations and 5 percent write operations on a 1 TB dataset.

Figure 12 illustrates the Workload B test case results. The test results show an average throughput of 87648 operations per second, with an average read latency of 0.76 ms and an average update latency of 0.80 ms.

[Figure 12 chart: "Sharding - Workload B", showing throughput (ops/sec), average read latency, and average update latency (us) over one hour of data at 10-second intervals.]

Figure 12. Workload B test results

Workload C test case: Read-only workload

This test case simulates a read-only workload with no write I/O on a 1 TB dataset. The dataset is distributed across the three MongoDB shard nodes. Because the dataset is larger than memory, the underlying storage system still received I/O requests during the test.

Figure 13 illustrates the Workload C test case results. The test results show an average throughput of 94434 operations per second, with a sustained average read latency of 0.75 ms, which is relatively low.

[Figure 13 chart: "Sharding - Workload C", showing throughput (ops/sec) and average read latency (us) over one hour of data at 10-second intervals.]

Figure 13. Workload C test results
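The three workload mixes above can be mimicked with a short Python sketch that generates a stream of read and update operations over Zipfian-distributed keys. This is not YCSB itself and was not used in the testing. The function names, the record count of 1,000, and the Zipf exponent of 0.99 are assumptions chosen for illustration; only the read/update proportions come from the test case descriptions.

```python
import random
from collections import Counter
from itertools import accumulate

# Read proportions taken from the workload descriptions above;
# the remainder of each mix is update operations.
WORKLOAD_READ_PROPORTION = {"A": 0.50, "B": 0.95, "C": 1.00}


def zipf_cum_weights(record_count, exponent=0.99):
    """Cumulative Zipf weights: the key at rank r has weight 1 / r**exponent."""
    return list(accumulate(1.0 / (rank ** exponent)
                           for rank in range(1, record_count + 1)))


def generate_ops(workload, op_count, record_count=1000, seed=0):
    """Return a list of (operation, key) pairs for the given workload mix."""
    rng = random.Random(seed)                  # seeded for repeatable runs
    read_proportion = WORKLOAD_READ_PROPORTION[workload]
    cum_weights = zipf_cum_weights(record_count)
    keys = range(record_count)                 # key 0 is the most popular
    ops = []
    for _ in range(op_count):
        kind = "read" if rng.random() < read_proportion else "update"
        key = rng.choices(keys, cum_weights=cum_weights, k=1)[0]
        ops.append((kind, key))
    return ops


for name in ("A", "B", "C"):
    ops = generate_ops(name, op_count=2000)
    mix = Counter(kind for kind, _ in ops)
    print(name, dict(mix))
```

A Zipfian key distribution means that a small number of records receive most of the requests, which is the access pattern behind the user-activity tracking example given for Workload A.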

VMware vSphere Virtual Volumes

Virtual Volumes (VVols) is a key technology enabler that delivers a significantly new paradigm for how a virtualization administrator manages the underlying storage for virtual machines. This new paradigm is an important step in the VMware vision of a software-defined data center (SDDC) that delivers the quality of service expected by IT consumers. With vSphere Virtual Volumes, the management process moves from the LUN (data store) level to the virtual machine level. This level of granularity is critically important, because the virtual machine is the core component of a virtualized environment.

While VMware VVols simplify management and provide per-VM storage control, VMAX All Flash takes VVol integration to a new level. The VMAX All Flash management paradigm, with radically simplified storage management, realizes the full value of VVol storage policies. VMAX All Flash provides the highest levels of availability, data protection, and performance directly to the VM.

Planning storage for a MongoDB deployment in a virtualized environment is not an easy task, especially when mixed types of concurrent workloads run on top of a data store. VMAX All Flash simplifies this planning. By using the vSphere Virtual Volumes Dashboard, a centralized location that Unisphere for VMAX provides to monitor and manage Virtual Volumes, a storage administrator can configure storage containers with different Service Level Objectives (SLOs) to meet the requirements of MongoDB and virtualization administrators, as shown in Figure 14.

Figure 14. Adding multiple storage resources with different SLOs to a storage container

In addition to an easy-to-use dashboard, VMAX All Flash delivers uncompromised performance with Virtual Volumes compared to traditional LUN provisioning. We verified this performance with a stand-alone MongoDB deployment on a virtual machine that is based on vSphere Virtual Volumes, using YCSB Workload A. See Figure 15.

Figure 15. Workload A throughput and latency comparison


Data at Rest Encryption

D@RE provides built-in, hardware-based, on-array, back-end encryption for the VMAX family with no performance impact on the array. It protects information from unauthorized access when drives or arrays are removed from the customer data center. D@RE provides encryption on the back end by using SAS I/O modules that incorporate XTS-AES 256-bit data-at-rest encryption. These modules encrypt and decrypt data as it is being written to or read from a drive. All configured drives are encrypted, including data drives and spares. Also, all array data is encrypted, including Symmetrix® File System and Vault contents. Alternative encryption methods that are available today include costly third-party software that must be managed and can introduce performance degradation. Figure 16 illustrates the D@RE architecture.

Figure 16. D@RE architecture

With the VMAX D@RE feature, MongoDB data at rest is protected against unauthorized access and against threats related to the physical removal of media. Based on our test that used a stand-alone MongoDB instance with YCSB profile Workload A, there was no significant impact on MongoDB performance after we enabled D@RE.

Figure 17 shows the results of running a workload on a MongoDB environment both with and without D@RE. The average throughput was 65441 operations per second with D@RE, compared to 66686 operations per second without D@RE, a difference of about 1.9 percent. Also, there is a sustained average update latency of 0.48 ms.

[Figure 17 charts: "Workload A - Throughput Comparison" (throughput in ops/sec, Base versus D@RE) and "Workload A - Latency Comparison" (average read and update latency in us, Base versus D@RE), each over one hour of data at 10-second intervals.]

Figure 17. D@RE test results
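As a quick check on the D@RE comparison, the relative throughput difference can be computed from the two averages reported above. The following Python sketch uses only those two published figures; the function name is a hypothetical helper.

```python
def relative_change_percent(baseline, measured):
    """Percentage change of measured relative to baseline."""
    return (measured - baseline) / baseline * 100.0


base_ops_per_sec = 66686   # average throughput without D@RE
dare_ops_per_sec = 65441   # average throughput with D@RE

delta = relative_change_percent(base_ops_per_sec, dare_ops_per_sec)
print(f"{delta:.2f}%")     # prints -1.87%
```

A difference of under 2 percent between two one-hour runs is small, although the paper does not report run-to-run variation, so this calculation alone cannot say how much of the gap is due to D@RE.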

Conclusion


Summary

Based on the test results, our tests demonstrate high throughput and low latency. When MongoDB is consolidated with a customer’s existing mission-critical applications on VMAX All Flash, customers can take advantage of these benefits:

Ease of management

Unisphere for VMAX provides a common user experience across storage platforms. It enables users to provision, manage, and monitor a VMAX All Flash environment easily. Unisphere provides a number of task-oriented dashboards that make monitoring and configuring a VMAX system intuitive.

By using VMware Virtual Volumes, virtualization administrators can easily manage the underlying storage on VMAX for any virtual server used by the MongoDB system.

High performance through flash-optimized design

VMAX offers flash drives as add-ons to traditional arrays, eliminating bottlenecks to deliver the highest performance and the lowest latency. In addition, EMC Fully Automated Storage Tiering (FAST®) technology and high-capacity NL-SAS drives reduce the cost of storing inactive, less-critical data.

Efficient data replication

VMAX TimeFinder SnapVX allows the user to create snapshots without the need for a target volume. Snapshots can then be linked to target volumes in either full-copy or no-copy mode, and the target volumes can be presented to the host server.

By leveraging SnapVX, users can easily create copies of MongoDB production data for backups, decision support, data warehouse refreshes, or any other process that requires parallel access to production data.

Advanced data encryption

With D@RE, data is encrypted on all drive types without a performance penalty. D@RE secures corporate data on drives both in and out of the VMAX array, providing protection against data theft, which is a significant challenge faced by many enterprises today.