Dell™ PowerVault™ Best Practices Series
Networking Best Practices for the Dell™ PowerVault™ NX3500 NAS System as a File Share Solution A Dell Technical White Paper
Enterprise Storage Solutions Engineering Dell Product Group June 2012
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL
ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS
OR IMPLIED WARRANTIES OF ANY KIND.
© 2012 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without
the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
Dell, the DELL logo, and the DELL badge, PowerConnect, EqualLogic, PowerEdge and PowerVault are
trademarks of Dell Inc. Broadcom is a registered trademark of Broadcom Corporation. Intel is a
registered trademark of Intel Corporation in the U.S. and other countries. Microsoft, Windows,
Windows Server, and Active Directory are either trademarks or registered trademarks of Microsoft
Corporation in the United States and/or other countries.
Table of Contents
1 Introduction
1.1 Audience
1.2 The rest of this paper
2 Executive Summary
3 Overview of the Dell PowerVault NX3500 Storage Solution
3.1 Features of the PowerVault NX3500 Storage Solution
3.1.1 Expand Traditional File Share Limits
3.1.2 Advanced Data Protection Features
3.1.3 Replication Capability
3.1.4 Highly Available File Serving
3.1.5 Write Optimization
3.1.6 Optimized Caching
3.1.7 Resource Optimization
3.2 Architecture of the PowerVault NX3500 Storage System
3.2.1 NAS Controller Pair
3.2.2 PowerVault MD Storage
3.2.3 Backup Power Supply (BPS)
3.2.4 Logical view of the PowerVault NX3500 solution
4 Network Overview
4.1 SAN
4.2 Internal Network
4.2.1 Internal Network A (Peer Connection)
4.2.2 Internal Network B (SAN Switch)
4.3 LAN or Client Network
4.4 PowerVault NX3500 Switch Topology Options
4.4.1 Dedicated SAN with High Availability
4.4.2 Dedicated SAN Without High Availability
4.4.3 All-in-One High Availability Option
4.4.4 All-in-One Without High Availability
5 Network Load Balancing
5.1 Network Load Balancing – Flat Network
5.2 Network Load Balancing – Routed Network
6 Test Simulations
6.1 ALB vs. LACP
6.2 NFS Mount Block Size 4K vs. 32K
6.3 Client MTU of 1500 vs. 9000
7 Best Practices
7.1 Network Switch
7.2 LAN
7.3 SAN
7.4 Other Considerations
7.4.1 Optimizing Performance for a Single Client
7.4.2 Considerations for IP address selection in an IP Hashing scheme
Appendix A Test Description and Methodology
A.1 PowerVault MD Storage Array and NAS Reserve Configuration
A.2 IOzone Benchmark Tests
A.2.1 Write
A.2.2 Rewrite
A.2.3 Read
A.2.4 Re-Read
A.2.5 Random Read
A.2.6 Random Write
A.3 Simulation Methodology
Appendix B
B.1 Test System Components
B.2 IP and Subnet information
B.3 Switch Configuration Samples
Appendix C Test Hardware Configurations
C.1 Network switches
C.1.1 Switch Specifics
Appendix D Additional resources
Acknowledgements
This whitepaper was produced by the PG Storage Data Management Engineering group of Dell Inc.
The team that created this whitepaper:
Nicholas Busick and Animesh Pillai
We would like to thank the following Dell team members for providing significant support during
development and review:
Raj Hosamani and Ron Stefani
1 Introduction
The Dell™ PowerVault™ NX3500 adds network attached storage (NAS) capabilities to PowerVault MD
Series storage arrays, allowing a single file system to use the entire capacity of an array and grow
without interruption. The PowerVault NX3500 can be quickly and efficiently added to both existing
and new MD Series arrays to provide a high-performance, scale-up file and block storage solution
(iSCSI, CIFS, and NFS) for customers looking for a cost-effective system that can be easily configured
from simple, user-friendly administrator interfaces.
While ease of implementation and use is important for any solution, how the solution performs is even
more critical. The performance of an NX3500 system depends upon the customer’s network and
storage infrastructure and a number of operational factors, including:
Disk type and RAID type
File system protocols in use: NFS, CIFS
Expected I/O load and the characteristics of the file data being stored
The goal of this paper is to provide:
Networking best practices for configuring the NX3500 NAS nodes
Networking best practices for configuring the MD storage in an NX3500 environment
Best practices for configuring the switch in an NX3500 solution
NOTE: A separate whitepaper dealing with performance in relation to disk, RAID levels, and file
protocols can be found on delltechcenter.com.
To obtain the results presented in this paper, Dell Enterprise Storage Solutions Engineering conducted
a series of tests to measure how the system performed when using various configurations, such as
MTU 1500 vs. jumbo frames, adaptive load balancing vs. LACP, and 4K NFS mounts vs. 32K NFS
mounts.
The tests focused on the use case of providing a file serving solution for end users. Examples of
end-user file services include user home shares and departmental file shares; CIFS and NFS are the
two predominant protocols used to access such shares. The best practice guidelines presented in this
paper are based, in part, on the results of the testing performed.
1.1 Audience
This paper is intended for solution architects, storage network engineers, system administrators, and IT
managers who need to understand how to design and deploy the PowerVault NX3500 NAS solution.
The reader should have a working knowledge of PowerVault™ MD Series SANs, basic networking, and
related iSCSI SAN operations.
1.2 The rest of this paper
The rest of this paper contains the following sections:
Section 2 – Executive Summary
Section 3 - Overview of the Dell PowerVault NX3500 Storage Solution
Section 4 - Network Overview
Section 5 - Network Load Balancing
Section 6 - Test Simulations
Section 7 - Best Practices
2 Executive Summary
This section provides a brief overview of some of the findings that resulted from tests conducted in
Dell Storage Engineering labs. For more details about the tests themselves and the results, see the
sections referenced in this document.
There is no major performance difference (less than 3 percent) between ALB and LACP across
the different I/O patterns tested with IOzone. In general, Dell recommends using whichever
method is already in use at the customer site (see section 6.1, ALB vs. LACP).
For NFS mounts, a 32K mount block size provides 20 percent better performance on
sequential write operations than a 4K mount block size. For sequential reads and random
writes, performance is nearly equal between 32K and 4K mount block sizes (see section 6.2,
NFS Mount Block Size 4K vs. 32K).
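As a concrete illustration, a 32K mount block size is requested on a Linux NFS client through the rsize and wsize mount options. In this sketch the server name, export path, and mount point are hypothetical placeholders, and the command is printed rather than executed:

```shell
#!/bin/sh
# Build an NFS mount command with a 32K mount block size (rsize/wsize),
# per the sequential-write finding above. "nx3500-vip", "/users", and
# "/mnt/users" are hypothetical placeholders for a real deployment.
OPTS="rsize=32768,wsize=32768"
CMD="mount -t nfs -o ${OPTS} nx3500-vip:/users /mnt/users"
echo "${CMD}"
```

On a real client, the negotiated block sizes can be confirmed after mounting by inspecting /proc/mounts.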
There is no major performance difference (less than 3 percent) between a client-side MTU of
1500 and an MTU of 9000 in an end-user file share scenario. Therefore, Dell recommends using
an MTU of 1500 on the LAN for simplicity, as 1500 is the typical default for most switches.
However, SAN port MTU sizes should be set to 9252, or the largest value that the switch supports,
on all SAN ports (see section 6.3, Client MTU of 1500 vs. 9000).
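On a Linux host, the per-interface MTU can be set with ip link. A dry-run sketch follows, where eth0 (LAN-facing) and eth1 (SAN-facing) are hypothetical interface names, and 9000 is a typical jumbo-frame MTU for host NICs (the 9252 figure above applies to switch ports). The commands are printed rather than executed:

```shell
#!/bin/sh
# Dry-run sketch of MTU settings: default 1500 on the LAN-facing interface,
# jumbo frames on the SAN-facing interface. eth0/eth1 are placeholders;
# on a real host these commands need root privileges.
LAN_CMD="ip link set dev eth0 mtu 1500"
SAN_CMD="ip link set dev eth1 mtu 9000"
echo "${LAN_CMD}"
echo "${SAN_CMD}"
```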
The PowerVault NX3500 supports four switch topologies, varying in levels of redundancy.
Dell’s recommended best practice is to use separate and redundant network switches for the
LAN and iSCSI SAN networks to provide physical separation of the different types of network
traffic. Where it is not possible to have separate physical switches, VLANs can be used to
separate traffic, but the switch must be robust enough to support the amount of traffic that
will flow through it (see section 4.4, PowerVault NX3500 Switch Topology Options).
On the LAN and SAN, enable Flow Control and Spanning Tree (edge-port or rapid spanning
tree) on all ports that the NX3500 is connected to (see section 7, Best Practices).
The distributed architecture of the FluidFS cluster is designed to deliver higher performance as
more clients are added. In environments with fewer than eight clients, it is still possible to fully
utilize all storage resources by configuring the clients with multiple network ports and setting
up multiple virtual IP addresses, as described in section 7.4.1, Optimizing Performance for a
Single Client.
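One way to apply this from a multi-NIC client is to mount the same export through more than one VIP, one mount per VIP, so that traffic can flow over multiple network paths. A dry-run sketch, with hypothetical VIP addresses, export path, and mount points:

```shell
#!/bin/sh
# Generate one NFS mount command per NAS VIP so a single multi-NIC client
# can spread its load across controller ports. All addresses and paths
# here are placeholders; the commands are printed, not executed.
VIPS="192.168.10.201 192.168.10.202"
CMDS=""
i=1
for VIP in ${VIPS}; do
  CMD="mount -t nfs ${VIP}:/share /mnt/share${i}"
  echo "${CMD}"
  CMDS="${CMDS}${CMD};"
  i=$((i + 1))
done
```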
3 Overview of the Dell PowerVault NX3500 Storage
Solution
The PowerVault NX3500 is an enterprise-class distributed file-system-based solution that gives
customers tools to manage file data easily and efficiently. It removes the scaling limitations associated
with traditional NAS systems and makes administration easier by supporting scale-up capacity
expansion within a single namespace. It integrates seamlessly into a Windows or CIFS share
environment and optimizes performance and scalability. This easy-to-use solution is ideal for
customers who need to manage an ever-increasing amount of unstructured (file) data. The
PowerVault NX3500 can provide more flexibility than traditional unified storage because it has no
architectural limits to file-system size; it can scale a single share to the full capacity of a PowerVault
MD SAN deployment.
The PowerVault NX3500’s scale-up architecture delivers a flexible, load-balanced pool of high-
performance storage, making it easy to grow capacity based on business needs. With dual active-
active file controllers and backup power supply, the PowerVault NX3500 provides excellent data
protection and performance with no single point of failure.
3.1 Features of the PowerVault NX3500 Storage Solution
The PowerVault NX3500 provides advanced and powerful features that allow it to integrate seamlessly
into a variety of customer environments. This section discusses some of these features and explains
how they can be useful to end users and IT administrators.
3.1.1 Expand Traditional File Share Limits
The PowerVault NX3500 incorporates the Dell FluidFS, which is designed to optimize file access
performance and hardware utilization, as well as eliminate capacity constraints. A key component of
Dell’s Fluid Data architecture, FluidFS is a high-performance scale-out file system that presents a single
file-system namespace through a virtual IP address. Most NAS solutions have strict limits on the size of
a file share, but Dell FluidFS has no such limitations. With the PowerVault NX3500, customers can
expand a single file system up to the maximum capacity of a PowerVault MD storage array.
3.1.2 Advanced Data Protection Features
The PowerVault NX3500 includes file-level snapshot capability, which is separate from the PowerVault
MD storage array-based snapshot feature. End users can restore previous file versions from a directory
file snapshot without having to contact system administrators, thus freeing up administrators for other
tasks. The PowerVault NX3500 is easily integrated into a customer’s backup environment with support
for NDMP backup.
3.1.3 Replication Capability
NAS replication uses snapshot technology in the PowerVault NX3500 file system to replicate volumes
on the same or another PowerVault NX3500 system. After the initial full replication, changes to the
stored data are recorded incrementally to the target volume.
Replication is bi-directional, meaning that one system can hold target volumes for the other systems,
as well as the source volumes that are replicated to the other system. Replication policies can be set
up to run on various schedules or on demand.
3.1.4 Highly Available File Serving
A dual active-active controller architecture and large onboard battery-protected cache give the
PowerVault NX3500 outstanding performance. Each PowerVault NX3500 dual controller system
includes 24 gigabytes (GB) of memory for hosting the system cache, and the dual active-active
controllers automatically balance the load across both controllers.
3.1.5 Write Optimization
The PowerVault NX3500 aggregates small files and then stripes the data across the available back-end
storage for more efficient write operations (referred to as write coalescing). This process takes a
random access pattern and converts it into a sequential disk operation, which yields much higher
throughput. Additionally, all client writes are acknowledged after being written to the cache of the
local controller and mirrored to the cache of the paired controller. This approach avoids the latency
associated with disk access. The data is later asynchronously destaged to disk.
3.1.6 Optimized Caching
The PowerVault NX3500 NAS controller cache is organized as a pool of 4-kilobyte (K) pages and is
used for data and metadata. Data is evicted from cache based on the Least Recently Used (LRU)
algorithm. It maintains separate LRUs for data and metadata, which ensures that metadata is retained
longer in cache. In this way, Dell FluidFS can deliver high metadata performance, addressing a major
bottleneck in traditional NAS systems. In addition, the PowerVault NX3500 adapts to read-intensive, write-intensive
and mixed workloads by maintaining separate LRUs for read and write, and by auto-tuning the size of
the shared read/write cache. Each NAS controller and its associated components read and cache the
data, which is accessed by the clients connected to it. All subsequent access to the same data is
serviced from cache, reducing back-end disk operations and improving response time.
3.1.7 Resource Optimization
The PowerVault NX3500 actively uses all NAS controllers for I/O and has no passive controllers or idle
resources. Because all controllers in a PowerVault NX3500 clustered system support active I/O,
organizations benefit from high intrinsic performance without the need to manually distribute
application load across multiple storage controllers. Load balancing automatically sends client
requests to the controller with the lowest current workload.
3.2 Architecture of the PowerVault NX3500 Storage System
3.2.1 NAS Controller Pair
The PowerVault NX3500 clustered NAS solution consists of two NAS controllers configured as a pair.
This redundant configuration ensures that there is no single point of failure. The controllers handle
load balancing of client connections, manage read-write operations, perform caching, and interface
with servers and workstations. The cluster and its internal networks are consolidated using a virtual IP.
The PowerVault NX3500 software is installed on both controllers. The software is a complete package,
consisting of an operating system, volume management, distributed file system, and clustering
technology.
Read-write operations are handled through mirrored cache. Mirroring the cache data between the
paired NAS controllers ensures a quick response to clients' requests, while maintaining complete data
integrity. Data from the cache to permanent storage is transferred asynchronously through optimized
data-placement schemes.
Each controller is equipped with 12 GB of RAM, most of which is used for caching. The file system uses
the cache efficiently to provide fast and reliable writes and reads. Writing or modifying files occurs first
in the cache, and the data is then mirrored to the peer controller’s cache. This ensures that all
transactions are duplicated and secured.
3.2.2 PowerVault MD Storage
The controllers connect to the PowerVault MD iSCSI storage array, which is a RAID subsystem. RAID
storage subsystems are designed to eliminate single points of failure. Each active component in the
storage subsystem is redundant and hot-swappable. The solution supports typical RAID configurations
including RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10.
3.2.3 Backup Power Supply (BPS)
The BPS provides continuous power to the controllers. Each controller receives its power from a
dedicated BPS and from the power grid. The controllers regularly monitor the BPS battery status,
which requires the BPS to maintain a minimum level of power for normal operation. The BPS has
sufficient battery power to allow the controllers to safely shut down. The BPS enables the controllers
to use the cache as NVRAM. The BPS provides the clustered solution enough time to write all the data
from the cache to the disk if the controller experiences a loss of power.
3.2.4 Logical view of the PowerVault NX3500 solution
1. The physical disks within the back-end MD Storage Array are carved into different RAID groups
(RAID 0, 5, 6, or 10), based on availability and performance requirements.
2. Virtual disks (LUNs) are then created on these RAID Groups so I/O to these virtual disks is
striped across all the physical disks that compose the underlying RAID group.
3. These virtual disks are then mapped to the NX3500 controllers by creating a cluster host group
using the Storage Partitioning feature on the MD Storage Array.
4. Once mapped, the NX3500 controllers view these virtual disks from the MD Storage Array as
local disks and format them to form a single, large NAS file system reserve.
5. After the formatting is complete, NAS volumes can be created from the available formatted
space.
6. NFS exports or CIFS shares can then be added at the NAS volume level for user access using
NFS/CIFS protocols.
Figure 1 - Logical View of the NX3500 Solution
4 Network Overview
The PowerVault NX3500 solution consists of three main networks – SAN, LAN, and internal.
4.1 SAN
The NAS controller pair resides on the SAN network and communicates with the MD Storage Array over the SAN using the iSCSI protocol. The SAN connection is composed of two Gigabit Ethernet ports on each NAS controller that connect to the SAN network switches. The MD array has eight iSCSI ports and can be configured with four subnets (two for NAS and two for block) as shown in Figure 3, or two subnets (serving both NAS and block) as shown in Figure 4.
Figure 2 - MD 32x0i Controller with iSCSI and Management port
Figure 3 - MD Storage Array with Four Subnets (Two for NAS and Two for Block)
Figure 4 - MD Storage Array with Two Subnets (Serves Both NAS and Block)
4.2 Internal Network
To achieve complete data distribution and maintain high availability, each NX3500 controller is
connected to its peer controller on an internal network. The internal network, which handles peer-to-
peer data transfer and management, is classified into A and B networks.
4.2.1 Internal Network A (Peer Connection)
Internal network A, which is composed of two independent Gigabit Ethernet ports, is the infrastructure
for NX3500 clustering and includes heartbeat monitoring, data transfer, and mirroring of information
between the controllers' caches. This internal network also distributes data evenly across all LUNs in
the system.
4.2.2 Internal Network B (SAN Switch)
Internal network B, which is the NX3500 internal management network, is plugged into the SAN
switch and connects both controllers. All administrative functions are performed on this network. If
the controllers lose communication with each other, but continue to function independently (known
as the split-brain situation), the PowerVault management network automatically resets the suspected
controller. This prevents the split-brain situation and ensures data integrity.
4.3 LAN or Client Network
After the initial configuration, a virtual IP (VIP) address connects the NX3500 to the client or LAN
network. The VIP address allows clients to access the NX3500 as a single entity, enables the NX3500
to perform load balancing between controllers, and ensures that the service continues, even if a
controller fails.
The LAN or client network is composed of two Gigabit Ethernet ports on each controller that connect
to the LAN or client network switches. The PowerVault NX3500 solution is administered using the LAN
or client network on the NAS Management VIP. The different NIC ports and associated cabling on the
NX3500 NAS controller are shown in Figure 5.
Figure 5 - PowerVault NX3500 NAS Controller Ports
NX3500                Controller-0                                          Controller-1
Client Connection 1   To Client Switch                                      To Client Switch
Client Connection 2   To Client Switch                                      To Client Switch
Peer Connection 0     Back to back (to Peer Connection 0 on Controller-1)   Back to back (to Peer Connection 0 on Controller-0)
Peer Connection 1     Back to back (to Peer Connection 1 on Controller-1)   Back to back (to Peer Connection 1 on Controller-0)
SAN Connection A      To SAN Switch A                                       To SAN Switch A
SAN Connection B      To SAN Switch B                                       To SAN Switch B
Internal Connection   To SAN Switch A                                       To SAN Switch B
4.4 PowerVault NX3500 Switch Topology Options
The NX3500 supports four switch topologies, as shown in the following diagrams. Choose the topology that best fits the environment and cable the solution accordingly. The four topology options are:
Dedicated SAN with High Availability (Recommended)
Dedicated SAN without High Availability
All-in-One with High Availability
All-in-One without High Availability
4.4.1 Dedicated SAN with High Availability
The best practice solution is to isolate the SAN traffic from LAN or client traffic with redundant switches for high availability (see Figure 6). All client cables are split between the redundant client switches; the SAN or internal network cables are split between the redundant SAN switches. Peer connections are always back to back.
Figure 6 - Switch Topology for Dedicated SAN with High Availability
4.4.2 Dedicated SAN Without High Availability
This topology (see Figure 7) leverages the best practices for iSCSI and separates the SAN and LAN/Client traffic; however, there is only a single switch each for LAN and SAN. Client cables are connected to a client switch, and the SAN cables are connected to a SAN switch as shown below.
Figure 7 - Switch Topology for Dedicated SAN Without High Availability
4.4.3 All-in-One High Availability Option
In this basic topology (see Figure 8) the LAN and the SAN are connected to the same switch. The switches are configured as a pair to provide redundancy as shown below.
Figure 8 - Switch Topology for All-in-One with High Availability
4.4.4 All-in-One Without High Availability
This configuration is like the previous option, except that a single switch is used for the LAN and SAN
in this case as shown in Figure 9.
Figure 9 - Switch Topology for All-in-One Without High Availability
5 Network Load Balancing
The NX3500 system’s client network NICs are used for client network access, as shown in Figure 5. Client network NIC load balancing is set on each NAS node using either the Adaptive Load Balancing (ALB) or the Link Aggregation Control Protocol (LACP) bonding mode. ALB, the default mode, requires no special switch settings and supports automatic, transparent network interface failover. Changing the bonding mode to LACP requires network switch configuration changes. In addition to bonding, more NAS service IP addresses can be added to the client network configuration for proper load balancing across multiple client subnets. The number of NAS service IP addresses that can be added depends on the bonding mode: four IP addresses for ALB, two for LACP.
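If LACP mode is selected, the client-facing switch ports for each controller must be grouped into a dynamic link aggregation group on the switch. As a rough illustration only, a PowerConnect-style fragment might look like the following; the port range and channel number are placeholders, and the exact syntax varies by switch model and firmware (see the switch configuration samples in Appendix B.3):

```
interface range ethernet 1/g1-1/g2
channel-group 1 mode auto
exit
```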
The PowerVault NX3500 runs an ARPER daemon, which works at the MAC address layer and
continuously listens for client ARP requests to the NX3500 VIP, distributing the client connections
across the different NX3500 controllers and the NICs within each controller. The ARPER keeps the
routing of each client connection so that each client connects to the same NAS controller it was
connected to previously. This client routing information remains valid for 2 hours after the last client
connection; after 2 hours, the ARPER deletes the routing information. The ARPER daemon also
manages client connections based on the status of hardware and resources, and migrates client
connections in cases of failover. Figure 10 is a simple example of the ARPER and ALB/LACP in action.
Figure 10 - PowerVault NX3500 Network Load Balancing
5.1 Network Load Balancing – Flat Network
In a flat network, all clients are on the same subnet as the NX3500. The ARPER listens for ARP requests
from the clients to the NX3500 VIP addresses and applies the load balancing algorithm to assign each
request to a specific controller and a NIC within that controller. For any client request, the
ARPER checks the source IP address and destination IP address and acts as follows:
If the request is from an unknown client or to a new VIP, then a new NIC mapping entry is
created and remains valid for 2 hours.
If the request is for an existing mapping, then the reply is the mapped NIC.
Figure 11 shows a flat network and how the ARPER distributes the load using MAC addresses.
Figure 11 - MAC address load balancing in local network
The first client (MAC-A) is connected to Controller-0 and NIC-0.
The second client (MAC-B) is connected to Controller -1 and NIC-0.
The third client (MAC-C) is connected to Controller-0 and NIC-1.
The fourth client (MAC-D) is connected to Controller-1 and NIC-1, and so on.
This method of load balancing on a flat network connects a client to the least-loaded network port on
the least-loaded controller.
5.2 Network Load Balancing – Routed Network
In an environment where all or some clients reside on a subnet different from the PowerVault NX3500, additional steps need to be taken, because all IP packets arriving from outside the local subnet carry the same MAC address, namely that of the router.
In such scenarios, instead of defining only one VIP for the entire NX3500 cluster, the cluster is given one VIP for each of its NICs, making four VIPs in ALB bonding mode. These IPs are defined both within the NX3500 cluster and in the DNS server. When client machines query the DNS for the “NX3500” name, successive queries return different IP addresses, which the ARPER maps to the different nodes and NICs as shown in Figure 12.
Figure 12 - DNS Round Robin load balancing in Routed Network
The NX3500 also supports multiple subnets, so that VIPs can be configured for each subnet. This enables the ARPER to see clients on different subnets as local clients and to distribute them using MAC address-based load balancing, as in the flat network case described earlier.
6 Test Simulations
This section describes the tests conducted and presents an analysis of the results, with higher IOPS and throughput being better. These results provide performance guidelines for deploying a PowerVault NX3500 into a customer environment.
A series of NFS tests varied different parameters within the NX3500, the network switch, the test
clients, and the MD storage arrays. In each test, one parameter was variable, while the rest of the
parameters remained constant. In each case, the best result from the current test was used in
subsequent testing.
6.1 ALB vs. LACP
This test determined if there is a performance difference between configuring the NX3500 controllers
to use ALB bonding mode or LACP over a flat network. Everything was kept constant, except that one
test used ALB and the other used LACP.
Figure 13 - ALB Vs LACP
The results presented in Figure 13 showed the following:
Using the different I/O patterns as tested with IOzone, there is no major performance difference (less than 3 percent) between ALB and LACP.
Selecting a bonding mode should depend on the customer’s existing network layout and best
practice guidelines prevalent in that environment. This choice also depends on the network
switch infrastructure at the target environment and the way LACP is implemented on the
particular switch in question. In some cases, it may be easier to just use ALB, because it
requires no switch configuration.
6.2 NFS Mount Block Size 4K vs. 32K
This test determined the effect the NFS mount block size (rsize/wsize) has on performance. The block sizes tested were 4K and 32K. The clients used the following mount options for these tests:

4K mount option used on each test client:
# mount -o rw,bg,hard,nointr,tcp,vers=3,timeo=2,retrans=10,rsize=4096,wsize=4096 NX3500:/export /Mount_Point

32K mount option used on each test client:
# mount -o rw,bg,hard,nointr,tcp,vers=3,timeo=2,retrans=10,rsize=32768,wsize=32768 NX3500:/export /Mount_Point
Figure 14 - NFS Mount Option 4K Vs 32K
The results presented in Figure 14 showed the following:
On sequential writes, the 32K NFS mount option performed more than 20 percent better than
the 4K mount.
Other I/O patterns did not see any significant performance variation.
From these results, it can be concluded that the 32K mount option is recommended for better write
performance.
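The 32K options can also be persisted across client reboots. The sketch below reuses the hypothetical “NX3500” server name, /export path, and /Mount_Point mount point from the examples above; it prints a candidate /etc/fstab entry so the line can be reviewed before it is appended (as root) to /etc/fstab on each client.

```shell
# Print an /etc/fstab entry carrying the recommended 32K NFS options.
# "NX3500", /export, and /Mount_Point are the example names used above.
line='NX3500:/export /Mount_Point nfs rw,bg,hard,nointr,tcp,vers=3,timeo=2,retrans=10,rsize=32768,wsize=32768 0 0'
echo "$line"
```

Appending the printed line to /etc/fstab makes the 32K mount survive reboots without retyping the full option string.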
6.3 Client MTU of 1500 vs. 9000
This test determined the effect that the client side MTU size has on performance. The client side MTU
sizes tested were 1500 and 9000 (jumbo frames).
The table below shows the MTU settings for the two tests across all infrastructure components.

Infrastructure Component    Test 1 (MTU)    Test 2 (MTU)
Test Client                 1500            9000
ESX vSwitch                 1500            9000
Physical Switch ports       1500            9252
NX3500 Client Network       1500            9000
Figure 15 - NX3500 client MTU 1500 Vs 9000
The results presented in Figure 15 show that there is no major performance difference (less than 3 percent) between a client-side MTU of 1500 and an MTU of 9000 in an end-user file share scenario.
From these results, it can be concluded that, for simplicity in a file share environment, MTU 1500 is the better choice, because almost all intermediate networking components support MTU 1500. With jumbo frames (MTU 9000), you must ensure that all network components in the data path support MTU 9000; otherwise, the packets are fragmented into smaller sizes.
The actual MTU between the client and the NX3500 can be verified using the following command:
# tracepath NX3500
1: lnxmgmt (172.16.0.202) 0.124ms pmtu 9000
1: derby-mgmt (172.16.0.46) 1.372ms reached
1: derby-mgmt (172.16.0.46) 0.282ms reached
Resume: pmtu 9000 hops 1 back 64
7 Best Practices
In this section, best practices are provided for design and deployment scenarios based on the test
results presented in this paper.
7.1 Network Switch
Ideally, use separate and redundant network switches for the LAN and iSCSI SAN networks to
provide physical separation of different types of network traffic.
If it is not possible to have separate physical switches, VLANs can be used to separate traffic.
However, the switch must be robust enough to support the amount of traffic that will be
flowing through it.
NOTE: These tests used a pair of Dell Force10 S60 network switches without performance
degradation. The testing did not validate any other switches and thus cannot guarantee performance
or redundancy characteristics.
7.2 LAN
The NX3500 client network NIC ports should be connected in a way that any single
component failure in the LAN switches does not result in loss of access to the system.
Enable flow control on all ports to which the NX3500 is connected.
Enable spanning-tree edge-port or rapid spanning tree protocols on all LAN ports.
Use the 32K mount option with NFS mounts on all clients.
Use either ALB or LACP for the client network to deliver similar performance.
Set client-side MTU to 1500; it is easier to deploy and will not compromise performance in a
typical file share environment.
7.3 SAN
The NX3500 and MD Storage Array SAN network NIC ports should be connected in a way that a single component failure in the SAN switches does not result in a loss of access to the back-end storage.
Enable flow control on all ports connected to the NX3500 and MD Storage Array.
Enable spanning-tree edge-port or rapid spanning tree protocols on all SAN ports.
For all SAN ports, set the MTU size to 9252 or the largest that the switch supports.
7.4 Other Considerations
FluidFS is a distributed file system and has additional considerations when deploying and integrating
the NX3500 into an existing environment.
In the above tests, the I/O load was evenly distributed among all clients. Each client was mapped to FluidFS and read and wrote its own data to the NX3500 file shares. This is the ideal condition in a distributed file environment, but real-life scenarios will most likely not be as balanced, and I/O workloads will vary greatly. When designing or deploying a system, it is important to distribute I/O workloads as much as possible. In addition to distributing workloads, use multiple clients to take advantage of all I/O and network resources available on the NX3500. Because of the scalable nature of FluidFS, using fewer than eight clients still provides good performance, but it will not fully utilize all storage resources unless the additional steps described below are taken.
7.4.1 Optimizing Performance for a Single Client
As described earlier in this document, the FluidFS distributed file system is accessible across all
controllers and optimal performance is achieved when all controllers and all NICs are utilized by
clients. However, if a single client needs to leverage multiple controllers and NICs within the NX3500
cluster, the following steps should be taken:
The client should have at least four network ports.
If IP HASH is being used at any point along the data path, each network port should have a
properly selected IP address.
The NX3500 should be configured with multiple VIPs.
Static routes to NX3500 VIPs using dedicated client network ports should be defined on the
client and configured to execute at system startup as shown in Figure 16.
(Figure 16 shows a single client with four NICs, NIC-1/IP-1 through NIC-4/IP-4, connected through a switch to the NX3500 cluster, which exposes four VIPs, VIP-1 through VIP-4.)

Static routes at the client (CentOS 6.2 example):
1. ip route add VIP-1 dev NIC-1
2. ip route add VIP-2 dev NIC-2
3. ip route add VIP-3 dev NIC-3
4. ip route add VIP-4 dev NIC-4
Figure 16 - NX3500 with multiple VIPs and static routes on the client
The NX3500 NFS export should be mounted using the individual VIPs on separate mount
points on the client.
I/O should be performed on the different mount points from the client. This ensures that
multiple network ports are being used and that multiple NX3500 FSDs are assigned to this
client.
Each client IP should be assigned to a different controller-NIC combination on the NX3500.
This can be verified using the Monitor > Load Balancing > Client Connections page in the
PowerVault NX3500 NAS Manager as shown in Figure 17. Client IP addresses can be manually
reassigned using the Assign Interface link in the upper right corner.
Figure 17 - NX3500 NAS Manager showing Client Connections
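The per-VIP mounts described above can be sketched as follows. The four VIP addresses (172.16.0.50 through 172.16.0.53) and the /export share are assumptions for illustration; the script prints one mount command per VIP so they can be reviewed before being run as root (after creating the mount points).

```shell
# Generate one NFS mount command per NX3500 VIP so that a single client
# spreads I/O across multiple controllers and NICs. The VIP addresses,
# export path, and mount points are hypothetical example values.
opts="rw,bg,hard,nointr,tcp,vers=3,rsize=32768,wsize=32768"
for i in 1 2 3 4; do
  vip="172.16.0.$((49 + i))"
  echo "mount -o $opts $vip:/export /mnt/nx3500-$i"
done
```

Running I/O against the four resulting mount points exercises a different VIP, and therefore potentially a different controller-NIC pair, for each stream.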
The above approach uses a single port for a particular client-IP/FluidFS-VIP pair, but the ports can be teamed to introduce redundancy.
An alternative approach is to use a bond with four NICs and then create multiple IP aliases on the same bond to simulate different source IP addresses. In this case, you must choose source IP and VIP addresses so that a different physical port within the bond is used for each source/destination IP pair, as explained in the next section. This approach achieves redundancy with fewer physical ports.
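The alias approach can be sketched as follows, assuming an existing bond0 interface and hypothetical alias addresses 172.16.0.60 through 172.16.0.63; the commands are printed for review rather than executed, since adding addresses requires root privileges on the actual client.

```shell
# Print "ip addr add" commands that place four alias IPs on bond0, so that
# each source-IP/VIP pair can hash to a different physical port in the bond.
# bond0 and the 172.16.0.60-63 addresses are assumptions for illustration.
for i in 0 1 2 3; do
  echo "ip addr add 172.16.0.$((60 + i))/16 dev bond0 label bond0:$i"
done
```

Each alias then serves as a distinct source IP when mounting a different NX3500 VIP, which is what allows the IP HASH to select different physical ports.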
7.4.2 Considerations for IP address selection in an IP Hashing scheme
Most sites use link aggregation based on the IP HASH algorithm, which forms a logical group using
multiple physical ports. Internally, IP HASH uses the source IP address and the destination IP address of
the packet to determine which physical port within the logical group to use for data transfer. If
source and destination IP addresses are not chosen correctly, a single physical port within the
logical group carries all the traffic, reducing network throughput. To prevent this issue, it is
essential to understand how the IP-based HASH works and how to identify source and
destination IP address pairs that will use all physical ports in a logical group.
Consider a link aggregation using four physical links. To determine which of the four links is used for a particular source-destination IP pair, the following calculation is carried out:
1. Identify the last octet of the source and the destination IP address.
2. Convert each last octet to hex.
3. Compute the XOR of these two hex values.
4. Convert the result to decimal and compute a modulo 4 (modulo the number of physical links).
5. The resulting value, between 0 and 3, indicates which of the four links is used for this particular IP combination.
6. Given a choice, select IP addresses for which all four links are used.
The formula is as follows:
(LastOctet(SrcIP) XOR LastOctet(DestIP)) mod (number of physical links)
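The calculation can be worked through directly in shell arithmetic (the XOR result is the same whether the octets are expressed in hex or decimal). For example, with a hypothetical client at 172.16.0.10 and a VIP at 172.16.0.47 on a four-link group, the last octets are 10 and 47:

```shell
# IP HASH link selection: XOR the last octets of the source and destination
# IP addresses, then take the result modulo the number of physical links.
# The addresses 172.16.0.10 and 172.16.0.47 are example values.
src_last=10   # last octet of the source IP
dst_last=47   # last octet of the destination VIP
links=4
echo $(( (src_last ^ dst_last) % links ))   # prints 1 (link index in 0-3)
```

Repeating this calculation for candidate source/VIP pairs lets you pick addresses that land on all four links instead of piling onto one.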
Appendix A Test Description and Methodology
This section outlines the PowerVault MD storage array configuration, the data set properties, the
distribution of NFS file system operations, as well as the performance criteria and simulation
methodology used in the test studies. The test methodology used to derive the results of this paper
was composed of the following parts:
Configuring the PowerVault MD Storage Array, including virtual disk settings and controller
cache block size.
Creating NFS exports on the PowerVault NX3500.
Using IOzone from test clients to monitor and record system performance.
A.1 PowerVault MD Storage Array and NAS Reserve Configuration
For all tests, the MD Storage Arrays were configured with 96 drives, and the entire MD Storage Array
was dedicated to the NAS reserve of FluidFS. None of the array was used for block I/O access for other
applications. Each time, the allocated capacities of the FluidFS NAS volumes were configured to use
approximately 80 percent of the total capacity of the NAS reserve.
A.2 IOzone Benchmark Tests
The IOzone benchmark was used to test and compare the performance capacity of the NX3500 under
different network configuration scenarios. Throughput and IOPS were measured over different IO
patterns, including sequential writes, re-writes, sequential reads, re-reads, random reads, and random
writes. All system parameters, except for the ones under test, were kept constant across all the runs.
A.2.1 Write
This test measures the performance of writing a new file. When a new file is written, not only does the
data need to be stored, but also the overhead information for keeping track of where the data is
located on the storage media. This overhead, which is called the “metadata,” consists of the directory
information, the space allocation, and any other data associated with a file that is not part of the data
contained in the file. It is normal for the initial write performance to be lower than the performance of
re-writing a file due to this overhead information.
A.2.2 Rewrite
This test measures the performance of rewriting an existing file. When a file that already exists is written, less work is required because the metadata already exists. It is normal for rewrite performance to be higher than the performance of writing a new file.
A.2.3 Read
This test measures the performance of reading an existing file.
A.2.4 Re-Read
This test measures the performance of reading a file that was recently read. It is normal for this
performance to be higher because the operating system generally maintains a cache of the data for
files that were recently read. This cache can be used to satisfy reads and improves the performance.
A.2.5 Random Read
This test measures the performance of reading a file with accesses being made to random locations
within the file. The performance of a system under this type of activity can be impacted by several
factors such as size of the operating system’s cache, number of disks, seek latencies, and others.
A.2.6 Random Write
This test measures the performance of writing a file with accesses being made to random locations
within the file. Again the performance of a system under this type of activity can be impacted by
several factors such as size of the operating system’s cache, number of disks, seek latencies, and
others.
A.3 Simulation Methodology
The IOzone benchmark was run from test clients mounted to an NX3500 NFS export, using 4K block sizes for IOPS measurement (Random Read and Random Write) and 32K block sizes for throughput measurement (Sequential Write, Re-Write, Read, and Re-Read). Each client created a 2-GB file and then performed the various I/O tests on that file. IOPS and throughput results were captured for each run and used to generate the comparison charts in the “Test Simulations” section.
Appendix B
Additional information about the configuration and versions of hardware and software used during
testing in this paper are documented in this section.
B.1 Test System Components
The tables below provide details for the major hardware and software system components used in the
test system configuration.
Hardware Components
Component Description
Test Client Servers 16 x Dell PowerEdge R710 servers; quad-core Intel Xeon E5640 processor, 2.66 GHz, 12M cache, 5.86 GT/s QPI, Turbo; 12 GB RAM
Network 2 x Dell Force10 S60 switches
Storage MD3220i - 96 x 300GB 10K RPM disks; MD3200i - 96 x 1TB 7.2K RPM disks
Software Components
Component Description
Test Clients Host: VMware ESXi 5 Clients: CentOS 6.2 x64
PowerVault™ MD Storage Array Firmware
07.80.41.60
FluidFS version 1.0.400
Switch Firmware FTOS 8.3.3.4
(Figure 18 shows the physical layout of Test Suite (A): 2 x Force10 S60 switches; load-generating servers: 8 x PowerEdge R710 servers running VMware ESXi 5, with 4 VMs per ESX host and 32 VMs per test suite, each VM with 1 vCPU, 1 vNIC, and 2 GB of RAM; 2 x NX3500 nodes with 220V BPS; 2 x battery backup UPS; management/jump boxes with Windows/Linux NAS servers; and Dell PowerVault MD storage, MD3220i with 96 x 300GB 10K drives.)

Figure 18 - Physical layout of test environment
B.2 IP and Subnet information
The LAN and SAN network traffic was separated into different subnets to help prevent contention. The tables below outline the IP and subnet schemes used during testing; these settings were not changed except where required by a specific MTU-related test.
Table 1 Client Network IP Configuration
Entity IP Address MTU
VMware ESXi VM kernel Mgmt 172.16.0.1 – 172.16.0.9/16 1500
Client VMs 172.16.0.10 – 172.16.0.41 / 16 1500
Windows Active Directory 172.16.0.200/16 1500
VMware vCenter Server 172.16.0.201/16 1500
Linux Mgmt Host 172.16.0.202/16 1500
Windows Mgmt Host 172.16.0.204/16 1500
Table 2 NX3500 Network IP Configuration
Entity IP Address MTU
NAS Management VIP 172.16.0.46/16 1500
Client Access VIP 172.16.0.47/16 1500
NAS Controller 0 IP 172.16.0.48/16 1500
NAS Controller 1 IP 172.16.0.49/16 1500
NAS Internal Network “a” 100.100.64.1 – 100.100.64.4/24 9000
NAS Internal Network “b” 100.100.65.1 – 100.100.65.4/24 9000
SANa IP Controller0 192.168.10.20/24 9000
SANb IP Controller0 192.168.11.20/24 9000
SANa IP Controller1 192.168.10.21/24 9000
SANb IP Controller1 192.168.11.21/24 9000
Table 3 PowerVault™ MD Storage Array Network IP Configuration
Entity IP Address MTU
Controller0 Mgmt 172.16.0.205/16 1500
Controller1 Mgmt 172.16.0.206/16 1500
Controller0, Port0 192.168.10.100/24 9000
Controller0, Port1 192.168.11.100/24 9000
Controller1, Port0 192.168.10.101/24 9000
Controller1, Port1 192.168.11.101/24 9000
B.3 Switch Configuration Samples

The following section shows excerpts from the Dell Force10 S60 running-config. These are provided for reference to show how the switches were configured for the tests.

Client Interface example:
interface GigabitEthernet 0/16
 no ip address
 switchport
 flowcontrol rx on
 spanning-tree rstp edge-port
 no shutdown
!
iSCSI Interface example:
interface GigabitEthernet 0/0
 no ip address
 mtu 9252
 switchport
 flowcontrol rx on
 spanning-tree rstp edge-port
 no shutdown
!
Port-Channel example:
interface Port-channel 1
 description "esx01a"
 no ip address
 switchport
 spanning-tree rstp edge-port
 channel-member GigabitEthernet 0/24-25
 channel-member GigabitEthernet 1/24-25
 no shutdown
!
NX3500 Client Interface with LACP example:
interface GigabitEthernet 0/18
 no ip address
 mtu 9252
 flowcontrol rx on
 port-channel-protocol LACP
  port-channel 10 mode active
 no shutdown
!
VLAN example:
interface Vlan 100
 description "Performance Client Network"
 no ip address
 untagged GigabitEthernet 0/16-17,20-23,40-47
 untagged GigabitEthernet 1/16-17,20-23,40-47
 untagged Port-channel 1-8,10-11
 no shutdown
Appendix C Test Hardware Configurations
This section outlines the major hardware components used for the tests. One suite (Suite A) was
primarily used for testing, and it was configured with a field-deployment strategy in mind. This test
suite was configured with a focus on performance-oriented customers and hardware configurations
thought to be commonly deployed.
The tests did not cover all possible hardware configurations and permutations, but a number of
configurations were tested in certain combinations of the following variables:
ALB vs. LACP load balancing in a flat network
NFS block sizes of 4K and 32K
Client MTU sizes of 1500 and 9000 (jumbo frame)
The MD Storage Array disk configuration remained the same for all test configurations.
The client machines that generated NFS file I/O loads were configured as virtual machines (VMs) using CentOS 6.2, with VMware ESXi 5 as the hypervisor. Eight Dell PowerEdge™ R710 servers ran VMware ESXi 5 for hosting the client VMs, with four VMs on each physical ESX host. Configuration details of the test client host and guest machines are provided in the Appendix of this whitepaper.
C.1 Network switches
Initial testing was performed to determine which switch configuration yielded consistent results. It was important that the switches were non-blocking and not a bottleneck; therefore, the “All-in-One High-Availability” configuration (shown in Figure 19) could be used for testing.
The test configuration included two separate VLANs for the SAN and LAN on a common set of stacked
switches. The NX3500 controllers connected to the MD Series Storage Arrays via ports dedicated for
SAN connectivity. Intelligent Platform Management Interface (IPMI) NICs were also connected to the
dedicated SAN ports for management communications. The NX3500 controllers were interconnected
directly to support private internal network communications.
C.1.1 Switch Specifics
Two Dell Force10™ S60 switches were configured for the testing suite and interconnected via
the switch stacking module.
Each switch was divided into two VLANs (VLAN 100 and VLAN 200). VLAN 100 was dedicated
for client traffic and VLAN 200 was dedicated for private and iSCSI traffic.
The client network NICs of the NX3500 controllers were connected in a distributed fashion to
these switches. The NICs from the PowerEdge R710 servers hosting test client VMs were also
connected across these switches.
The LAN ports were configured as layer 2 on VLAN 100.
Flow control and jumbo frames (9252 MTU) were enabled on the switch ports belonging to
the SAN ports.
The SAN and IPMI NICs of the NX3500 controllers were connected across the switch stack in a
distributed fashion. The NICs from the MD Series Storage Array controller were also distributed
across the switch stack.
The SAN was configured as a single layer 2 network using the VLAN 200. For proper NX3500
system operation, the SAN and IPMI NICs of the NX3500 controllers must be connected to
ports on the same layer 2 VLAN.
Flow control was enabled on all switch ports.
Rapid Spanning Tree Protocol (RSTP) was enabled on switch ports connecting to NX3500
controllers and MD Series Storage Array controllers.
Additional configuration settings, IP, and subnet information for the MD Series Storage Arrays
and NX3500 networks are provided in the Appendix of this whitepaper.
Figure 19 - All-in-One High-Availability Option
Appendix D Additional resources
http://support.dell.com is focused on meeting customer needs with proven services and support.
http://DellTechCenter.com is an IT Community where IT professionals can connect with Dell
customers and Dell employees for the purpose of sharing knowledge, best practices, and information
about Dell products and installations.
Referenced or recommended Dell publications:
Dell Fluid File System White Paper:
http://i.dell.com/sites/content/shared-content/data-sheets/en/Documents/DSFS_White_paper_r4-0.pdf
Dell PowerVault NX3500 Administrator’s Guide:
http://support.dell.com/support/edocs/STOR-SYS/nx3500/en/AG/PDF/AG_en.pdf
Dell PowerVault NX3500 Technical Guide:
http://i.dell.com/sites/content/business/solutions/engineering-docs/en/Documents/PV_NX3500-Technical-Guide_v1.pdf
Dell PowerVault MD Series Storage Arrays: IP SAN Best Practices:
http://i.dell.com/sites/content/shared-content/data-sheets/en/Documents/Dell_PowerVault_MD_Series_Storage_Arrays_IP_SAN_Best_Practices.pdf
For Dell PowerVault best practices white papers, reference architectures, and sizing guidelines for
enterprise applications and SANs, refer to PowerVault Publications at:
http://www.dell.com/pvresources
For Dell Force10 documentation and manuals, refer to Force10 Publications at:
https://www.force10networks.com/CSPortal20/KnowledgeBase/Documentation.aspx
https://www.force10networks.com/CSPortal20/TechTips/TechTips.aspx