TECHNICAL REPORT

Aggregate and RAID Group Sizing Guide: A Guide to Configuring Aggregates Based on Performance Expectations

Network Appliance, Inc.
Fall 2005 (last updated 5/22/06)

NetApp Confidential – For Distribution Under NDA only

Abstract: The goal of this document is to assist system engineers and storage administrators in determining the appropriate aggregate size, RAID parameters, and disk type in order to meet customer performance and availability needs. The reader is assumed to be familiar with the material presented in the most recent platform performance report for Data ONTAP™ 7G.

Network Appliance, a pioneer and industry leader in data storage technology, helps organizations understand and meet complex technical challenges with advanced storage solutions and global data management strategies.





TABLE OF CONTENTS

1. INTRODUCTION
2. AGGREGATE CONSIDERATIONS
   2.1 Basic Aggregate Performance
   2.2 Maximum Aggregate Size
3. RAID CONSIDERATIONS
   3.1 Data Availability
   3.2 RAID4 and RAID-DP Reconstruction Rates
   3.3 Degraded Mode Performance
4. STORAGE OPTIONS: ATA OR FIBRE CHANNEL?
   4.1 Drive Architecture and Performance Differences
   4.2 Monitoring Disk Performance
5. OTHER IMPORTANT SYSTEM LIMITATIONS TO REMEMBER
   5.1 FC-ALs
   5.2 Network Interfaces
6. PROCESS FOR PROPER AGGREGATE SIZING
7. CONCLUSIONS


1. INTRODUCTION

Properly configuring aggregates and selecting the appropriate disk drives are two of the most important steps in ensuring a reliable and high-performing NetApp storage system running Data ONTAP 7G or later. With traditional volumes, many factors must be considered when creating any volume. Namely, it is necessary to answer the following questions before each implementation:

1) How much capacity does each application need?
2) How many and what type of disks are needed to meet each application's I/O performance need?
3) What is the minimum level of data availability that is appropriate for the application?
4) What is the minimal storage configuration necessary to meet all of the above criteria?

With the storage virtualization capabilities provided by flexible volumes and aggregates in Data ONTAP 7G, many of the same sizing processes used for traditional volumes must be followed, although each sizing consideration applies more to the configuration of the physical aggregate of disks than to individual flexible volumes. This technical report provides valuable information and a step-by-step process that NetApp storage administrators can use to determine the most appropriate aggregate size, RAID group size, RAID type, flexible volume layout, and disk architecture to meet their storage performance and availability needs.

2. AGGREGATE CONSIDERATIONS

Aggregates are the physical containers for flexible volumes: all data residing in each flexible volume is striped across all the physical disks of the host aggregate. Consequently, the number of disks in an aggregate can dramatically affect the total performance achievable by any flexible volume that it hosts. In general, the larger the aggregate, the higher the throughput one can expect from any flexible volume hosted on it. However, besides the number of disks in any particular aggregate, other factors such as the host application's I/O characteristics and the number of flexible volumes to be hosted should be considered when planning a NetApp storage deployment using Data ONTAP 7G. In this section, these factors and the relationships between them will be explored in detail.

2.1 Basic Aggregate Performance

Configuring an aggregate with the appropriate number of disks is critical to achieving an application's storage performance needs. This section will examine how flexible volume performance varies with the number of data disks in the underlying aggregate for simple workloads. Throughput performance metrics (measured in either MB/sec or IOPS) for two general classes of workloads are presented: large sequential I/O and small random I/O. These are the same broad classifications of workloads examined in the platform performance reports available from NetApp's PartnerCenter website. All graphs display performance metrics using NFS V3, with TCP/IP as the transport mechanism, but in general the performance curves displayed are similar to those of all currently supported file-sharing protocols. Only the maximum attainable filer throughput changes from protocol to protocol, usually within 10% of the number stated for NFS V3 over TCP/IP. For a more detailed comparison of platform protocol-specific maximum throughputs, please refer to the appropriate platform performance report.

Large Block Size, Sequential Workloads

Figure 1 shows the maximum obtainable, uncached, multiclient, sequential read throughput from a flexible volume as a function of the number of disks in the base aggregate.

[Figure 1 omitted: line chart of throughput (MB/s, 0–300) vs. total number of disks in the aggregate (0–30), with curves for FAS980 (15K RPM FC), FAS920 (15K RPM FC), FAS270 (10K RPM FC), R200 (5,400 RPM SATA), and FAS3020 (7,200 RPM SATA).]

Figure 1) Large block-size (32KB), sequential read throughput.

It is worth noting that after around twenty or so drives, adding more disks to the base aggregate will not increase sequential read throughput. At this point, CPU resources become the limiting factor for filer performance instead of the number of drives in the aggregate hosting the flexible volume. Figure 2 shows the maximum multiclient, sequential write throughput as a function of the number of data disks in the base aggregate.


[Figure 2 omitted: line chart of throughput (MB/s, 0–200) vs. total number of disks in the aggregate (0–30), same platform and drive curves as Figure 1.]

Figure 2) Large block-size (32KB), sequential write throughput.

As with the large block-size, sequential read workload, substantial sequential write throughput can be achieved with only a modest number of disks.

Small Block Size, Random Workloads

Because of the way WAFL® uses NVRAM to delay and reorganize writes to disk, a small block-size, random write workload typically appears as a sequential write workload to the disk subsystem (provided there is adequate contiguous free space in the volume being written to). This allows Data ONTAP to service small block-size, random write workloads extremely well. Figure 3 shows the maximum multiclient, small block-size, random write throughput as a function of the number of data disks in the base aggregate.

[Figure 3 omitted: line chart of throughput (MB/s, 0–70) vs. total number of disks in the aggregate (0–24), same platform and drive curves as Figure 1.]

Figure 3) Small block size (4KB), random write throughput.

Since many more metadata changes are associated with small block-size, random read or write workloads, CPU resources usually become a limiting factor sooner than with sequential workloads. However, because of the write optimization innate to WAFL and Data ONTAP, it is usually more important to focus on the number of random read IOPS required to service an application's I/O requirements than on the number of random write IOPS. Figure 4 shows the maximum obtainable multiclient, uncached, small block-size (4KB), random read throughput as a function of the number of data disks in the base aggregate.

[Figure 4 omitted: line chart of throughput (MB/s, 0–140) vs. total number of data disks in the platform (5–200), same platform and drive curves as Figure 1.]

Figure 4) Small block size (4KB), uncached, random read throughput.

Unlike the other workloads, where the maximum obtainable throughput can usually be reached with several fully populated DS14mk2 shelves of disks, uncached random read workloads are almost always limited by the number of data disks in the aggregate. It is extremely important to understand that the maximum random read throughput to be expected from any one flexible volume scales linearly with the number of data drives contained in its host aggregate. Additionally, random read throughput and average random read I/O response times depend on the physical characteristics of the drives in the aggregate. Different drive types and their associated performance differences are covered in detail in section 4.

• Key point: create the largest aggregates possible. When possible, try to maximize the number of disks in any aggregate, especially at aggregate creation. This maximizes the random read throughput available to the aggregate and allows all the storage provisioning benefits of flexible volumes to be realized.
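The linear scaling described above can be sketched in a few lines of Python. This is illustrative arithmetic only: the per-drive IOPS figures are assumed ballpark values for drives of this era, not NetApp-published numbers, and should be replaced with measurements from your own environment.

```python
# Hypothetical per-drive 4KB uncached random-read IOPS at moderate
# utilization -- assumed figures for illustration only.
PER_DRIVE_RANDOM_READ_IOPS = {
    "15k_fc": 180,
    "10k_fc": 130,
    "7.2k_ata": 70,
}

def aggregate_random_read_iops(n_data_disks, drive_type):
    """Random-read throughput scales linearly with the data disk count."""
    return n_data_disks * PER_DRIVE_RANDOM_READ_IOPS[drive_type]

def random_read_mb_per_sec(n_data_disks, drive_type, io_size_kb=4):
    """Convert the IOPS estimate to MB/s for a given I/O size."""
    return aggregate_random_read_iops(n_data_disks, drive_type) * io_size_kb / 1024

# Example: 50 data disks of 10K RPM FC would be expected to sustain on
# the order of 6,500 uncached random-read IOPS under these assumptions.
```

Doubling the data disks in the host aggregate doubles the estimate, which is exactly the behavior the key point above is meant to exploit.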

2.2 Maximum Aggregate Size

The maximum flexible and traditional aggregate sizes are identical for the same platforms. For example, a FAS960 running Data ONTAP 7.0 may have a maximum individual aggregate capacity of 16TB or host a traditional volume (a traditional aggregate) with a maximum capacity of 16TB. These size limitations vary from platform to platform, so always refer to the hardware configuration guides available on NOW or the hardware specification sheets available on www.netapp.com when considering platform capacity limitations. Table 1 also shows the recommended root volume sizes for several platforms. Flexible root volumes of the recommended size reserve adequate space for document installation, log files, and several images of the filer's system memory should these be needed for diagnostic purposes. While it is not recommended to store user data on the root volume itself, storing user data on other flexible volumes that share the same host aggregate with the root volume is encouraged.

Table 1) Recommended root volume size.

Platform (single     System   Recommended     Maximum    Maximum Disks per Aggregate                      Maximum
storage controller)  Memory   Flexible Root   Aggregate  72GB     144GB    300GB    72GB     250GB        Disks per
                              Volume Size     Size       10K RPM  10K RPM  10K RPM  15K RPM  7.2K RPM     System
FAS980               8GB      30GB            16TB       222      112      56       222      64           336
FAS3050              4GB      20GB            16TB       222      112      56       222      64           336
FAS3020              2GB      15GB            16TB       111      56       28       111      32           168
FAS920               2GB      15GB            6TB        84       42       21       84       24           84
FAS270               1GB      10GB            6TB        56       42       21       56       24           56
R200                 6GB      24GB            16TB       n/a      n/a      n/a      n/a      64           336
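The parity cost of carving an aggregate into RAID groups can be sketched as follows. This is a naive model for illustration: it uses marketing capacities rather than Data ONTAP's right-sized capacities, assumes the default RAID-DP group size of 16 for FC drives, and ignores the fact that Data ONTAP balances group sizes rather than leaving a small trailing group.

```python
import math

def raid_dp_layout(total_disks, rg_size=16):
    """Naive split of an aggregate's disks into RAID-DP groups of at
    most rg_size disks; each group dedicates 2 disks to parity."""
    n_groups = math.ceil(total_disks / rg_size)
    data_disks = total_disks - 2 * n_groups
    return n_groups, data_disks

def usable_capacity_gb(total_disks, disk_gb, rg_size=16):
    """Approximate pre-WAFL usable capacity of the remaining data disks."""
    return raid_dp_layout(total_disks, rg_size)[1] * disk_gb

# Example: 32 disks -> 2 RAID-DP groups, 28 data disks.
```

Under these assumptions, 32 disks of 144GB yield roughly 28 x 144GB of raw data capacity before right-sizing and WAFL overhead.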

3. RAID CONSIDERATIONS

When creating an aggregate in Data ONTAP 7G, there are several RAID parameters to choose from. Notably, these are RAID type (either RAID-DP™ or RAID4) and RAID group size. Changing either the RAID group size or RAID type for any particular aggregate may have an impact on application performance, data availability, or even the cost per usable gigabyte of storage. This section will examine why it is strongly recommended that RAID-DP and the default RAID group size always be used.

3.1 Data Availability

How reliable any given RAID group in a system is depends on a myriad of factors, from disk characteristics such as disk drive usage, capacity, and architecture to environmental conditions such as ambient temperature, power conditions, and shelf connectivity. By strictly following the site requirement guides specified on NOW, many of the environmental factors that could reduce data reliability are addressed. While creating an accurate predictive model of when and how many disk drives will fail in any given system is beyond the scope of this document, there are some simple guidelines that a storage administrator can follow to maximize the availability of the data contained in flexible volumes:

• Always select RAID-DP for an aggregate's RAID type
• Use the default RAID-DP RAID group size
• Let Data ONTAP choose which disk drives to use during aggregate creation


• Use ESH or ESH2 shelf controller technology with FC shelves
• Use local synchronous mirroring and RAID-DP for the highest-availability storage configuration possible

RAID-DP vs. RAID4

The availability of data for any given flexible volume is commonly expressed in terms of the mean time to data loss (MTTDL) for that volume. For illustrative purposes, the MTTDL model in TR3027 (www.netapp.com/tech_library/3027.html#section3.2) is used to compare the reliability of RAID4 and RAID-DP RAID groups as the RAID group varies in size. Figure 5 assumes that the rate of failure for any given disk drive population is constant, that the distribution of disk drive failures over time is exponential, and that the MTBF characteristics stated in Table 3 apply. Because three concurrent disk failures are necessary before a RAID-DP RAID group becomes inaccessible, the MTBF metric of the drive is cubed instead of squared. As a result, Figure 5 shows the reliability of RAID-DP RAID groups being three orders of magnitude greater than that of comparably sized RAID4 RAID groups.

[Figure 5 omitted: semi-log chart of MTTDL, expressed in years (10^4 to 10^11), vs. RAID group size (1–27), with one curve each for RAID4 and RAID-DP.]

Figure 5) RAID-DP vs. RAID4 availability.

Figure 5 shows MTTDL as a function of RAID group size. The size of your largest RAID group affects the availability of the data on that aggregate: the larger the RAID group, the longer disk reconstruction takes, and consequently the longer the filer is vulnerable to a double disk failure, translating into a lower MTTDL. In Figure 5, each RAID group is assumed to be fully populated, and for our MTTDL model, no other component failures (such as host bus adapters, network interfaces, or other pieces of hardware) are taken into account. The MTTDL model being used also assumes that a spare disk is always available and that reconstruction of any RAID group can begin immediately after a disk failure. It should be noted that MTTDL is a mathematical expression of reliability and does not imply that a single disk drive will be in service a billion years before failing. For more information on interpreting MTTDL and MTBF metrics, please see J. G. Elerath, "Specifying Reliability in the Disk Drive Industry: No More MTBF's," Proc. Annual Reliability & Maintainability Symp., January 2000. MTTDL metrics for RAID4 groups larger than 14 are not shown in Figure 5, as the maximum supported RAID4 group size in Data ONTAP 7G is 14. For reference, Table 2 shows the maximum RAID4 and RAID-DP RAID group sizes for Data ONTAP releases 6.4.2 and later.

Table 2) Maximum RAID group size by Data ONTAP release.

                      6.4.5 (GA)   6.5.2 (GA)         7.0
Filer Platform        RAID4        RAID4    RAID-DP   RAID4   RAID-DP
FAS250                14           14       14        14      14
FAS270                n/a          14       28        14      28
F825                  28           14       28        14      28
FAS920                28           14       28        14      28
FAS940                28           14       28        14      28
F880                  28           14       28        14      28
FAS960                28           14       28        14*     28*
FAS980                n/a          14       28        14      28

NearStore® Platform   6.4.5 (GA)   6.5.2 (GA)         7.0
R100                  8            8        12        8       12
R150                  6            6        12        6       12
R200                  x            8        16        8       16

*The maximum RAID4 and RAID-DP RAID group sizes for 7,200 RPM ATA drives are eight and sixteen, respectively.

3.2 RAID4 and RAID-DP Reconstruction Rates

RAID reconstruction rate depends on three factors: the maximum rate at which the reconstructing disk can be written to, the number of FC loops servicing the RAID group in question, and the size of the RAID group. Unlike RAID4 or RAID5, RAID-DP can sustain two simultaneous disk failures within the same RAID group and still service I/O requests. Reconstruction of a double disk failure takes roughly twice the time required to reconstruct a single failed disk within the RAID-DP group. Figures 6 and 7 show RAID reconstruction rates as a function of RAID group size for different FC-AL configurations, and Figure 7A shows the reconstruction rate as a function of RAID group size with 2Gb/s FC-ALs. Each RAID group is assumed to be fully populated, and the disks in the RAID group are spread evenly across all available FC-ALs. The RAID reconstruction times shown are for storage appliance platforms with no other load on them.

[Figure 6 omitted: reconstruction rate in MB/sec (0–50) vs. RAID group size (1–27), one curve per FC-AL count from 1 to 6.]

Figure 6) RAID reconstruction rates as a function of RAID group size (1Gb/sec FC-ALs, 10K RPM drives).

[Figure 7 omitted: reconstruction rate in MB/sec (0–50) vs. RAID group size (1–27), one curve per FC-AL count from 1 to 6.]

Figure 7) RAID reconstruction rates as a function of RAID group size (2Gb/sec FC-ALs, 10K RPM drives).

Each curve has a flat region and a curved region. The flat region is where the RAID reconstruction rate is limited by the speed at which a single disk can be written to (e.g., ~45MB/s for current-generation Fibre Channel drives). The curved region shows RAID reconstruction rates that are limited by the FC loop bandwidth available to the RAID group in question.


[Figure 7A omitted: reconstruction rate in MB/sec (0–35) vs. RAID group size (3–27) for ATA drives on 2Gb/s FC-ALs, one curve per FC-AL count from 1 to 6.]

Figure 7A) RAID reconstruction rates with ATA drives.

RAID reconstruction times depend on the rate of RAID reconstruction and the size of the disks in the RAID group. Figures 8, 9, and 10 emphasize the effect of RAID group size on reconstruction times for different disk capacities. Any additional load on the RAID group being reconstructed will of course lengthen the time it takes to reconstruct it.
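The flat and curved regions of Figures 6 and 7, and the capacity dependence of Figures 8 through 10, can be captured in a small model. The ~45MB/s single-disk write rate and ~180MB/s effective 2Gb/s loop bandwidth come from this document; treating loop traffic as reads of all surviving group members is a simplifying assumption, so treat the results as rough estimates only.

```python
def reconstruction_rate_mbps(rg_size, n_loops, loop_mbps=180.0,
                             disk_write_mbps=45.0):
    """Rate is the lesser of the reconstructing disk's write speed
    (flat region) and the loop bandwidth shared among the surviving
    disks being read (curved region)."""
    loop_limited = n_loops * loop_mbps / (rg_size - 1)
    return min(disk_write_mbps, loop_limited)

def reconstruction_hours(disk_gb, rg_size, n_loops):
    """Idle-system time to rebuild one failed disk of disk_gb capacity."""
    return disk_gb * 1024 / reconstruction_rate_mbps(rg_size, n_loops) / 3600
```

With four 2Gb/s loops and a modest group size the model sits in the flat, disk-write-limited region; with one loop and a 28-disk group it drops into the loop-limited region, mirroring the shape of the curves above.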

[Figure 8 omitted: time in hours (0–6) to reconstruct a RAID group vs. RAID group size (2–28), on an idle FAS960 with 300GB, 10K RPM drives and 2Gb/s FC-ALs, one curve per FC-AL count from 1 to 6.]

Figure 8) RAID reconstruction times with 300GB drives.


[Figure 9 omitted: time in hours (0–6) to reconstruct a RAID group vs. RAID group size (2–28), on an idle FAS960 with 144GB, 10K RPM drives and 2Gb/s FC-ALs, one curve per FC-AL count from 1 to 6.]

Figure 9) RAID reconstruction times with 144GB drives.

[Figure 10 omitted: time in hours (0–6) to reconstruct a RAID group vs. RAID group size (2–28), on an idle FAS960 with 72GB, 10K RPM drives and 2Gb/s FC-ALs, one curve per FC-AL count from 1 to 6.]

Figure 10) RAID reconstruction times with 72GB drives.

Key point: By using the default RAID group size for either RAID4 or RAID-DP, and ensuring that there are at least four active 2Gb/sec FC-ALs servicing any RAID group, one can achieve the lowest RAID reconstruction times possible for a given configuration.


3.3 Degraded Mode Performance

Another critical factor to consider regarding RAID group sizes is aggregate performance during RAID reconstruction (otherwise known as degraded mode). Reads and writes to the RAID group in question are considerably slower during RAID reconstruction, primarily because RAID reconstruction I/Os occur in addition to user-requested reads and writes. Each user-requested read of data that has not yet been reconstructed requires on-demand reconstruction involving all non-failed disks of the degraded RAID group. Since on-demand reconstruction happens in addition to regular reconstruction, these reads are slower. User-requested writes during RAID reconstruction are likewise affected because of the higher disk utilization. Figure 11 shows an example of impacted SPEC SFS benchmark performance due to RAID reconstruction.

[Figure 11 omitted: SFS response time in msec (0–9) vs. SFS throughput in ops/sec (0–40,000), with curves for F880 SFS with reconstruction, F880 SFS baseline, FAS960 SFS with reconstruction, FAS960 SFS baseline, FAS980 SFS with RAID4 reconstruction, FAS980 SFS baseline, and FAS980 SFS with RAID-DP reconstruction.]

Figure 11) SPEC SFS results with and without RAID reconstruction.

Figure 11 shows SFS 3.0 throughput/response time curves for an F880, a FAS960, and a FAS980 during medium-throttle reconstruction, along with their SFS baselines without reconstruction. To achieve these numbers, six RAID groups were used for the F880 and ten RAID groups were used for the FAS960 and FAS980. Each RAID4 group used eight 72GB, 10K RPM disks, and each RAID-DP group used sixteen 72GB disks.


In practice, it is rare for RAID group reconstruction to occur on a totally idle system; usually there is some load present. RAID reconstruction throttling (via the Data ONTAP raid.reconstruct.perf-impact option) is a way to regulate and guarantee forward progress in RAID reconstruction in the presence of user workload. The throttle regulates two resources: the CPU bandwidth available for user workloads vs. computing reconstruction XOR logical operations, and the proportion of user I/Os vs. reconstruction I/Os. There are three settings: "low," which limits reconstruction to CPU idle cycles; "medium," the default, which guarantees a minimum portion of CPU and disk bandwidth for reconstruction; and "high," which schedules reconstruction ahead of user workloads.
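For example, on a system where background reconstruction must yield to a latency-sensitive workload, the option (using the name as given in this document) could be set to its least intrusive value. This is a sketch of the 7G-era options syntax; verify the exact option name and values against the documentation for your release:

```
filer> options raid.reconstruct.perf-impact low
```

Returning the option to "medium" restores the default balance between user I/O and reconstruction progress.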

Figure 12) RAID reconstruction rate and user workloads.

Figure 12 shows how medium-throttle reconstruction affects increasing SFS workloads. For comparison, the horizontal dotted lines show the reconstruction rates of an otherwise idle F880, FAS960, and FAS980. User I/O is always given precedence over I/O generated by RAID reconstruction; consequently, as user I/O increases, the RAID reconstruction rate decreases. Since there is a significant impact on overall performance when a RAID group is in degraded mode, default RAID group sizes or smaller should be chosen so that reconstruction times are kept to a minimum.


4. STORAGE OPTIONS: ATA OR FIBRE CHANNEL?

4.1 Drive Architecture and Performance Differences

Network Appliance currently offers two storage options for storing primary data: Fibre Channel disk drives and Serial ATA disk drives. In addition to having different integrated circuitry supporting different bus interfaces, NetApp Fibre Channel disk drives and Serial ATA disk drives also have quite different physical characteristics. Table 3 provides a summary of some of the major differences.

Table 3) Disk drive characteristics.

Marketing Part   Disk Capacities   Revolutions   Average Seek   Internal Data    MTBF
Number           (GB)              per Minute    Time (ms)      Transfer Rate    (Hours)
X273A, X271A     72, 36            15,000        3.5            142MB/s          1,400,000
X274A, X272A     144, 72           10,000        4.7            118MB/s          1,400,000
X262A            250               7,200         9              59MB/s           1,000,000
X266A            300               5,400         10             46MB/s           1,000,000

As is evident from Figures 1 through 4, the total number of data disks plays a crucial role in achieving maximum performance from any aggregate. However, when sizing aggregates for performance, drive type is also extremely important to consider.

Different Drive Capacities, Same RPM = Identical Drive Performance

Within the same NetApp drive family, drive performance characteristics are remarkably similar: the average seek time, rotational speed, and external transfer rate of a 72GB, 10K RPM disk are exactly the same as those of a 144GB, 10K RPM disk or a 36GB, 10K RPM drive. A 72GB, 10K RPM disk is no faster or slower than a 144GB, 10K RPM disk; the larger disk simply holds more data. It is extremely important to note, though, that different drive families have very different performance characteristics.

The Application Workload Is Important to Consider When Deciding Drive Type

Because of the tight integration among RAID-DP, WAFL, and NVRAM in Data ONTAP, writes to disk are extremely efficient, and the average client write I/O response time is extremely low. This means that, in general, SATA drives are an excellent choice for aggregates that service write-biased workloads. However, for applications with extremely low-latency, uncached read response time requirements, SATA drives are not a good choice. Figure 13 illustrates the differences between disk drive families in response time when servicing 4KB, uncached, random read workloads of increasing size.


[Figure 13 omitted: average response time in milliseconds (0–70) vs. 4K random read IOPS per data drive (0–260), with curves for 5.4K RPM ATA, 7.2K RPM ATA, 10K RPM, and 15K RPM drives.]

Figure 13) Fibre Channel vs. ATA drive performance.

Figure 13 shows that disk drives become more efficient at servicing I/O requests as the I/O request rate increases. However, the more IOPS the drive services, the higher the average response times will be. In order to meet a sub-10ms random read response time requirement, 15K RPM drives must be used. Additionally, the total IOPS presented to each drive in the aggregate must be relatively low; otherwise, the random read response requirements may not be met. This is one of the primary benefits of configuring larger aggregates that host many flexible volumes: the average utilization of each disk in the aggregate becomes lower, allowing for lower response times and higher random read throughputs. For more information on migrating from traditional volumes to flexible volumes hosted on one or more large aggregates, please refer to TR3356 on NOW: www.netapp.com/tech_library/ftp/3356.pdf.

4.2 Monitoring Disk Performance

Systematically monitoring aggregate disk performance is extremely helpful for storage administrators managing flexible volumes. As Figure 13 clearly shows, average read response times increase as the I/O load on individual drives increases. Data ONTAP provides several mechanisms to trend disk performance over time or to quickly evaluate how busy the filer's disks are.

Command-line interface: From the command line, sysstat -u <time interval> has a "Disk util" column that shows the utilization percentage of only the busiest disk in the system. While this is helpful for quickly checking whether any disk on the filer is extremely busy, it is not very helpful for examining overall aggregate disk utilization. To get a comprehensive view of overall disk performance in an aggregate, individual disk statistics counters must be gathered periodically. The recommended way to do this is via the Performance Advisor/Performance Visualization Tool packaged with DataFabric® Manager or DataFabric Manager Lite V3.0 or later. Figure 14 shows a custom view tracking disk transfers for all disks in a particular aggregate.

Figure 14) Monitoring disks with the Performance Advisor.

Additionally, the Performance Visualization tool provides an excellent way to monitor filer response times to reads and writes of any supported protocol. Figure 15 shows default views for response times from the filer.

Figure 15) Monitoring performance with Performance Advisor.

5. OTHER IMPORTANT SYSTEM LIMITATIONS TO REMEMBER
While the number of disks is one of the most variable factors in determining a filer's performance, it is important to remember that there are other key factors that can limit performance as well. In this section we cover the bandwidth available through each of these other components.

5.1 FC-ALs
Each FC-AL running at 1Gb/sec has a theoretical line speed of 100MB/sec; likewise, each 2Gb/sec FC-AL has 200MB/sec of bandwidth. In practice, however, the maximum data throughput we see through any 1Gb/sec FC-AL is around 80–90MB/sec, and around 180MB/sec through a 2Gb/sec FC-AL. All of the F800 series and FAS900 series storage appliances are capable of aggregate read rates higher than a single FC-AL (either 1Gb/sec or 2Gb/sec) can provide. It is important to keep this FC-AL bandwidth limitation in mind, because once you are loop limited, adding disks to the system will no longer improve performance. Table 4 shows the maximum number of active FC-ALs that each platform can have.
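A quick way to sanity-check whether a proposed aggregate is loop limited is to compare the disks' combined streaming bandwidth against the loops' practical bandwidth. The helper below is a sketch: the per-disk MB/sec figure is an assumption you would supply for your workload, while the loop figures are the practical numbers quoted above.

```python
# Practical (not theoretical) FC-AL throughput from the text above,
# keyed by loop speed in Gb/sec.
PRACTICAL_LOOP_MBPS = {1: 85.0, 2: 180.0}

def max_read_mbps(data_disks, mbps_per_disk, loops, loop_gbps=2):
    """Read-throughput ceiling: the lesser of disk and loop bandwidth."""
    disk_limit = data_disks * mbps_per_disk
    loop_limit = loops * PRACTICAL_LOOP_MBPS[loop_gbps]
    return min(disk_limit, loop_limit)

# 28 disks streaming an assumed 25MB/sec each could deliver 700MB/sec,
# but a single 2Gb/sec loop caps the aggregate at 180MB/sec.
print(max_read_mbps(28, 25.0, loops=1))  # 180.0
print(max_read_mbps(28, 25.0, loops=4))  # 700.0
```

Once the loop limit is the smaller term, adding more disks no longer raises the ceiling — which is exactly the loop-limited condition described above.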

Table 4) Maximum active FC-ALs per platform.

Platform              FAS270   FAS920   F880   FAS940   FAS960   FAS980
Maximum # of FC-ALs   1        8        6      8        8        8

It is also important to keep in mind that while a DS14 disk shelf may have two active loops at the same time, clustering filers reduces this to one active FC-AL per disk shelf. For optimal placement of your FC-AL host bus adapters, always refer to the platform configuration guide.

5.2 Network Interfaces
It is also important to keep in mind the bandwidth limitations of your network interfaces. All the disks in the world will not improve a filer's throughput if the communication pipe to your clients is too small. While a 100Mb/sec Ethernet interface's nominal speed is 12.5MB/sec, in practice each 100Mb/sec interface on the filer achieves a maximum throughput of only around 9–10MB/sec. Gigabit Ethernet interfaces on the filer are able to achieve near wire speed, around 120MB/sec. Bottlenecks other than NICs, FC-ALs, and disks can exist, but these are easily avoided by always referring to the system configuration guides on NOW, including for optimal placement of adapter cards.
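Putting sections 5.1 and 5.2 together, a filer's deliverable read throughput is roughly the minimum of its disk, loop, and network ceilings. A minimal sketch using the practical figures quoted in this section (the disk-side number is an assumed value for whatever your aggregate can sustain):

```python
def system_throughput_mbps(disk_mbps, loop_count, nic_mbps, loop_mbps=180.0):
    """Rough end-to-end ceiling: the slowest of disks, FC-AL loops, and network."""
    return min(disk_mbps, loop_count * loop_mbps, nic_mbps)

# A single GbE NIC (~120MB/sec in practice) caps this configuration, not the
# two 2Gb/sec loops (360MB/sec) or the disks (an assumed 500MB/sec).
print(system_throughput_mbps(disk_mbps=500.0, loop_count=2, nic_mbps=120.0))
```

The point of the min() is the same as the prose: adding disks only helps until some other component becomes the smallest term.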

6. PROCESS FOR PROPER AGGREGATE SIZING
This section presents a detailed process for determining an adequate number of disks for the aggregates being configured, as well as for the RAID groups that will comprise them. Several steps in this process refer to other sections of the report that provide more detail or background information, assisting the reader with any decisions that need to be made during that step. Please note that this is intended to be a generally applicable process, but it may not include all factors that need to be considered when sizing aggregates, flexible volumes, and RAID groups.

Step 1: Identify Applications

Step 2: Determine General Requirements and Constraints
In every aggregate-sizing exercise, there are a number of project requirements and constraints that limit the number of valid sizing options. Knowing these up front helps ensure that all the important factors are considered during the sizing process. The following is a list of the most critical pieces of information you will need to obtain before you attempt aggregate, RAID group, and volume sizing.

2.1 Identify capacity needs. Identify the capacity required for each application that you will configure. Rate of growth, rate of change, and data retention will all play a role. Even though flexible volumes allow for thin provisioning, it is still critical to ask what the applications' capacity needs are now and what they will be in two months, six months, and beyond.

2.2 Identify application I/O patterns. Look at the type of I/O that the application will generate. Can it be classified as a random or a sequential workload? This will be the largest determining factor in the number of disks in any aggregate.

2.3 Identify aggregate performance requirements. What are the performance requirements of the applications? Does the workload generate I/O requests that are more random or more sequential in nature? Can they be translated into either IOPS or MB/sec? Once each application has been categorized as a sequential or a random workload, section 2 will help determine how many disks are needed to meet the application's performance needs.

2.4 Estimate data reliability needs. How sensitive is the application to potential data loss? Section 3 will help determine an appropriate RAID group size, RAID type, and the appropriate FC-AL configurations to meet the application's data reliability needs.

2.5 Sanity-check the configuration. What budgetary constraints is the project operating under? This will most likely be the deciding factor among several configuration possibilities that all meet the applications' performance needs. Quoter and E-configurator can present some examples of possible costs associated with different configurations; use them liberally.

Step 3: Determine Configuration That Satisfies All Requirements and Constraints
The answers to the questions in step 2 can be used in the following process to derive a recommended configuration. As each substep in the process is presented and explained, we will apply it to the following example sizing problem.

Example sizing problem: A customer has a database that requires 500GB of space and generates 2,000 transactions per second. The workload is largely composed of small random I/Os, and there is a requirement of high reliability. The customer has also stated that the database grows rather steadily (100GB every 12 months for the next three years), and a 10% Snapshot™ reserve is adequate for the company's online data recovery needs. Given the answers from step 2, follow each of the following substeps:

3.1. Determine data disks required for capacity. For each type of disk offered on the platform being sized, calculate the minimum number of data disks required to meet the capacity requirements identified in step 2.1. Be sure to properly assess the amount of usable space required, considering Snapshot reserve, headroom for growth, free space for efficient WAFL space allocation, etc.

Example: Since the customer will need 800GB for the database over the next three years, and since in general it is best not to let a filer volume grow beyond 80% full, 1000GB of usable space will be required. Assuming about 10% for Snapshot reserve and 15% for other reserves and rightsizing, approximately 1250GB of raw capacity should be planned. The following table shows how many data disks would be required with 72GB, 144GB, and 300GB drives to meet this capacity requirement.

Capacity Required                                       72GB Data Disks   144GB Data Disks   300GB Data Disks
1000GB (500GB current data + 300GB growth allowance
+ 10% Snapshot reserve + 5% rightsizing
+ 10% WAFL reserve)                                     14                7                  4
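The arithmetic in step 3.1 can be sketched as follows; the 80% fill ceiling and per-drive capacities are the example's own assumptions:

```python
import math

# Sketch of step 3.1: usable space needed so the volume stays under 80% full,
# then the minimum data disks per candidate drive capacity.
def usable_gb_required(current_gb, growth_gb, max_full=0.80):
    return (current_gb + growth_gb) / max_full

def data_disks_for_capacity(usable_gb, disk_gb):
    return math.ceil(usable_gb / disk_gb)

usable = usable_gb_required(500, 300)  # roughly 1000GB of usable space
for disk_gb in (72, 144, 300):
    print(disk_gb, "GB drives:", data_disks_for_capacity(usable, disk_gb))
```

This reproduces the table's 14, 7, and 4 data disks; the raw-capacity planning (Snapshot reserve, rightsizing, WAFL reserve) is layered on top, as the example describes.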

3.2. Determine data disks required for aggregate performance. Hopefully, steps 2.2 and 2.3 provided some useful information about the kind of I/O the application will generate and the level of data throughput the volume is expected to deliver. Use the data presented in section 2.1 to estimate the number of disks required to support the workload specified by the customer. If good information is not available (and often it is not), size for the worst-case scenario, which is typically a large number of small random I/Os. Section 4 covers individual disk drive performance characteristics, which will be helpful for determining the type of disk.

Example: Since the application will be doing primarily small random read I/O with low-millisecond response time requirements, use the guidelines from section 4 to determine which data disk type and quantity will deliver the minimum performance requirements. Since all disk drives within the same family have the same performance characteristics, only drive type and RPM are considered in this step, not individual drive capacity. Since the application will require 2,000 IOPS from the volume, the minimum numbers of data disks of each type are:

Throughput and Response Time Requirement                          7,200 RPM Disks   10K RPM Disks   15K RPM Disks
2,000 small random read IOPS (sub-20ms average response time)     58                20              11
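The disk counts above amount to a ceiling division of the required IOPS by an assumed per-disk rate. The per-disk figures below are rough values inferred from Figure 13 at the sub-20ms target, not published drive ratings:

```python
import math

# Assumed per-disk 4KB random-read IOPS at a sub-20ms average response time
# (rough values inferred from Figure 13, not published ratings).
IOPS_PER_DATA_DISK = {"7,200 RPM": 35, "10K RPM": 100, "15K RPM": 182}

def data_disks_for_iops(required_iops, drive_type):
    return math.ceil(required_iops / IOPS_PER_DATA_DISK[drive_type])

for drive_type in IOPS_PER_DATA_DISK:
    print(drive_type, data_disks_for_iops(2000, drive_type))
```

With these assumed rates, 2,000 IOPS works out to the 58, 20, and 11 data disks shown in the table.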

3.3. Determine minimum data disks to satisfy both capacity and performance. Take the maximum of the disk totals from steps 3.1 and 3.2 to get the minimum number of disks required to satisfy both the capacity and the performance requirements.

Example: It will therefore be necessary to configure at least 58 250GB, 7,200 RPM disks; 14 144GB, 15K RPM disks; or 20 300GB, 10K RPM disks in order to meet both the capacity and throughput requirements for this application.
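Step 3.3 is a simple per-drive-type maximum over the results of steps 3.1 and 3.2. The values here are taken from the example's summary table:

```python
# Per-drive-type disk counts from the example's steps 3.1 and 3.2.
capacity_disks = {"250GB 7.2K": 8, "144GB 15K": 14, "300GB 10K": 7}
performance_disks = {"250GB 7.2K": 58, "144GB 15K": 11, "300GB 10K": 20}

# Step 3.3: the larger of the two requirements wins for each drive type.
required = {d: max(capacity_disks[d], performance_disks[d])
            for d in capacity_disks}
print(required)  # {'250GB 7.2K': 58, '144GB 15K': 14, '300GB 10K': 20}
```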

3.4. Determine RAID type and number of RAID groups required. Using the feedback on application data reliability requirements collected in step 2.4, decide what the appropriate RAID group size and RAID type should be. If the application has very high reliability requirements, consider choosing a RAID-DP group size smaller than the default of 16 disks (14 data plus 2 parity), and consider the possibility of local synchronous mirroring. If the application has slightly lower reliability requirements and can tolerate a larger performance impact during degraded mode, a larger RAID-DP group size may be appropriate. In general, it is best to use the default RAID-DP group size.

Example: Since this application has a requirement for high reliability, RAID-DP groups of the default size are chosen. The table below shows one way to configure the required data drives for each drive type into a number of default-sized RAID-DP groups in an aggregate. Note that having a partially populated RAID-DP group will not adversely affect performance, so there is no need to divide the data disks into RAID groups of even size.

                                      250GB, 7,200 RPM Disks   144GB, 15K RPM Disks   300GB, 10K RPM Disks
Data Disks Required                   58                       14                     20
RAID Group Breakdown (data disks)     14+14+14+14+2            14                     14+6
Parity Disks Required                 10                       2                      4
Total Disks Required (Data+Parity)    68                       16                     24
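The RAID group breakdown follows mechanically from the data disk counts: pack full groups of 14 data disks, put any remainder in a final partial group, and add two parity disks per group. A sketch of that packing:

```python
import math

# Pack data disks into default-sized RAID-DP groups (14 data + 2 parity).
def raid_dp_layout(data_disks, data_per_group=14, parity_per_group=2):
    groups = math.ceil(data_disks / data_per_group)
    breakdown = [data_per_group] * (groups - 1)
    breakdown.append(data_disks - data_per_group * (groups - 1))
    parity = groups * parity_per_group
    return breakdown, parity, data_disks + parity

print(raid_dp_layout(58))  # ([14, 14, 14, 14, 2], 10, 68)
print(raid_dp_layout(14))  # ([14], 2, 16)
print(raid_dp_layout(20))  # ([14, 6], 4, 24)
```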

The table below summarizes steps 3.1 through 3.3 for the three candidate drive types: the data disks required for capacity, the data disks required for throughput and response time, and the maximum of the two.

Factor                        250GB, 7,200 RPM Disks   144GB, 15K RPM Disks   300GB, 10K RPM Disks
Capacity                      8                        14                     7
Throughput/Response Time      58                       11                     20
Data Disks Required (Max)     58                       14                     20

3.5. Determine best drive option based on remaining factors (cost, footprint, etc.). Now that the number of required drives of each type has been determined, it is time to choose which option works best for the stakeholders. The total drive cost, the cost overhead for shelves to house the drives, and the total rack space footprint are the most common

considerations, though there may be other factors that come into play (for example, long-term part availability, proven track record, etc.). Work with the customer to determine which factors will weigh most heavily in the purchasing decision.

Example: Assuming the following relative disk price model, and noting that each shelf adds cost, the following table summarizes the total costs involved in meeting the customer's needs.

                                250GB, 7,200 RPM Disks   144GB, 15K RPM Disks   300GB, 10K RPM Disks
Total Disks Required            68                       16                     24
Drive Cost                      $68x                     $40x                   $59x
Shelf Cost ($5x per shelf)      $25x                     $10x                   $10x
Total Volume Cost               $93x                     $50x                   $69x

Clearly, for this application, using 16 144GB, 15K RPM drives provides the best value among the possible solutions that meet the application's requirements.

Step 4: Implement Solution
While this process is generally applicable, real-life situations will rarely be as clear-cut as the example presented here. Rely on your own experience, and feel free to consult with the NetApp Global Services team to determine how the process may need to be adapted for any given situation.
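The cost comparison in step 3.5 can be sketched with the example's relative price model: total drive cost plus $5x per DS14 shelf of up to 14 drives (the total drive costs are taken from the table above):

```python
import math

# Relative cost model from the example: total drive cost plus $5x per shelf,
# where each DS14 shelf holds up to 14 drives.
def config_cost_x(total_disks, total_drive_cost_x, shelf_cost_x=5,
                  disks_per_shelf=14):
    shelves = math.ceil(total_disks / disks_per_shelf)
    return total_drive_cost_x + shelves * shelf_cost_x

print(config_cost_x(68, 68))  # 93  (250GB, 7,200 RPM option)
print(config_cost_x(16, 40))  # 50  (144GB, 15K RPM option)
print(config_cost_x(24, 59))  # 69  (300GB, 10K RPM option)
```

The shelf term is why the 68-disk ATA option is penalized so heavily: it needs five shelves, versus two for either Fibre Channel option.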

7. CONCLUSIONS
Proper volume and RAID group sizing is essential when configuring a system for maximum performance and reliability. The data and sizing processes presented in this paper provide a baseline from which to create estimates for the minimum aggregate and RAID group configuration to support a given set of customer needs. As always, err on the side of caution and allow for a reasonable margin of error during the sizing exercise. TR3356 (http://www.netapp.com/tech_library/ftp/3356.pdf) covers some statistical analysis of systems in the field and the potential benefits of moving from a system with only traditional volumes to flexible volumes. Some key things to remember:

The I/O patterns of the applications hosted by the filer will heavily influence the number of disks to use for each aggregate. Random I/O intensive applications require flexible volumes hosted on aggregates that are configured with a large number of disks, while applications that tend to generate sequential I/O usually require fewer disks.

Create the largest aggregates possible. Try to maximize the number of disks in any aggregate, ideally when the aggregate is created.

The default RAID-DP group size of 16 should not be changed unless you have a compelling reason to do so. The tables in section 3 should be referred to if you or the customer is considering changing the default RAID group size. The default RAID-DP group size of 16 affords a balance between data protection, availability, and cost while maintaining performance similar to that of volumes or aggregates composed of RAID4 RAID groups of 8 disks in size.

The number of RAID groups in a particular volume is largely irrelevant from a performance perspective. The number of RAID groups and RAID group size do however change the availability and average RAID reconstruct times for the volume. Refer to section 3 for more specifics on these implications.

Disk capacity has no significant impact on disk performance. Drives of different capacity but of the same RPM type (for example, 10K RPM Fibre Channel drives) tend to have the same performance characteristics. However, reconstruction time of a drive is directly related to capacity, since there is more data to reconstruct on a larger drive.

Drive type has a drastic impact on random read response times. Examine section 4.1 thoroughly to make sure the selected drive type is appropriate for an aggregate hosting performance-sensitive application data.

Know the type of I/O pattern. If the type of I/O generated by the applications that will be hosted on the filer is unknown, err on the side of caution and select a configuration that affords the maximum number of disks allowed. If no data is available at all and you must come up with a recommended configuration, three or four fully populated DS14 shelves are typically an adequate configuration for general application use.

Be thorough. If a high degree of accuracy (say within 5–10% margin of error) is required in sizing a prospective storage appliance configuration, contact NetApp Global Services to do a preinstallation performance analysis of the site in question.

Use RAID-DP instead of RAID4. Always use RAID-DP on large aggregates, especially on systems that include ATA disk drives. RAID-DP offers more than 10,000 times the data protection of RAID4 and comes at almost no capacity or performance premium. RAID-DP does not require a license.

Use the default RAID group size. Default RAID group sizes have been carefully selected for each platform to balance RAID reconstruction times and usable capacity. Consult with a NetApp systems engineer before changing these values.

Increase aggregates in increments of RAID group sizes. For example, if the default RAID group size for the platform is 14, plan to add 14 disks to the aggregate at once instead of one or two disks at a time.

Stagger recurring activities such as Snapshot copies and SnapMirror® transfers. When possible, stagger recurring system activities such as FlexVol™ Snapshot schedules and SnapMirror transfers so that they start at different times.

Use homogeneous disk drive capacities within RAID groups. Write allocation in Data ONTAP is more efficient if all disks within a single RAID group are of the same capacity. Try to avoid single RAID groups consisting of mixed-capacity drives.

Avoid creating small aggregates. Try to create aggregates that contain at least one or two shelves of disk drives. Smaller aggregates can become disk-bound for even sequential workloads.

Separate aggregates for separate RPM drives. In order to fully realize the lower response times of 15,000 RPM drives, create separate aggregates for 15,000 RPM drives and 10,000 RPM drives.

Allow Data ONTAP to choose disks and adapters during aggregate creation. Because Data ONTAP automatically spreads aggregates across disk adapters, let Data ONTAP choose the member disks of an aggregate.

Perform trend analysis frequently. Software management tools such as DataFabric Manager 3.0.1 and the Performance Advisor can greatly assist in tracking resource utilization of hundreds of NetApp storage appliances.

© 2005 Network Appliance, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the Network Appliance logo, DataFabric, NearStore, SnapMirror, and WAFL are registered trademarks and Network Appliance, Data ONTAP, FlexVol, NOW, RAID-DP, Snapshot, and The evolution of storage are trademarks of Network Appliance, Inc. in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.