
Caringo, Inc. – Q3 2011 1

WHITE PAPER

Protect Your Data from RAID
July 2011

Abstract

The explosive growth of fixed content, combined with the growth of hard disk drive (HDD) capacity, is exposing very serious limitations in RAID (Redundant Array of Independent Disks), turning a once dependable data resilience method into a problem waiting to happen in your data center. These limitations are leading to more frequent data loss events, unacceptably long rebuild times and constant management headaches. This paper will show why RAID:

• Increases your risk of data loss
• Impacts I/O performance
• Is expensive to upgrade
• Takes too long to rebuild
• Doesn’t always utilize capacity efficiently

The solution is replication. Caringo has leveraged replication in an innovative way with a product called CAStor, creating an efficient and economical way for businesses to meet their SLOs (Service Level Objectives). This paper analyzes the technical aspects of the CAStor architecture and compares them with RAID 5 and 6, then proposes a solution showing you how to protect your data from RAID.

Introduction

Over the last three decades, the IT world has advanced with innovative techniques to store, access, retain and protect information assets. Hardware has become more reliable, systems have become more resilient and overall performance has continued to accelerate. Technology innovation spans the entire compute topology: servers have become more powerful, network bandwidth has increased and storage capacity has grown exponentially, while commoditization has fostered greater levels of system containment for the IT data center. One of the most valuable innovations came in the late 1980s with the creation of RAID, which adds resiliency and recoverability when data gets corrupted or disks fail.

However, with today’s 2 TB HDDs, the legacy technology of RAID 5 and 6 is encountering a new set of challenges that will become more problematic as drive sizes grow. The root of these problems is simple: as the capacity of a RAID group grows, so do the risks. IT organizations with RAID deployments face increasing risks of controller bottlenecks on read/write throughput, greater inefficiency in capacity utilization, higher failure rates as disk drives are stressed more intensively, and ineffective rebuilds when attempting to recover data.


These RAID 5 and 6 limitations have caused organizations to search for viable alternatives to RAID. One popular example is Amazon, which uses object-based replication as the data resilience method for its cloud storage service, Amazon Simple Storage Service (S3). Keeping multiple replicas on separate servers is also an approach that has been well proven in the field by large-scale deployments at companies like Google¹. Similarly, Caringo CAStor software leverages replication on standard x86 servers and JBOD (Just a Bunch of Disks) in a highly efficient architecture that provides data resilience, availability and recoverability not achievable with RAID.

Raw Capacity Efficiencies

The CAStor architecture is built around intelligent storage clusters that accelerate performance while providing high availability of data assets through replication. Replication (e.g. RAID 1) has historically been viewed as a very effective high-end solution, but its high price tag has made it prohibitive for most IT organizations to implement universally within the data center and across distributed company locations. Let’s evaluate the IT considerations of replication.

Advantages of CAStor-based Replication

• Higher level of data protection, since each replica is a direct mirror

• Better data availability by accessing content via multiple, redundant paths

• More relevant data protection, since protection levels can be specified all the way down to a “per object” basis. Important data can be protected at higher levels by increasing replica counts. Protection levels can also change automatically over time: CAStor reads object metadata that specifies how the number of replicas should change (e.g. after 6 months, change from 3 replicas to 2).

• Better protection against corruption, since CAStor continually checksums all objects in the background and re-replicates an object if corruption of the object or its metadata is found.
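The per-object protection policy and background checksumming described above can be pictured with a small model. This is a hypothetical sketch, not CAStor’s API: the metadata fields (`reduce_after_days`, `reduced_replicas`) and the use of SHA-256 are assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredObject:
    """Hypothetical object record; the field names are illustrative only."""
    name: str
    content: bytes
    age_days: int
    replicas: int = 3
    # Hypothetical lifecycle rule: once the object is this many days old,
    # keep `reduced_replicas` copies instead of `replicas`.
    reduce_after_days: Optional[int] = None
    reduced_replicas: int = 2
    checksum: str = field(init=False)

    def __post_init__(self) -> None:
        # Record a content checksum at write time (SHA-256 is an assumption).
        self.checksum = hashlib.sha256(self.content).hexdigest()


def desired_replicas(obj: StoredObject) -> int:
    """Replica count the object's metadata calls for at its current age."""
    if obj.reduce_after_days is not None and obj.age_days >= obj.reduce_after_days:
        return obj.reduced_replicas
    return obj.replicas


def corrupted_replicas(obj: StoredObject, replica_contents: list[bytes]) -> list[int]:
    """Indexes of replicas whose content no longer matches the stored checksum.

    In the scheme described above, each of these would be re-replicated
    from a healthy copy; this sketch only detects them.
    """
    return [
        i
        for i, data in enumerate(replica_contents)
        if hashlib.sha256(data).hexdigest() != obj.checksum
    ]


if __name__ == "__main__":
    # About 6 months old, under a "3 replicas, then 2 after 180 days" rule.
    doc = StoredObject("invoice-42", b"payload", age_days=200,
                       reduce_after_days=180, reduced_replicas=2)
    print(desired_replicas(doc))                                          # 2
    print(corrupted_replicas(doc, [b"payload", b"paylo@d", b"payload"]))  # [1]
```

Keeping the rule in per-object metadata, rather than in a volume-wide setting, is what lets two objects on the same drive carry different protection levels.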

Disadvantages of Replication

• Replication is perceived to require 2X the raw storage capacity, since two copies are maintained, whereas a RAID 5 or 6 implementation adds just 1 or 2 parity drives. This was true of RAID 1; however, the techniques Caringo uses to optimize object storage end up being as efficient with raw capacity as a conventional RAID implementation, as the following model and illustration show. (See Diagram 1: Raw Capacity Efficiency)

1 Whitepaper Reference: “Failure Trends in a Large Disk Drive Population”


Diagram 1: Raw Capacity Efficiency

CAStor Data Protection: In a reference configuration of 1PB of raw storage capacity in a CAStor cluster, replication accounts for 50% of the raw capacity, which holds the 2nd copy of the data.

CAStor File Namespace: Although CAStor maintains an in-memory hash table for file lookups during operation, those locations must also be persisted on disk to survive power cycles. A configurable amount of disk can be reserved for this journal. The space the journal needs grows with the number of objects stored, so it varies inversely with the average object size: large files (which mean fewer files per disk) typically need a 1% journal setting, while small files need a 3% setting.

CAStor Performance Reserve: CAStor reserves a nominal amount of disk space so it can defragment the disk. This reserve is typically only a fraction of a percent of total drive space, which is negligible.

RAID 5 or 6 Data Protection: In RAID 5 with groups of 5 data drives and one parity drive (5+1), 17% of the total capacity is used for parity (i.e. 1 in every 6 drives). In RAID 6, even with a larger RAID group (e.g. 8+2), the additional parity drive increases the capacity used for data protection from 17% to 20% (i.e. 2 in every 10 drives).

RAID File Namespace: After ext3, a standard Linux file system, is created on a brand new 1TB drive, ~871GB of usable space remains, which means the empty file system already takes up 13% of the capacity. With tuning, this can potentially be reduced to about 9%. As files and directories are added, more usable space is consumed to track all the block pointers and metadata for the files. For very large files, this additional file namespace overhead may be less than 1%; for small files, however, it can consume 18% or more of the disk space. Most installations fall somewhere in between. Usable space is also lost to internal fragmentation (e.g. the unused space at the end of the last block of each file or indirect data block): on average, each file loses half the file system’s block size. So the total disk space used by a traditional file system’s journaling, indexing, inodes, metadata, etc. ranges from 9-27% of raw capacity.

Performance Reserve: Ext3 file systems degrade rapidly in performance once capacity utilization exceeds approximately 90%. Best practice is therefore to keep ext3 file systems below 90% full, maintaining a 10% reserve to avoid significant performance degradation.

Hot Spare: RAID systems normally call for additional, unused drives in the enclosure to serve as “hot spares” in case a RAID rebuild is required. Assuming one hot spare for every 15 drives in use, hot spares consume another 7% of total raw disk space.

Takeaway

• CAStor has about the same utilization of raw capacity as an ext3 file system over RAID5.

• For large files (e.g. 100MB average size), where an ext3 file system’s overhead is lowest, the ext3+RAID5 system will be ~12% more efficient than CAStor at using raw disk space. As disk drive sizes increase, the need to track more files on each RAID device will increase file system overhead and bring the system closer to the efficiency of CAStor.

• For small files (e.g. 100KB average size), CAStor will be 7% more efficient than RAID5 at using raw disk space. Where ext3 runs out of inodes before all available disk space can be used, CAStor will be substantially more efficient.

The result is that if the workload consists primarily of small files, the CAStor environment is more efficient, and if it consists primarily of large files, the RAID 5 environment is more efficient. For a mixed workload somewhere in the middle, the two are roughly equivalent. If the same scenario used a RAID 6 (8+2) environment, parity overhead would rise from 17% to 20%, making the Caringo solution an even more attractive alternative. The remaining criteria in this white paper compare and contrast CAStor with RAID 5 and 6 and show significant additional efficiencies of the CAStor solution.
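One way to combine the overheads listed above is to express each as a share of raw capacity and subtract, which is how the text quotes them. The sketch below does that for large files; the 0.5% CAStor reserve and the 1% figure for ext3 namespace growth are assumptions standing in for “a fraction of a percent” and “less than 1%”, and Diagram 1 may account for some items differently.

```python
def usable_fraction(overheads: dict[str, float]) -> float:
    """Raw capacity left after subtracting each overhead.

    Every overhead is a fraction of *raw* capacity, which is how the text
    quotes them (e.g. parity is "17% of the total capacity").
    """
    used = sum(overheads.values())
    if not 0.0 <= used <= 1.0:
        raise ValueError(f"overheads total {used:.1%} of raw capacity")
    return 1.0 - used


# CAStor, large files: 2nd replica, 1% journal setting, and an assumed
# 0.5% defragmentation reserve ("a fraction of a percent").
castor_large = {
    "second replica": 0.50,
    "journal": 0.01,
    "defrag reserve (assumed)": 0.005,
}

# ext3 over 5+1 RAID 5, large files, hot spares excluded: parity, tuned
# empty file system (9%), namespace growth (<1%, taken as 1%), 10% reserve.
raid5_large = {
    "parity": 1 / 6,
    "empty ext3 (tuned)": 0.09,
    "namespace growth": 0.01,
    "performance reserve": 0.10,
}

if __name__ == "__main__":
    for name, overheads in (("CAStor", castor_large), ("ext3 + RAID 5", raid5_large)):
        print(f"{name}: {usable_fraction(overheads):.1%} of raw capacity usable")
```

With these inputs it prints about 48.5% for CAStor and 63.3% for ext3 + RAID 5, close to the 49% and 64% figures used in the workload model later in this paper. Adding a 7% “hot spares” entry to the RAID dictionary shows what the spares cost on top of that.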


Drive I/O Efficiency

The efficiency of data reads/writes is a key factor in the overall performance of a storage system. In the block-based world of file systems, IOPS (Input/Output Operations per Second) is a common measurement. However, since CAStor is an object-based system that does not partition drives into fixed “blocks”, IOPS is not a valid basis for comparison. A more appropriate technique is to review the I/O profile of each drive (i.e. how many files/objects any one drive must support while sustaining I/O concurrency). Regardless of how it’s measured, one thing is certain: as applications become more read/write intensive, the efficiency of disk drive I/O operations becomes more significant. That efficiency depends not only on the drive technology itself (SSD, SAS, SATA) but also on how reads and writes are buffered and sequenced, and on how data is laid out on disk. The workload modeled in this profile (See Diagram 2: Drive I/O Efficiency) reflects the following enterprise IT customer scenario:

• 2.5PB raw capacity (1.225PB usable) for the CAStor cluster
• 1.92PB raw capacity (1.225PB usable) for the RAID 5 system, and about 2PB raw for RAID 6
• 50,000 concurrent writes of 100MB files
• 10,000 concurrent reads of 100MB files
• Replication of all new content to a Disaster Recovery (DR) site with CAStor

CAStor configuration:
• Using 2U/12 servers with 2TB SATA drives, a 2.5PB cluster requires 105 servers
• 105 servers x 12 drives in each server = 1260 disk drives in the CAStor cluster

Write workload:
• 50,000 writes / 1260 drives => each drive must support ~40 writes concurrently
• CAStor stores 2 replicas, so the write count is effectively doubled => 80 writes per drive
• 40 + 40 = 80 writes per drive

Read workload:
• 10,000 reads / 1260 drives => each drive must support ~8 reads concurrently
• Due to local replication of the 2nd replica, each drive must support reading all recently written files => +40 more reads / drive
• With the requirement to replicate all content to a DR site, each object must be read again; with 2 replicas of each object, those reads are split between them => +20 more reads per drive
• 8 + 40 + 20 = 68 reads per drive

Diagram 2-A: CAStor Drive IOPS Efficiency
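The per-drive arithmetic above can be reproduced in a few lines. This sketch follows the text’s simplifying assumptions (load spread evenly over all drives, rounding to whole operations, one local read per new object to create its 2nd replica, and one DR read per object split across its 2 replicas); the function name is ours.

```python
def castor_per_drive_load(servers: int, drives_per_server: int,
                          concurrent_writes: int, concurrent_reads: int) -> dict[str, int]:
    """Concurrent operations per drive in the 2-replica CAStor model above.

    Load is assumed to spread evenly over all drives and is rounded to
    whole operations, as in the text.
    """
    replicas = 2
    drives = servers * drives_per_server
    new_writes = round(concurrent_writes / drives)    # ~40 incoming writes per drive
    client_reads = round(concurrent_reads / drives)   # ~8 client reads per drive

    writes = new_writes * replicas                    # every replica is written
    replication_reads = new_writes                    # read back to create the 2nd replica
    dr_reads = new_writes // replicas                 # one DR read per object, split over 2 replicas
    reads = client_reads + replication_reads + dr_reads
    return {"drives": drives, "writes": writes, "reads": reads, "total": writes + reads}


if __name__ == "__main__":
    print(castor_per_drive_load(105, 12, 50_000, 10_000))
```

Called as shown, it returns 80 writes and 68 reads per drive, or 148 concurrent operations, matching the figures above.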


RAID5 configuration:
• 2.5PB raw for CAStor x 49% = 1.225PB usable
• When computing the required raw capacity for the RAID5 solution, the capacity of the hot spares must be removed, since they do not support the active workload. Without hot spares (3%), the RAID5 solution has an efficiency of 64%
• 1.225PB usable at 64% efficiency requires 1.92PB raw (1.225 / 64%)
• 1.92PB raw grouped into 5+1 RAID5 groups => ~159 RAID groups (1.92PB / 12TB per RAID group)
• This configuration has 159 storage “devices” that must support the overall workload

Write workload:
• 50,000 / 159 = ~315 writes per RAID5 group
• Each write carries 17% additional parity => effective write rate of 315 x 1.17 = 369 writes

Read workload:
• 10,000 / 159 = 63 reads per RAID5 group
• To support replication to the DR site, all new writes must be read again => 369 additional reads
• 63 + 369 = 432 reads per RAID5 group

RAID6 configuration:
• 1.225PB at the 61% efficiency of RAID6 with large files requires 2PB raw capacity
• 2PB in 8+2 RAID6 groups => 100 RAID groups (2PB / 20TB per RAID group)
• This configuration has 100 RAID6 storage devices supporting the workload

Write workload:
• 50,000 / 100 = 500 writes per RAID6 group
• Each write must also write 20% additional parity => effective write rate of 500 x 1.2 = 600 writes

Read workload:
• 10,000 / 100 = 100 reads per RAID6 group
• To support replication to the DR site, all new writes must be read again => 600 additional reads
• 100 + 600 = 700 reads per RAID6 group

Diagram 2-B: RAID 5 and 6 Drive IOPS Efficiency
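The same model for the RAID configurations is sketched below. It reproduces the text’s steps (raw capacity from the usable target and efficiency, whole RAID groups, per-group writes rounded up, parity added to writes, and DR re-reads counted as equal to the effective writes). The function name is ours, and comparing per-group load with CAStor’s per-drive load follows the text’s assumption that every drive in a group takes part in each write.

```python
import math


def raid_per_group_load(usable_pb: float, efficiency: float, raw_tb_per_group: float,
                        parity_overhead: float, concurrent_writes: int,
                        concurrent_reads: int) -> dict[str, int]:
    """Concurrent operations per RAID group in the model above."""
    raw_tb = usable_pb / efficiency * 1000                  # e.g. 1.225PB / 64% ~ 1.91PB
    groups = int(raw_tb // raw_tb_per_group)                # whole RAID groups only
    writes = math.ceil(concurrent_writes / groups)          # ~315 for RAID 5
    writes = round(writes * (1 + parity_overhead))          # add the parity writes
    reads = math.ceil(concurrent_reads / groups) + writes   # client reads + DR re-read
    return {"groups": groups, "writes": writes, "reads": reads, "total": writes + reads}


if __name__ == "__main__":
    castor_total = 148  # per-drive total from the CAStor model above
    for name, efficiency, group_tb, parity in (("RAID 5", 0.64, 12, 0.17),
                                               ("RAID 6", 0.61, 20, 0.20)):
        load = raid_per_group_load(1.225, efficiency, group_tb, parity, 50_000, 10_000)
        print(name, load, f"~{load['total'] / castor_total:.1f}x the CAStor per-drive load")
```

It yields 159 groups with 369 writes and 432 reads for RAID 5, and 100 groups with 600 writes and 700 reads for RAID 6, roughly 5.4x and 8.8x the CAStor per-drive load.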


Takeaway

This workload profile shows the comparative efficiency of the three solutions.

• In the CAStor solution, each drive will be reading/writing 148 unique locations concurrently (68 reads + 80 writes). Each drive participates in far fewer concurrent I/O operations (~5-6x fewer than RAID5 and ~8-9x fewer than RAID 6).

• In the RAID 5 solution, the drives will be reading/writing 801 unique locations concurrently (432 reads + 369 writes). Each drive will be involved in a large number of I/O operations, although for a shorter period of time.

• In the RAID 6 solution, each drive will concurrently support 600 writes and 700 reads, requiring it to perform I/O against 1,300 different locations on the drive simultaneously.

While CAStor is processing 148 concurrent read/write operations, it buffers data from the concurrent writes. When the buffer is flushed to disk, the writes are ordered to give the drive heads the most efficient seek pattern. CAStor also stores each object in its entirety in a contiguous region of the disk, which makes reads more efficient. In RAID 5 or 6 environments, the individual blocks used to store data from a large number of concurrent writes are interleaved as well as spread across the drives in the RAID group. This inherently creates substantial fragmentation, which in turn makes reads significantly less efficient.

Additionally, because it stores whole objects, CAStor can be much more efficient when reclaiming space freed by deleted objects. CAStor’s background defragmentation process continually evaluates each drive and, when appropriate, moves all free space to the end of the drive. This ensures that even after 5 years of write/delete cycles, a CAStor drive will perform the same as a brand new drive. In an ext3 system, substantial fragmentation can occur, and no defragmentation tools are supplied with ext3.

The improved I/O efficiency of the CAStor solution not only accelerates performance but also translates to greater disk drive endurance. As a hard drive becomes more fragmented, it loses efficiency because it must spend more time repositioning its read/write heads to reach the right tracks and sectors. Defragmentation routines reorganize the data, but they put additional stress on the drive and are memory-intensive operations that degrade performance while they run.

File Reference Efficiency

An advantage of an object-based storage system is the simplicity of tracking where a file’s data resides.
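That simplicity, together with the whole-object layout and background compaction described above, can be illustrated with a toy model. The class below is a hypothetical sketch, not CAStor code: it keeps an in-memory dict that maps each object name to one contiguous extent on a drive, so finding an object costs no disk reads, and its `compact` method moves every live object toward the start of the drive so that all free space ends up at the end.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    offset: int  # byte offset of the object's first byte on the drive
    length: int  # object size; the whole object is stored contiguously


class DriveIndex:
    """Hypothetical in-memory object index for one drive (illustration only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.extents: dict[str, Extent] = {}
        self.next_free = 0  # start of the free region at the end of the drive

    def put(self, name: str, size: int) -> Extent:
        # New objects are written whole into the free region.
        if self.next_free + size > self.capacity:
            raise RuntimeError("not enough contiguous free space")
        extent = Extent(self.next_free, size)
        self.extents[name] = extent
        self.next_free += size
        return extent

    def locate(self, name: str) -> Extent:
        # A single hash-table lookup: no disk reads are needed to find the object.
        return self.extents[name]

    def delete(self, name: str) -> None:
        # Deleting leaves a hole; compact() reclaims it later.
        del self.extents[name]

    def compact(self) -> None:
        # Slide live objects toward the start of the drive, in offset order,
        # so all free space ends up in one region at the end. A real system
        # would copy the object bytes too; this sketch only moves the index.
        cursor = 0
        for extent in sorted(self.extents.values(), key=lambda e: e.offset):
            extent.offset = cursor
            cursor += extent.length
        self.next_free = cursor


if __name__ == "__main__":
    drive = DriveIndex(capacity=1_000)
    for name, size in (("a", 100), ("b", 200), ("c", 50)):
        drive.put(name, size)
    drive.delete("b")
    drive.compact()
    print(drive.locate("c"), drive.next_free)  # Extent(offset=100, length=50) 150
```

Because each object occupies one extent, compaction only has to shift whole objects; a block-based file system would instead have to rewrite pointer chains for every file it moves.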


Diagram 3 – Reference Efficiency

CAStor

Each CAStor node maintains an in-memory hash table with the location of every object on that node. Finding an object on a CAStor node therefore requires a single in-memory hash table lookup. Since CAStor stores objects whole rather than as a chain of blocks, the first disk seek goes directly to the first byte of actual data. CAStor requires ZERO disk accesses to find a file. (See Diagram 3)

Ext3 + RAID 5

An ext3 (or ext4) file system divides the entire disk into fixed-size blocks. The default block size is 4KB, which supports a maximum device size of 8TB, so the 10TB of usable space in a 5+1 RAID group of 2TB drives would actually require an 8KB block size. This is because ext3 uses a signed 32-bit integer to address each block, which limits the number of blocks that can be referenced.

An ext3 file system also uses inodes, which are data structures used to track directories and files. Each directory or file uses one inode. The inode tracks all the 32-bit integer pointers to the data block(s) that chain together to make up the contents of the directory or file. A portion of the disk is reserved for inodes during formatting, and the number of inodes is fixed at that time. If there are numerous small files, it is possible to run out of inodes before the available disk blocks are used up. The number of inodes cannot be increased without re-formatting.

To access a file on ext3, the root directory inode must be read. This inode contains one or more pointers to data blocks, which hold the entries for the sub-directories and files in the root directory. For each sub-directory in the path of a file, ext3 must read that sub-directory’s inode, use it to find the pointers to the data block(s) holding the sub-directory’s entries, and read those blocks until the next sub-directory is located. The inode number of the next sub-directory is taken from those entries and the loop repeats. Eventually, ext3 reaches the sub-directory containing the file of interest. The data block(s) for that sub-directory are read and parsed until the desired file is located, and the file’s inode number is read from its entry. Finally, using that inode number, ext3 reads the contents of the desired file’s inode. An inode contains up to 12 32-bit integers pointing directly to data blocks.

• In an 8KB-block file system, if a file is 96KB or smaller, the 12 direct slots in the inode are sufficient to track all the blocks that make up the file.

• For larger files, the 13th, 14th & 15th slots of an inode point to progressively deeper levels of blocks that each contain more 32-bit integer pointers.

• The 13th slot points to an “indirect block”: an 8KB block that itself tracks up to 2048 more 32-bit integer pointers to data blocks.

• The 14th slot points to a “double-indirect block”, which has 2048 entries, each pointing to an 8KB indirect block that in turn holds up to 2048 32-bit integer pointers to the 8KB data blocks containing file content.

• The 15th slot points to a “triple-indirect block”, which adds one more level of indirection: a block that points to blocks that point to blocks containing the pointers to the blocks that hold the file content.
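The slot scheme can be turned into a short calculation. This sketch models only the direct, single-indirect and double-indirect levels with 32-bit pointers; the function names are ours, and `max_device_bytes` follows the text’s statement that block numbers are signed 32-bit values (giving 8 TiB for 4KB blocks and 16 TiB for 8KB blocks).

```python
import math

POINTER_BYTES = 4   # 32-bit block pointers
DIRECT_SLOTS = 12   # inode slots 1-12 point straight at data blocks


def max_device_bytes(block_bytes: int) -> int:
    """Largest device addressable if block numbers are signed 32-bit values."""
    return 2**31 * block_bytes


def pointer_layout(file_bytes: int, block_bytes: int) -> dict[str, int]:
    """How an ext3-style inode tracks one file (direct, single, double levels only)."""
    per_block = block_bytes // POINTER_BYTES        # pointers per indirect block
    blocks = math.ceil(file_bytes / block_bytes)    # data blocks in the file
    direct = min(blocks, DIRECT_SLOTS)              # slots 1-12
    single = min(blocks - direct, per_block)        # via slot 13
    double = blocks - direct - single               # via slot 14
    if double > per_block * per_block:
        raise ValueError("file needs the triple-indirect slot, which is not modeled")
    return {
        "data_blocks": blocks,
        "direct": direct,
        "single_indirect": single,
        "double_indirect": double,
        # indirect blocks hanging off the double-indirect block
        "second_level_blocks": math.ceil(double / per_block),
    }


if __name__ == "__main__":
    TIB = 2**40
    print(max_device_bytes(4096) / TIB, max_device_bytes(8192) / TIB)  # 8.0 16.0
    # 100,000 KB of 1,024 bytes reproduces the text's 12,500 blocks of 8KB.
    print(pointer_layout(100_000 * 1024, 8192))
```

For the 100MB file discussed next, it reports 12 direct pointers, 2,048 single-indirect pointers and 10,440 double-indirect pointers, which need 6 second-level indirect blocks of 2,048 pointers each.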

For a 100MB file on an ext3 file system using 8KB blocks, 12,500 data blocks are required. Therefore, the ext3 system needs to maintain a list of 12,500 32-bit integer pointers. Tracking 12,500 data block pointers will require:

• All 12 direct references from the file’s inode (tracks 96KB of file content)
• An indirect block that references 2048 more pointers (tracks 16.384MB of file content)
• A double-indirect block that references the additional indirect blocks (six, at 2048 pointers each) that track the remaining 10,440 pointers to the rest of the file’s content

Once ext3 has read all the blocks from disk needed to assemble a file’s list of 32-bit pointers, it can begin reading the data. On a 5+1 RAID5 group, this data will be scattered across all 5 data drives, with an average of 2,500 8KB blocks coming from each drive. In the best-case scenario, where the file sits at the top of the root directory, about 10 reads from disk are needed just to locate the data. If a substantial directory tree must be walked, the number of disk operations required to locate the file increases significantly, and since most files are nested in 5 or more directories, the average number of reads will be significantly greater.

Takeaway

RAID systems experience greater I/O inefficiency as the size of the total system grows. By tracking a single in-memory pointer instead of 12,500 disk-stored pointers, CAStor provides a substantial advantage in I/O and operational efficiency.
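To put rough numbers on the directory walk, here is a simple read-count model. It is a sketch under stated assumptions, not a measurement: each directory’s entries fit in one block, the root inode is already cached in memory, and nothing else is. The 100MB file needs 8 pointer blocks (one single-indirect, one double-indirect and six second-level indirect blocks). CAStor’s equivalent count is zero, since its lookup is a single in-memory hash-table access.

```python
def ext3_locate_reads(depth: int, pointer_blocks: int, root_inode_cached: bool = True) -> int:
    """Disk reads before ext3 can start reading a file's data (rough model).

    depth: sub-directories between the root and the file (0 = file in root).
    pointer_blocks: indirect blocks needed to assemble the file's pointer list.
    Assumes every directory's entries fit in one block and nothing else is cached.
    """
    reads = 0 if root_inode_cached else 1  # root directory inode
    reads += 1                             # root directory entry block
    reads += 2 * depth                     # each sub-directory: its inode + entry block
    reads += 1                             # the file's own inode
    reads += pointer_blocks                # indirect blocks for the pointer list
    return reads


# 100MB file, 8KB blocks: 1 single-indirect + 1 double-indirect
# + 6 second-level indirect blocks.
POINTER_BLOCKS_100MB = 1 + 1 + 6

if __name__ == "__main__":
    for depth in (0, 5):
        print(depth, ext3_locate_reads(depth, POINTER_BLOCKS_100MB))
```

Under those assumptions it gives 10 reads for a file in the root directory and 20 for a file nested 5 directories deep, all before any file data is read.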


Failed Drive Recovery Efficiency

This section compares how efficiently data is recovered after a drive fails. Enterprise SATA drives offer higher capacity and lower cost than SAS drives, which makes them the most popular drive type because of the value they provide. However, while capacities have increased, drive reliability, expressed as the Unrecoverable Read Error (URE) rate, has stayed the same. The URE rate of SATA drives is typically 1 in 10^15 bits, which means that for every 125TB of data read from a drive, you can expect a failure reading a single bit. That doesn’t sound like a problem, unless that single-bit read failure occurs during a rebuild, causing the rebuild to fail. Let’s look at what this means for recovery with today’s 2TB disk drives, and the impact 4TB disk drives will have in the future.

• With a 6-drive RAID group of 2TB drives, 12TB of bits must be read during the rebuild. This means there is an 8% chance of encountering a URE during the rebuild, resulting in a rebuild failure.

• With a 6-drive RAID group of 4TB drives, the chance of a rebuild failure rises to 17%.
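The rebuild-failure percentages above follow from the URE rate. A common simplification, assumed here, treats each bit read as failing independently with probability 1/10^15 (SATA) or 1/10^16 (SAS), so the chance of at least one URE over n bits is 1 - (1 - p)^n. Whether the read volume is taken as the 5 surviving drives or the full 6-drive group shifts the result by a point or two: for 2TB drives this gives about 7.7% to 9.2%, and for 4TB drives about 14.8% to 17.5%, bracketing the 8% and 17% quoted above. The function name is ours.

```python
import math

TB = 10**12          # decimal terabytes, as drive capacities are quoted
BITS_PER_BYTE = 8


def rebuild_failure_probability(tb_read: float, bits_per_ure: float) -> float:
    """Chance of at least one URE while reading `tb_read` TB.

    Assumes independent bit errors at an average rate of one per
    `bits_per_ure` bits, i.e. P = 1 - (1 - 1/bits_per_ure) ** bits_read.
    """
    bits_read = tb_read * TB * BITS_PER_BYTE
    # log1p/expm1 keep the result accurate for tiny per-bit probabilities.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_ure))


if __name__ == "__main__":
    for drive_tb in (2, 4):
        for label, drives_read in (("5 survivors", 5), ("whole 6-drive group", 6)):
            tb = drives_read * drive_tb
            sata = rebuild_failure_probability(tb, 1e15)
            sas = rebuild_failure_probability(tb, 1e16)
            print(f"{drive_tb}TB drives, {label} ({tb}TB): SATA {sata:.1%}, SAS {sas:.1%}")
```

The same function shows why the SAS improvement does not remove the problem: a 10x better error rate is cancelled out once ten times as many bits must be read.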

This rebuild failure rate improves with SAS drives, whose bit error rate is 1 in 10^16, but the same ratio will apply as areal densities continue to double every 2 to 4 years. This exponential growth in drive capacity will have an ever greater impact in RAID 5 and 6 environments, as rebuilds create an amplified risk of unrecoverable data. There is also a strong correlation between drive failures and both drive age and manufacturing batch. Since drives from the same batch are typically installed in the same RAID array, one drive failure raises the risk that the next failure will occur within the same RAID group. The CAStor architecture automatically replicates objects to a second node, which creates a highly resilient topology and fully mitigates this type of RAID risk.

CAStor

In an example taken from a real-world system requirement for a 2.5PB system, there are 105 2U/12-drive servers, or 1260 individual 2TB disk drives. If a drive fails, all nodes are notified immediately. Every node knows which objects on its drives were also on the failed drive, and every node immediately begins replicating those objects onto other nodes/drives in the cluster. So when a drive fails in a CAStor cluster, all nodes/drives participate in the recovery, and the larger the CAStor cluster, the faster recovery can happen.

For 100MB files & 2TB drives:

• Assuming the drive is 100% full, there can be at most 20,000 files on the drive.

• If the 2nd replicas of all those files are perfectly distributed across the remaining 1259 drives, there will be ~16 of them on each remaining drive.

• If a 2nd drive fails, a maximum of 16 files will be lost.

• If the recovery was halfway done before the 2nd drive failure, 8 of those files would already have been re-replicated, and a maximum of 8 files would be lost.

• If the cluster were only 50% full, these figures would halve (8 files, or 4 files if recovery was halfway done), since CAStor is content-aware and only re-replicates actual content.
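The bullet arithmetic can be checked with a small function. It models the idealized case described above (2 replicas, a perfectly even spread of 2nd replicas, decimal units), not CAStor’s actual placement logic; the function name is ours.

```python
def castor_files_at_risk(drive_bytes: float, file_bytes: float, drives: int,
                         fill: float = 1.0, recovery_done: float = 0.0) -> float:
    """Files lost if a 2nd drive fails while CAStor re-replicates a failed one.

    Idealized model from the bullets above: 2 replicas, and the failed drive's
    2nd replicas spread perfectly evenly over every surviving drive.
    """
    files_on_failed = drive_bytes * fill / file_bytes   # only stored content counts
    per_survivor = files_on_failed / (drives - 1)       # 2nd replicas on each other drive
    return per_survivor * (1.0 - recovery_done)         # minus those already re-replicated


if __name__ == "__main__":
    TB, MB = 10**12, 10**6
    for fill, done in ((1.0, 0.0), (1.0, 0.5), (0.5, 0.5)):
        at_risk = castor_files_at_risk(2 * TB, 100 * MB, 1260, fill, done)
        print(f"{fill:.0%} full, recovery {done:.0%} done: ~{at_risk:.0f} files")
    # Each surviving drive's share of the recovery work is small:
    print(f"{2 * TB / 1259 / 10**9:.1f} GB per surviving drive")
```

It prints about 16, 8 and 4 files for the three cases above, and shows that each surviving drive handles only about 1.6GB of the failed drive’s content.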

With CAStor, even when multiple drives fail concurrently, it is virtually impossible to lose a significant amount of data; in this example, an average of 4 files would be lost. (See Diagram 4: Failed Drive Recovery Efficiency) Additionally, protection levels can be set on a per-object basis, which can bring the loss rate for critical data to zero.

Ext3 + RAID 5

When a drive fails, 6 other drives participate in the recovery (the 5 remaining in the RAID group plus the hot spare). Regardless of how full the group is, the entire drive must be rebuilt, bit by bit.

For 100MB files & 2TB drives:

• Assuming the 5+1 RAID group is 100% full, there can be at most 78,000 files on the device (12TB raw x 65% efficiency, with 3% credited back because hot spares shouldn’t be figured into the 5+1 efficiency).

• If the rebuild is 1% complete and a 2nd drive of the six fails, 78,000 files will be lost.

• If the rebuild is anywhere from 0-99% complete and a 2nd drive fails, 78,000 files will be lost.

Ext3 + RAID 6

• When a drive fails, 10 drives participate in a RAID6 rebuild.

• Because of the larger RAID6 group, a rebuild failure will lose 122,000 files.

• If the rebuild is 0-99% complete and a 3rd drive fails, 122,000 files will be lost.
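For comparison, the RAID figures come from the usable capacity of the whole group divided by the file size. The function below is a sketch using the efficiencies quoted above; the name is ours.

```python
def raid_files_at_risk(drives_in_group: int, drive_tb: float,
                       efficiency: float, file_mb: float) -> int:
    """Files lost when a RAID 5/6 rebuild fails and the whole group is lost.

    `efficiency` is the usable share of the group's raw capacity, as in the
    bullets above (65% for 5+1 RAID 5, 61% for 8+2 RAID 6). Assumes a full
    group. Unlike the CAStor model, the loss does not shrink as the rebuild
    progresses, because the loss happens at the group level, not per object.
    """
    usable_mb = drives_in_group * drive_tb * 1_000_000 * efficiency
    return round(usable_mb / file_mb)


if __name__ == "__main__":
    print(f"RAID 5 (5+1): {raid_files_at_risk(6, 2, 0.65, 100):,} files")
    print(f"RAID 6 (8+2): {raid_files_at_risk(10, 2, 0.61, 100):,} files")
```

It prints 78,000 files for RAID 5 and 122,000 files for RAID 6.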


Diagram 4: Failed Drive Recovery Efficiency

Takeaway

The magnitude of the data loss risk and the severity of the recovery process are significantly different in this scenario. CAStor would lose 4 files, the RAID 5 system would lose 78,000 files if a 2nd drive failed during a rebuild, and the RAID 6 system would lose 122,000 files if a 3rd drive failed during a rebuild. In this scenario, the difference between CAStor and RAID is night and day, both in the recovery burden on IT operations staff and in the ability to meet the SLAs set by the organization. RAID systems were never designed to efficiently address multi-PB configurations; the larger the configuration gets, the greater the efficiency advantage of CAStor.

Annual Failure Rate of Drives and Lifecycle Management

The Google study of more than 100,000 drives (See Diagram 5) shows that as disk drives age, their failure rates increase significantly over a 5-year horizon. Meanwhile, as new disk drive innovations come to market, capacities continue to double every 2-4 years. When a disk drive needs to be replaced, the question is whether the storage solution can incorporate the latest disk drive available in the marketplace at that time.

Diagram 5: Disk Drive Annual Failure Rate
Note: Annualized failure rates broken down by age group. Google Services Study representing 100K+ total drives.


Diagram 6: Loss Protection and Failure Recovery

CAStor

The CAStor architecture continuously checks the viability and integrity of stored data and maintains the efficiency of the system. When a drive fails, it can be replaced with the highest-capacity drive available on the market. The new drive is automatically added to the cluster and to the storage pool. Its full capacity is instantly virtualized and available for use, so it never needs to be provisioned. The customer gets a system that keeps advancing to new levels of performance, capacity and availability at the most attractive price points available at each point in time. Since disk drive prices decline by an average of 2% per quarter, taking advantage of new higher-capacity drives can provide a significant payback ($/GB) over the life of the total investment.

Ext3 RAID 5 and 6

A RAID environment requires that a failed drive be replaced with an identically sized drive until the entire RAID group is retired. Because storage is provisioned per RAID group, the total capacity available to each RAID set must be managed operationally. RAID customers have the following options for taking advantage of new drive technology:

1. Do a forklift upgrade of all the drives in the array to take advantage of the new drive technology, and incur a significant write-off of the old drives from the customer’s depreciation schedule

2. Do a subset upgrade by purchasing a new set of the latest drives just for a new RAID group in the array and keep the old drives as spares for the legacy part of the configuration

3. Wait several years, until the old drives are fully depreciated, before taking advantage of the new disk drive technology


Since most storage investments are amortized over 3 to 5 years, most RAID customers will be financially penalized because they cannot easily cost-justify integrating these new drives into their existing storage investment.

Takeaway

IT organizations need to evaluate the Total Cost of Ownership (TCO) of their storage investment over the life of the entire system. With RAID 5 and 6 systems, incorporating the latest disk drive technology carries a significant price tag. A CAStor system, by contrast, gives the customer the flexibility to keep enhancing the storage system as drive prices continue to fall.
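To put the 2% average quarterly price decline cited above into TCO terms, the sketch below compounds it over the 3 to 5 year amortization period mentioned earlier. It assumes the decline holds steady every quarter, which is a simplification. The starting $/GB value is a hypothetical normalized placeholder, not a figure from this paper.

```python
def price_after(quarters: int, start_price: float = 1.0,
                quarterly_decline: float = 0.02) -> float:
    """Compound a constant per-quarter price decline.

    start_price is a hypothetical, normalized $/GB value (1.0 = today's price).
    """
    return start_price * (1 - quarterly_decline) ** quarters


# Amortization horizons of 3 and 5 years, plus 1 year for reference.
for years in (1, 3, 5):
    relative = price_after(quarters=4 * years)
    print(f"after {years} year(s): {relative:.1%} of today's $/GB "
          f"({1 - relative:.1%} lower)")
```

Under that assumption, $/GB falls to about 78.5% of its starting value after 3 years and about 66.8% after 5 years. A replacement drive bought late in the amortization period therefore costs roughly a fifth to a third less per GB than one bought at the start.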

Assume a 1PB system built from 1,000 1TB drives, with the drive failure rates from the Google Study (see Diagram 6):

• Year 1: 1.7% drive failure rate => 17 1TB drives replaced with 2TB drives => +17TB increase (17 * 1TB delta)

• Year 2: 8% drive failure rate => 80 1TB drives replaced with 3TB drives => +160TB increase (80 * 2TB delta)

• Year 3: 8.6% drive failure rate => 86 1TB drives replaced with 4TB drives => +258TB increase (86 * 3TB delta)

So, in 3 years, the cluster will have grown from 1PB to 1.435PB (17TB + 160TB + 258TB = 435TB added), a 43.5% increase in capacity. That added capacity comes at an average price premium of 25%. This makes it very cost-effective for CAStor customers to replace failed drives with higher-capacity drives, growing the initial configuration without increasing the total number of drives.
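The three-year calculation above can be reproduced in a few lines. This sketch follows the paper’s own simplifications:
- Every failed drive is one of the original 1TB drives.
- Each year’s replacements come at the next capacity step (2TB, 3TB, then 4TB).
- The failure rates are the per-year figures listed above.

```python
ORIGINAL_DRIVES = 1_000
ORIGINAL_TB = 1  # capacity of each original drive, in TB

# (year, annual failure rate, capacity of the replacement drive in TB)
schedule = [
    (1, 0.017, 2),
    (2, 0.080, 3),
    (3, 0.086, 4),
]


def capacity_growth(drives, base_tb, plan):
    """Return (year, drives replaced, TB added) for each year of the plan."""
    added = []
    for year, afr, new_tb in plan:
        failed = round(drives * afr)  # failures among the original drives
        added.append((year, failed, failed * (new_tb - base_tb)))
    return added


rows = capacity_growth(ORIGINAL_DRIVES, ORIGINAL_TB, schedule)
for year, failed, tb in rows:
    print(f"Year {year}: {failed} drives replaced, +{tb}TB")

total_added = sum(tb for _, _, tb in rows)
start_tb = ORIGINAL_DRIVES * ORIGINAL_TB
print(f"{start_tb}TB -> {start_tb + total_added}TB (+{total_added / start_tb:.1%})")
```

The yearly gains come out to +17TB, +160TB and +258TB, for 1,435TB in total. Changing `schedule` lets you test other failure-rate or drive-size assumptions.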

Even in configurations made up exclusively of large files, where RAID has a small capacity advantage over CAStor, CAStor’s ability to upgrade one drive at a time completely eliminates any potential advantage of RAID over a CAStor replication environment.

Hardware upgrade and add-on efficiencies

Customers want to buy the storage they need today and avoid overbuying at any point in time. Modular, incremental storage architectures attract customers for two reasons: financial savings, and access to the latest drive technology with faster speeds, greater capacity and higher reliability. With enterprise storage needs growing at an average of 60% per year, customers benefit from storage systems that accept add-ons of the latest drive technology on the market. CAStor gives customers complete flexibility to add higher-capacity drives and use that capacity immediately, with no downtime and no provisioning required.


Diagram 8: Hardware updates and add-on efficiencies

CAStor

Customers can take advantage of annual areal density increases of more than 40%, which bring pricing advantages to higher-capacity disk drives. Customers can expand their storage configuration by as little as a single drive or by an entire storage node of any size, providing maximum flexibility in upgrade strategies.

• Upgrade a single node (or even a single drive) at a time

• Increase capacity by a single node at a time

RAID 5 and 6

Customers adopt new drive technology one RAID group or one entire subsystem at a time, which requires large capital expenditures to grow.

• Upgrade 96 drives + 2 servers at a time

• Increase capacity 96 drives at a time
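The difference in expansion granularity can be illustrated by rounding a capacity need up to the smallest unit each architecture can buy: one drive for CAStor versus 96 drives for the RAID subsystem described above. The sketch below projects demand with the 60% average annual growth rate cited earlier. The starting need of 40 drives is a hypothetical input. The sketch also assumes all drives are the same size, so capacity can be counted in drives.

```python
def drives_to_buy(needed: int, installed: int, increment: int) -> int:
    """Drives purchased to cover `needed` when expansion comes in fixed-size units."""
    shortfall = max(0, needed - installed)
    units = -(-shortfall // increment)  # integer ceiling division
    return units * increment


def simulate(start_need: int, growth_pct: int, years: int, increment: int):
    """Return yearly (need, installed) pairs when capacity is bought just in time."""
    installed = drives_to_buy(start_need, 0, increment)
    history = [(start_need, installed)]
    need = start_need
    for _ in range(years):
        # Demand grows by growth_pct per year, rounded up to whole drives.
        need = (need * (100 + growth_pct) + 99) // 100
        installed += drives_to_buy(need, installed, increment)
        history.append((need, installed))
    return history


# Hypothetical starting need of 40 drives' worth of capacity, 60% annual growth.
for label, unit in (("Single-drive increments (CAStor)", 1),
                    ("96-drive increments (RAID subsystem)", 96)):
    need, installed = simulate(40, 60, 3, unit)[-1]
    print(f"{label}: year-3 need {need}, installed {installed}, "
          f"bought ahead of need {installed - need}")
```

With these inputs, single-drive increments end year 3 with no drives bought ahead of need, while 96-drive increments end with 27. The equal-drive-size assumption understates CAStor’s flexibility, since it can also mix in larger drives.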

Takeaway

With CAStor, companies can better optimize IT budgets. They buy only the storage required at a given point in time and gain the price and capacity advantages of the latest storage when expansion is needed.

Data loss protection and failure recovery

RAID systems can build high availability into the design with active/active controllers, dual power supplies, dual fans, dual-ported drives and cross cabling, so that the array has no single point of failure. But to achieve close to 100% data availability, eliminating single points of failure has to extend beyond the storage array. Customers implementing an overall business continuity strategy need to account for a disaster that takes out an entire data center location.


Diagram 9: Data Loss Protection and Failure Recovery

CAStor

CAStor provides protection against multiple disk drive failures. It also lets administrators set different protection levels (i.e., numbers of replicas) for each file. Policies can change the number of copies based on access frequency or on a lifecycle timeframe to meet a desired Service Level Objective (SLO). CAStor also provides cluster failover to recover from any single point of failure, including failures of server nodes, NICs and other hardware infrastructure beyond the drives and storage array. CAStor automatically and transparently shifts workloads to distributed clusters in the network.

RAID 5 and 6

RAID 5 and 6 systems, by contrast, are far more limited in scope: they protect only against a single (RAID 5) or dual (RAID 6) drive failure. If power is lost across an entire data center, the array will be compromised and unavailable even if it has dual power supplies.

Takeaway

Replication in a clustered storage topology, as offered by CAStor, extends data protection beyond drive failures alone.

Conclusions

RAID systems today face a broad challenge in meeting customer expectations for high availability, high performance and cost effectiveness. Disk drive capacities are growing to multiple TB and storage configurations are reaching PB scale. CAStor can mitigate these challenges and more effectively meet the need for an optimal storage system. In summary, this white paper evaluated RAID and CAStor against the following five criteria, and CAStor wins on all counts:


1. Data Loss Risks

RAID: When dealing with Unrecoverable Read Errors (UREs), the I/O and file system inefficiencies of RAID systems increase the risk of data loss as the number of drives and the total capacity of the configuration grow.

CAStor: In a PB configuration where tens of thousands of files are at risk with RAID, you can count on one hand the number of files at risk with CAStor. Replication adds protection against loss, with further resiliency when replicas are deployed at remote locations. The number of files at risk during a CAStor rebuild is a fraction of the number at risk in a RAID group.

2. I/O Performance

RAID: The efficiency of data reads and writes is a key factor in the overall performance of a storage system. In a RAID environment each drive is involved in a large number of I/O operations, and I/O and file system inefficiencies create fragmentation.

CAStor: When fixed content is stored in a CAStor object system, each drive participates in far fewer concurrent I/O operations (~5-6x fewer than RAID 5 and ~8-9x fewer than RAID 6). Furthermore, CAStor never encounters file fragmentation issues, since files are stored as whole objects.

3. Costs to Upgrade

RAID: RAID systems require all drives to be of the same technology, so over time the storage system stays the same, with no added capacity or performance. To take advantage of new drive technology during the life of the system, the customer faces an expensive forklift upgrade. New storage technology is not only about added capacity. An existing RAID system also can’t easily take advantage of the new generation of hybrid drives (combining HDD with Flash), which could enable higher throughput and performance for the storage system.

CAStor: Because upgrades and failed-drive replacements can be as granular as a single drive, CAStor configurations can be enhanced modularly with the latest drive technology over the life of the storage system, protecting the investment.

4. Rebuild Effectiveness

RAID: RAID was never envisioned to handle rebuilds of multi-terabyte disk drives. As the configuration grows over time, so do the costs of achieving the desired SLAs for overall throughput and data protection.

CAStor: As a CAStor cluster grows, restores get faster. The cost of growing the configuration improves over time, as the prices of industry-standard server nodes and disk drives continue to fall every year.


5. Efficient Utilization of Capacity

RAID: On the surface, RAID looks like an efficient way to optimize capacity while increasing data protection, but the penalties for file namespace, performance reserve, parity and hot spares are significant.

CAStor: For typical mixed workloads of small and large files, replication in a CAStor cluster provides capacity optimization equal to RAID as a baseline. It delivers far greater capacity efficiency once failed-drive replacement and modular growth over the life of the storage system are taken into account.

A growing number of enterprise customers are considering RAID 6 to overcome the greater data-loss risk of RAID 5. But RAID 6 brings several costs:
- lower capacity utilization because of the added parity drives
- greater drive I/O inefficiency
- more files at risk in a rebuild if more than 2 drives in the RAID group fail
- a higher overall storage cost

In conclusion, if you are using RAID 5 or 6 as a data resilience method, it is only a matter of time before you experience data loss (if you haven’t already). Replication will ensure the resilience and recoverability of your data for the full life cycle of your applications. The CAStor storage architecture delivers the value of replication without compromising capacity compared to RAID alternatives. CAStor customers future-proof their storage investment over the entire life of the system, for a superior TCO.

Get started today with a fully functional 2TB CAStor license for free. For download instructions or more detailed information, contact [email protected] or visit http://www.caringo.com.