Storage Lecture Notes


Storage Devices

Why is storage important?

• Web 2.0 applications are an extension of your desktop

• SaaS (Software as a Service) is here and growing

• Broadband is a reality

• Storage costs are dropping

• Everyone expects near-unlimited storage online: YouTube, Flickr, Facebook et al. are storing your life online*

• (And yes, let's not forget your personal BitTorrent collection)

* It would take about 1,400 TB to store your entire life in video, or 5,700 TB in total if you also want to capture what was happening around you. Add another 73 TB for audio of everything you heard (MP3 quality). That's roughly 6,000 TB for a copy of your life.

Agenda:

• Hard disks

SATA, SAS, FC (Fibre Channel), solid state

• RAID

• DAS

• SAN

• NAS

SATA (Serial Advanced Technology Attachment):

• It is a computer bus interface that connects host bus adapters (which connect the host to other network and storage devices) to mass storage devices such as hard drives and optical drives.

• Advantages

Reduced cable bulk and cost

Faster and more efficient data transfer


SAS (Serial Attached SCSI):

• It is a computer bus used to move data to and from computer storage devices such as hard drives and tape drives.

SCSI (Small Computer System Interface):

• It is a set of standards for physically connecting and transferring data between computers and peripheral devices.

FC (Fibre Channel):

• It is a gigabit-speed network technology used primarily for storage networking.

Originally: supercomputers

Now: the standard connection type for SANs

Choosing your Hard Disk

(SATA, FC, SAS, SCSI, solid state)

Introduction to Hard Drives:

• Basic physical storage unit (aka physical block device)

• Variables to consider when selecting a drive

Type (SAS, SATA, FC)

RPM (revolutions per minute)

Capacity

MTBF (Mean Time Between Failures)

Life Expectancy
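MTBF is easier to interpret once it is converted into an annualized failure rate (AFR). The sketch below assumes the usual constant-failure-rate (exponential) model behind vendor MTBF figures and that failed drives are replaced promptly; the function names and the 1,000,000-hour rating in the example are hypothetical, so check the spec sheet of your actual drives.

```python
import math

HOURS_PER_YEAR = 24 * 365  # drives in a server typically run 24x7


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that a single drive fails within one year, assuming a
    constant failure rate of 1 / MTBF (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours: float, drive_count: int) -> float:
    """Expected number of drive failures per year across an array,
    assuming each failed drive is replaced so the population stays constant."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 1_000_000  # hypothetical vendor rating, in hours
    print(f"AFR per drive: {annualized_failure_rate(mtbf):.2%}")
    print(f"Expected failures per year, 24 drives: "
          f"{expected_failures_per_year(mtbf, 24):.2f}")
```

Even a generous MTBF rating translates into occasional failures once an array holds dozens of drives, which is why spares and RAID redundancy matter.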

Hard Disk types:

SATA (Serial ATA)

• Typical use: low-cost, high-volume, low-speed, large-storage environments; CDP (continuous data protection) / backups

• Performance: average; typically 7200 RPM

• Typical capacities: 250 GB, 500 GB, 750 GB, 1 TB

• Price per GB: $0.33

SAS (Serial Attached SCSI)

• Typical use: replacement for SCSI; high-performance, transaction-oriented applications with high IOPS requirements

• Performance: good (similar to FC); 10k / 15k RPM

• Typical capacities: 73 GB, 146 GB, 300 GB, 400 GB

• Price per GB: $2

• Miscellaneous: backward compatible with SATA; allows mixing SATA drives on the same backplane

FC (Fibre Channel)

• Typical use: high-performance, transaction-oriented applications with high IOPS requirements

• Performance: good (similar to SAS); 10k / 15k RPM

• Typical capacities: 73 GB, 146 GB, 300 GB, 400 GB

• Price per GB: $3

Price per GB is based on the retail web price of the maximum-capacity drive.

Hard Disk Conclusions:

• For high-IOPS database applications with low storage requirements, you have a choice between FC and SAS

• SAS currently seems like the better option

• Future SAS standards promise to be faster than FC (though the two will likely remain neck and neck)

• For high storage requirements (video servers, file servers, photo storage, archives, mail servers, backup servers), SATA is the way to go

• You can combine SAS and SATA to reduce average cost and still meet your goals, especially since SAS backplanes also accept SATA drives

• Read the spec sheets of the hard drives you plan to use to determine the specifics
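To see why mixing SAS and SATA lowers average cost, here is a small arithmetic sketch using the price-per-GB figures listed above. The 200 GB "hot" / 4,000 GB "bulk" split is a hypothetical workload, and the calculation covers raw drive cost only (no RAID overhead, enclosures or controllers).

```python
# Price per GB (USD) as listed in the hard-disk comparison above.
PRICE_PER_GB = {"SATA": 0.33, "SAS": 2.00, "FC": 3.00}


def raw_drive_cost(allocation_gb: dict[str, float]) -> float:
    """Total raw drive cost for a given number of GB placed on each drive type."""
    return sum(PRICE_PER_GB[kind] * gb for kind, gb in allocation_gb.items())


def blended_price_per_gb(allocation_gb: dict[str, float]) -> float:
    """Average price per GB across all drive types in the plan."""
    return raw_drive_cost(allocation_gb) / sum(allocation_gb.values())


if __name__ == "__main__":
    # Hypothetical workload: 200 GB of hot database data, 4,000 GB of bulk files.
    plans = {
        "all SAS": {"SAS": 4200},
        "SAS + SATA": {"SAS": 200, "SATA": 4000},
    }
    for name, plan in plans.items():
        print(f"{name}: ${raw_drive_cost(plan):,.0f} "
              f"(${blended_price_per_gb(plan):.2f}/GB)")
```

With these figures, moving the bulk data to SATA cuts the raw drive bill from $8,400 to about $1,720 while the database still sits on SAS.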

Solid State Drives:

• Uses solid state memory to store persistent data

• Eliminates mechanical parts

• Useful for creating efficient intermediate caches or for storing small to mid-sized high-performance databases

Solid State Drives:

• Advantages

Faster startup (no spin-up)

Significantly faster on random I/O (from 250x to 1000x+)

Extremely low latency (25x to 200x better)

No noise

Lower power consumption

Less heat production

• Disadvantages

Significantly more expensive ($10-30/GB for flash-based, $100-200/GB for DDR RAM-based)

Slightly slower on large sequential reads

Slower random write speeds in the case of flash-based storage

RAID Primer

Redundant Array of Inexpensive Disks, or Redundant Array of Independent Disks

(0, 1, 2, 3, 4, 5, 6, TP, 0+1, 10, 50, 60)

Introduction to RAID:

• Allows multiple disks to appear as a single contiguous physical block device

• Data is stored across a pool of disks: rather than spending heavily on special high-capacity disks, greater capacity is achieved by simply combining ordinary PC disks into a RAID.

• Typically we need a RAID controller on the host

• Provides redundancy / high availability

• A RAID group appears as a single physical block device

• There are several types/levels of RAID

RAID 0:


• RAID 0 writes (stripes) blocks across multiple disks without redundancy

• Because the data is written to multiple disks, the controller can work on reads and writes in parallel, improving performance

• If any single disk fails, the data on the whole array is lost

• Don't use it for mission-critical data; use it only for performance

• Ideally you have one drive per controller
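The striping described above can be sketched as an address-mapping function. This is a simplified model assuming a fixed stripe unit and round-robin placement of stripe units across disks; real controllers may lay data out differently, and `raid0_locate` is a hypothetical name.

```python
def raid0_locate(logical_block: int, disk_count: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)
    for a RAID 0 array whose stripe unit is `stripe_blocks` blocks."""
    if disk_count < 2 or stripe_blocks < 1:
        raise ValueError("RAID 0 needs at least 2 disks and a positive stripe unit")
    stripe_unit = logical_block // stripe_blocks  # which stripe unit, counted over the whole array
    within_unit = logical_block % stripe_blocks   # position inside that stripe unit
    disk = stripe_unit % disk_count               # stripe units rotate round-robin across disks
    row = stripe_unit // disk_count               # complete stripes that come before this unit
    return disk, row * stripe_blocks + within_unit


if __name__ == "__main__":
    # Hypothetical array: 4 disks, stripe unit of 2 blocks.
    for block in range(12):
        disk, offset = raid0_locate(block, disk_count=4, stripe_blocks=2)
        print(f"logical block {block:2} -> disk {disk}, block {offset}")
```

Consecutive stripe units land on different disks, so a large request is serviced by all disks at once. It also shows why losing one disk destroys the array: every fourth stripe unit lives on it.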

RAID 1:

• This is mirroring: the same data is written to two disks. If either disk fails, a complete copy of the data is available on the other disk

• Uses 2X the storage space; reads can be faster because the OS can pick the disk with the least seek or rotational latency
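The read optimization mentioned above can be sketched as "serve each read from the copy whose head is nearest". This is a toy model, assuming head position (in blocks) as a stand-in for seek time and ignoring rotational latency; the class and function names are hypothetical, and real controllers or OS drivers may instead balance by queue depth or round-robin.

```python
from dataclasses import dataclass


@dataclass
class MirrorDisk:
    """One half of a RAID 1 mirror, reduced to what matters for read scheduling."""
    name: str
    head_position: int  # block number the head currently sits over (hypothetical model)
    online: bool = True


def pick_read_disk(mirror: list[MirrorDisk], target_block: int) -> MirrorDisk:
    """Serve a read from the online copy whose head is closest to the target,
    using seek distance as a stand-in for seek time."""
    candidates = [disk for disk in mirror if disk.online]
    if not candidates:
        raise RuntimeError("both halves of the mirror have failed")
    return min(candidates, key=lambda disk: abs(disk.head_position - target_block))


if __name__ == "__main__":
    disk0 = MirrorDisk("disk0", head_position=100)
    disk1 = MirrorDisk("disk1", head_position=9_000)
    print(pick_read_disk([disk0, disk1], 8_500).name)  # disk1 is nearer
    disk1.online = False                               # simulate a failure
    print(pick_read_disk([disk0, disk1], 8_500).name)  # still readable from disk0
```

The same structure shows both benefits of mirroring: a choice of copies for faster reads, and a surviving copy after a failure.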

RAID 5:

• RAID 5 uses "parity" (redundant information). If a disk fails, enough parity information is available to recover its data

• The parity information is spread across all the disks

• High read rate, medium write rate

• A disk failure requires a rebuild, in which the parity information is used to re-create the lost data
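Parity in RAID 5 is the XOR of the data blocks in a stripe, which is what makes the rebuild above possible. A minimal sketch, assuming byte-level XOR over equally sized blocks; the helper names are hypothetical, and `parity_disk` shows just one simple rotation scheme (controllers use several variants).

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    """Recreate the one lost block of a stripe (data or parity) by XOR-ing
    every surviving block of that stripe, parity included."""
    return xor_blocks(surviving_blocks)


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Small-write path: new parity = old parity ^ old data ^ new data.
    Needs two reads and two writes, the source of RAID 5's write penalty."""
    return xor_blocks([old_parity, old_data, new_data])


def parity_disk(stripe: int, disk_count: int) -> int:
    """Move parity to a different disk on each stripe so that parity is
    spread across all disks instead of living on one dedicated disk."""
    return (disk_count - 1 - stripe) % disk_count


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # three data blocks
    parity = make_parity(stripe)
    # Pretend the disk holding stripe[1] failed and rebuild it from the rest.
    recovered = rebuild_missing([stripe[0], stripe[2], parity])
    print(recovered == stripe[1])
```

The `updated_parity` helper also explains the "medium write rate": every small write turns into a read-modify-write of both the data and the parity block.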

RAID 10:

• RAID 10 is striping plus mirroring, so you get good performance plus fully mirrored data, at the expense of 2X the disk space

Storage:

• RAID 5 is a reasonable choice most of the time.

• There are many commodity vendors of RAID arrays

• SCSI RAID arrays are expensive: the disks are expensive and have low capacity, but the arrays perform well

• ATA RAID arrays have an excellent price (1/3 to 1/2 that of SCSI) and capacity, with somewhat lower performance

• Apple ATA RAID: 7 TB, $11.5K

• Promise VTrak 15110: $4K plus 15 x 400 GB SATA disks at $300 each = 6 TB (raw) for $8,500

Comparison of Single RAID Levels:

RAID 0 (striping)

• Minimum disks: 2; maximum disks: controller dependent

• Array capacity: number of drives x drive capacity

• Storage efficiency: 100%

• Fault tolerance: none

• High availability: none

• Degradation during rebuild: not applicable

• Random read: very good; random write: very good

• Sequential read: very good; sequential write: very good

• Cost: lowest

• Use case: non-critical data; high-speed requirements; data backed up elsewhere

RAID 1 (mirroring)

• Minimum disks: 2; maximum disks: 2

• Array capacity: drive capacity

• Storage efficiency: 50%

• Fault tolerance: 1 drive failure

• High availability: good

• Degradation during rebuild: slight; rebuilds very fast

• Random read: very good; random write: good (slightly worse than a single drive)

• Sequential read: fair; sequential write: good

• Cost: high

• Use case: typically used as RAID 10 in OLTP / OLAP applications

RAID 5 (striping with parity)

• Minimum disks: 3; maximum disks: controller dependent

• Array capacity: (number of drives - 1) x drive capacity

• Storage efficiency: (number of drives - 1) / number of drives

• Fault tolerance: 1 drive failure

• High availability: good

• Degradation during rebuild: high; slow rebuild (due to the write penalty of parity)

• Random read: very good; random write: fair (parity overhead)

• Sequential read: good; sequential write: fair

• Cost: moderate

• Use case: non-write-intensive OLTP applications, file servers, etc.

• Misc: parity can considerably slow down the system

RAID 6 (striping with dual parity)

• Minimum disks: 4; maximum disks: controller dependent

• Array capacity: (number of drives - 2) x drive capacity

• Storage efficiency: (number of drives - 2) / number of drives

• Fault tolerance: 2 drive failures

• High availability: very good

• Degradation during rebuild: very high; very slow rebuild (due to the write penalty of dual parity)

• Random read: good; random write: poor (dual-parity overhead)

• Sequential read: good; sequential write: fair

• Cost: moderate+

• Use case: non-write-intensive OLTP applications, file servers, etc.

• Misc: not supported on all RAID cards
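The capacity, efficiency and fault-tolerance rows reduce to a few formulas. A minimal sketch, assuming identical drives and ignoring hot spares and controller metadata; `raid_summary` is a hypothetical helper, and the usage applies it to the Promise VTrak example from the notes (15 x 400 GB for $8,500).

```python
def raid_summary(level: int, drives: int, drive_gb: float) -> dict:
    """Usable capacity, storage efficiency and tolerated drive failures for a
    single-level array of identical drives (spares and metadata ignored)."""
    if level == 0:
        min_disks, data_drives, tolerated = 2, drives, 0
    elif level == 1:
        if drives != 2:
            raise ValueError("RAID 1 is treated here as a two-disk mirror")
        min_disks, data_drives, tolerated = 2, 1, 1
    elif level == 5:
        min_disks, data_drives, tolerated = 3, drives - 1, 1  # one drive's worth of parity
    elif level == 6:
        min_disks, data_drives, tolerated = 4, drives - 2, 2  # two drives' worth of parity
    else:
        raise ValueError(f"RAID {level} is not covered by this comparison")
    if drives < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    return {
        "usable_gb": data_drives * drive_gb,
        "efficiency": data_drives / drives,
        "tolerated_failures": tolerated,
    }


if __name__ == "__main__":
    # Promise VTrak example from the notes: 15 x 400 GB SATA disks, $8,500 total.
    for level in (0, 5, 6):
        summary = raid_summary(level, drives=15, drive_gb=400)
        price = 8_500 / summary["usable_gb"]
        print(f"RAID {level}: {summary['usable_gb']:,} GB usable, "
              f"{summary['efficiency']:.1%} efficient, ${price:.2f} per usable GB")
```

For that array, RAID 5 keeps 5,600 GB of the 6,000 GB raw, and RAID 6 keeps 5,200 GB in exchange for surviving a second drive failure.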

Comparison of Nested RAID Levels:

RAID 10 (mirroring, then striping)

• Minimum disks: 4 (an even number)

• Maximum disks: controller dependent

• Array capacity: (size of drive) x (number of drives) / 2

• Storage efficiency: 50%

• Fault tolerance: multiple drive failures, as long as both drives of the same RAID 1 set do not fail

• High availability: excellent

• Degradation during rebuild: minor

• Read performance: very good; write performance: very good

• Use case: OLTP / OLAP applications

RAID 50 (striping with parity, then striping without parity)

• Minimum disks: 6

• Maximum disks: controller dependent

• Array capacity: (size of drive) x (number of drives in each RAID 5 set - 1) x (number of RAID 5 sets)

• Storage efficiency: (number of drives in each RAID 5 set - 1) / (number of drives in each RAID 5 set)

• Fault tolerance: multiple drive failures, as long as no two drives from the same RAID 5 set fail

• High availability: excellent

• Degradation during rebuild: moderate; slow rebuild (due to the write penalty of parity)

• Read performance: very good; write performance: good

• Use case: medium-write-intensive OLTP / OLAP applications
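The fault-tolerance and capacity rows for nested levels can be checked mechanically. A minimal sketch, assuming a hypothetical layout in which consecutive drive numbers form each mirror pair (RAID 10) or each RAID 5 set (RAID 50); real controllers may group drives differently, and the function names are illustrative only.

```python
def raid10_survives(failed: set[int], drives: int) -> bool:
    """RAID 10 where drives (0, 1), (2, 3), ... form the mirror pairs: data is
    intact as long as no pair has lost both of its drives."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return not any(d in failed and d + 1 in failed for d in range(0, drives, 2))


def raid50_survives(failed: set[int], drives_per_set: int, sets: int) -> bool:
    """RAID 50 where consecutive drives form each RAID 5 set: data is intact
    as long as no set has lost more than one drive."""
    if drives_per_set < 3 or sets < 2:
        raise ValueError("RAID 50 needs at least 2 RAID 5 sets of 3+ drives")
    for s in range(sets):
        members = range(s * drives_per_set, (s + 1) * drives_per_set)
        if sum(1 for d in members if d in failed) > 1:
            return False
    return True


def raid10_capacity(drive_gb: float, drives: int) -> float:
    return drive_gb * drives / 2


def raid50_capacity(drive_gb: float, drives_per_set: int, sets: int) -> float:
    return drive_gb * (drives_per_set - 1) * sets


if __name__ == "__main__":
    # Eight 300 GB drives either way (hypothetical sizes).
    print(raid10_capacity(300, 8), raid10_survives({0, 2, 4, 6}, 8))  # one loss per pair: survives
    print(raid50_capacity(300, 4, 2), raid50_survives({0, 1}, 4, 2))  # two losses in one set: data lost
```

With the same eight drives, RAID 50 offers more usable space (1,800 GB vs. 1,200 GB), while RAID 10 can survive up to four failures if they fall in different mirror pairs.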

Nested RAID Misc Notes:

• RAID 10 is better than RAID 0+1 for the same cost: it tolerates more combinations of drive failures and rebuilds faster

• RAID 60 is similar to RAID 50, except that the striped sets use dual parity

• Ideally, RAID 10 and RAID 50 are the only nested RAID levels you will need

RAID Considerations:

• Select your stripe size by empirical testing

A smaller stripe size increases transfer performance and decreases positioning performance; a larger one does the opposite

The ideal stripe size depends on your application: the typical amount of data per read, sequential vs. random reads, etc.

• Try to select hard drives from separate production batches

• Maintain sufficient spares in a large array (typically 1 per 10-15 disks is sufficient)

• Use global spares across RAID groups if your controller supports it
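The stripe-size trade-off above comes down to how many disks a single request touches. A minimal sketch, assuming a RAID 0-style striped layout, one request at a time and no caching; the function name and the 64 KB / 4-disk figures are hypothetical.

```python
def disks_touched(offset_kb: int, length_kb: int, stripe_kb: int, disk_count: int) -> int:
    """How many disks one contiguous request touches on a striped layout with
    a stripe unit of `stripe_kb` (simplified: one request, no caching)."""
    if length_kb <= 0 or stripe_kb <= 0 or disk_count <= 0:
        raise ValueError("sizes and disk count must be positive")
    first_unit = offset_kb // stripe_kb
    last_unit = (offset_kb + length_kb - 1) // stripe_kb
    return min(last_unit - first_unit + 1, disk_count)


if __name__ == "__main__":
    # A 64 KB read on a hypothetical 4-disk stripe set.
    for stripe_kb in (16, 64, 128):
        print(f"{stripe_kb:>3} KB stripe -> {disks_touched(0, 64, stripe_kb, 4)} disk(s)")
```

With a 16 KB stripe the 64 KB read is split across all four disks (more transfer bandwidth, but four heads must position); with 64 KB or 128 KB a single disk serves it, leaving the others free for concurrent requests. Which is better depends on the workload, hence the advice to test empirically.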

RAID Considerations:

• Use hardware RAID unless performance is not a consideration

Nested RAID levels and parity-based RAID in particular consume more CPU cycles and take longer to rebuild when implemented in software

• Ensure the controller has a battery backup to retain its cache in case of power failure

• For internal RAID controller cards, use faster PCI buses (e.g. PCI-X)

Storage Technologies:

• Secondary or tertiary storage may connect to a computer over a computer network. This concept does not apply to primary storage, which is shared between multiple processors to a much lesser degree.

• Direct Attached Storage (DAS) is traditional mass storage that does not use any network. This is still the most popular approach. The term was coined relatively recently, along with NAS and SAN.

• Network Attached Storage (NAS) is mass storage attached to a computer, which other computers can access at the file level over a local area network, a private wide area network or, in the case of online file storage, the Internet.

• Storage Area Network (SAN) is a specialized network that provides other computers with storage capacity. The crucial difference between NAS and SAN is that NAS presents and manages file systems for client computers, whilst SAN provides access at the block-addressing (raw) level, leaving it to the attached systems to manage data or file systems within the provided capacity. SAN is commonly associated with Fibre Channel networks.

Passive Disk Enclosure based Direct Attached Storage (PDE based DAS):

Passive Disk Enclosure based DAS:

• DAS: Direct Attached Storage

• RAID controller inside the host machine

• The external chassis is simply a JBOD (Just a Bunch Of Disks), or what I'd like to call a Passive Disk Enclosure (PDE), e.g. Dell PowerVault MD1000

• A PDE can contain SAS, SATA or FC drives

• PDE-to-RAID-controller connectivity can be SAS, FC or SCSI (possibly different from the backplane)

• Multiple PDEs can be daisy-chained if they support it

• The array of disks can be divided into multiple RAID groups, which may be heterogeneous

• The size and type of a RAID group depend on the RAID card

• A PDE may have multiple paths to the system, with the possibility of multiplexing for increased speed

• Global spares can be defined on the RAID card

• Maximum raw storage size = maximum number of PDEs that can be daisy-chained x drives per PDE x drive size

• Performance Considerations

Drives

RAID configuration

PDE interconnect

PDE-to-RAID-card connection

RAID card configuration (cache, etc.)

PCI bus

Active Disk Enclosure based Direct Attached Storage (ADE based DAS)

Active Disk Enclosure based DAS:

• ADE difference: the RAID card is not in the host machine but in the enclosure

• The host machine has a SAS/FC host bus adapter (HBA), depending on the ADE-to-host connectivity supported

Some ADEs may support multiple connection protocols

• An ADE may support SAS/FC/SATA drives

• An ADE can support daisy-chaining PDEs

• Examples of ADEs: Dell MD3000, Infortrend EonStor devices, Nexsan SATABeast and SATABoy, etc.

Active Disk Enclosure based DAS:

• An ADE may support dual RAID controllers

• The RAID controllers can be used as active-active (in the case of multiple RAID groups), otherwise as active-passive

• RAID-controller-to-HBA connectivity can be multiplexed, if supported, for higher throughput

• ADEs are commonly but wrongly referred to as SANs ("SAN device" would still be all right)

Partitioning and Mounting:

Logical Volumes:

• A RAID group is a physical unit of storage

• At the operating system level, a logical group (a volume group, in LVM terms) can be created out of multiple RAID groups

• Each logical group can be further divided into logical volumes

• Each logical volume represents a mountable block device

• In Linux this is done using LVM (the Logical Volume Manager for the Linux kernel, which manages disk drives and similar mass-storage devices, particularly large ones)

• In LVM, logical volumes are resizable
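The pooling described above can be pictured as a small data structure. This is a toy model, not LVM itself: the class and method names are hypothetical, sizes are whole GB, and extents, striping and metadata are ignored. On a real Linux system the equivalent steps are done with LVM's own tools (`pvcreate`, `vgcreate`, `lvcreate`, `lvextend`), followed by growing the file system on the volume.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeGroup:
    """Toy model of a logical group (an LVM volume group): capacity pooled
    from several RAID groups and carved into resizable logical volumes."""
    name: str
    raid_group_sizes_gb: list[int]
    volumes_gb: dict[str, int] = field(default_factory=dict)

    @property
    def capacity_gb(self) -> int:
        return sum(self.raid_group_sizes_gb)

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.volumes_gb.values())

    def create_volume(self, name: str, size_gb: int) -> None:
        if name in self.volumes_gb:
            raise ValueError(f"logical volume {name!r} already exists")
        if size_gb > self.free_gb:
            raise ValueError("not enough free space in the group")
        self.volumes_gb[name] = size_gb

    def resize_volume(self, name: str, new_size_gb: int) -> None:
        # Growing needs free space in the group; shrinking always fits here,
        # but on a real system the file system must be shrunk first.
        if new_size_gb - self.volumes_gb[name] > self.free_gb:
            raise ValueError("not enough free space to grow the volume")
        self.volumes_gb[name] = new_size_gb


if __name__ == "__main__":
    vg = VolumeGroup("vg_data", raid_group_sizes_gb=[2000, 1200])  # two RAID groups
    vg.create_volume("lv_db", 500)
    vg.create_volume("lv_mail", 1000)
    vg.resize_volume("lv_db", 1500)
    print(vg.volumes_gb, "free:", vg.free_gb, "GB")
```

The point of the model is that logical volumes are sized against the pooled capacity of the group, not against any single RAID group.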


SAN (Storage Area Network):

• Multiple host machines are connected to an ADE through a SAN switch

• SAN refers to the interconnect + switch + ADE + PDE

• The switch and HBA can be SAS / FC, depending on the interconnect type supported by the ADE

• The ADE supports the creation of volumes

• These can be mounted onto clients and further subdivided

• Care must be taken to mount each logical volume onto a single client (unless you are running a clustered file system)

• This can be achieved by host masking, supported by the ADE and/or the switch

• Without careful host masking and mounting, data corruption can take place
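A masking table can be sanity-checked for the single-client rule above before it is applied. A minimal sketch, assuming a hypothetical representation (logical volume name mapped to the set of hosts allowed to see it); this is not any ADE or switch vendor's configuration format, and the volume and host names are made up.

```python
def unsafe_volumes(mapping: dict[str, set[str]], clustered: tuple[str, ...] = ()) -> list[str]:
    """Return logical volumes presented to more than one host that are not
    declared as carrying a clustered file system."""
    return sorted(
        volume
        for volume, hosts in mapping.items()
        if len(hosts) > 1 and volume not in clustered
    )


if __name__ == "__main__":
    # Hypothetical masking table: logical volume -> hosts allowed to see it.
    masking = {
        "lv_db": {"db01"},
        "lv_mail": {"mail01", "web01"},  # mistake: two hosts, ordinary file system
        "lv_gfs": {"node1", "node2"},    # shared on purpose, clustered file system
    }
    print(unsafe_volumes(masking, clustered=("lv_gfs",)))
```

Any volume the check reports would be mounted by two hosts that each believe they own the file system, which is exactly the corruption scenario the notes warn about.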


• Complex SAN configurations include multiple hosts and multiple ADEs connected to active-active switches with multiplexed connections

• Client hosts can run heterogeneous operating systems

• (Funnily, ADE-to-PDE paths are sometimes not multiplexed)

• While this looks complex, just think of it as removing the hard disks from the machine and hosting them outside in separate enclosures

• Each machine mounts an independent partition from the SAN

• Performance considerations

All variables we covered before

Switch configuration

Ensure that the switch / HBA / interconnect does not become the bottleneck, so the full HDD throughput can be utilized

Throughput Calculations:

• Hard disk performance: type, RPM, etc.

• Data distribution and type of data access

• RAID performance: number of drives, RAID type

• RAID card performance: cache, active-active configuration, etc.

• ADE-to-switch connection speed

• Switch-to-HBA connection speed

• HBA-to-PCI-bus speed
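Because data passes through every item in the list above in turn, the slowest one caps what the host actually sees. A minimal sketch, assuming the path can be treated as a serial chain with one throughput figure per stage; the stage names and MB/s numbers in the example are hypothetical, and in practice the disk figure itself depends on RAID type and access pattern.

```python
def bottleneck(stages_mb_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage in the I/O path; end-to-end throughput can be
    no higher than this stage's throughput."""
    if not stages_mb_s:
        raise ValueError("no stages given")
    name = min(stages_mb_s, key=stages_mb_s.get)
    return name, stages_mb_s[name]


if __name__ == "__main__":
    # Hypothetical sequential-read figures in MB/s for one host.
    path = {
        "disks (12 x 80 MB/s)": 12 * 80,
        "RAID card": 800,
        "ADE to switch": 400,
        "switch to HBA": 400,
        "HBA to PCI bus": 1000,
    }
    stage, limit = bottleneck(path)
    print(f"Bottleneck: {stage} at {limit} MB/s")
```

In this made-up configuration the disks could deliver 960 MB/s, but the links cap the host at 400 MB/s, so faster drives would be wasted until the interconnect is upgraded or multiplexed.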


Storage Technologies:

Compact disc, recordable (CD-R) or rewritable (CD-RW), and DVD

• Advantages: low cost per megabyte; unlimited capacity with multiple discs; portable; widely supported I/O interfaces; can be formatted for different data formats; long life; high data density; immune to corruption once data is written (CD-R and DVD only)

• Limitations: limited capacity on one disc (though much greater than a diskette); slow to moderate read/write speed

• Applications: data archiving; data distribution; data migration; localized file sharing; offsite storage

Diskettes, 1.44 MB

• Advantages: simple to use; portable; can be formatted for different data formats

• Limitations: limited capacity; limited read/write speed; not supported by many newer computers

• Applications: local data transfer of small files; storage of small files or programs

Hard drive, external

• Advantages: high read/write speed; can be moved among computers

• Limitations: limited capacity; awkward for data transfer among multiple computers

• Applications: local backup; local archiving

Hard drive, internal

• Advantages: convenient, usually comes with the computer; high read/write speed; convenient for use with a single computer (but can be shared among multiple computers with proper support); most common form of data storage

• Limitations: limited capacity; without special support, confined to a single computer or server

• Applications: storage in a single computer; swap files

Removable storage (Zip disks, Jaz disks, etc.)

• Advantages: simplicity; portability; unlimited capacity with multiple disks; convenient for use with a single computer

• Limitations: proprietary media; limited read/write speed; high cost per megabyte

• Applications: personal computing; local data transfer of small files; local backup; local archiving

The kernel is the part of an operating system that interacts directly with the hardware, which it does through your device drivers. Think of the kernel as a manager that mediates between the other parts of the operating system and the hardware: it executes the tasks to be done and handles errors and access to your computer.

Some of its other functions include managing your computer's memory directly, allocating resources, and "communicating" directly with the CPU and other devices such as printers, flash drives and many more.

It is also responsible for "booting" up your computer: after the BIOS has run, it passes control to the bootloader, which in turn hands control to the kernel, and the kernel then initializes the rest of the system.

The kernel is like the motherboard in your PC, which holds everything together and manages it.

Solid-state storage (USB devices, flash memory, smart cards, etc.)

• Advantages: no mechanical parts; high read/write speed; small form factor

• Limitations: limited storage capacity; high cost per I/O operation

• Applications: swap files; local data transfer; Internet service providers; video processing; relational databases; high-speed data acquisition

Direct-attached storage (DAS)

• Advantages: simplicity; low initial cost; ease of management

• Limitations: storage for each server must be administered separately; inconvenient for data transfer in network environments; the server bears the load of processing applications

• Applications: data and application sharing; data backup; data archiving

Disk library

• Advantages: high speed; high storage capacity; high data availability

• Limitations: not as quickly accessible as DAS; intended for "write once, read rarely" data

• Applications: disk-to-disk (D2D) backup; data archiving; nearline storage

Disk-to-disk-to-tape (D2D2T)

• Advantages: redundancy; high read/write speed; unlimited capacity with multiple tapes

• Limitations: complexity

• Applications: incremental backups; storage virtualization; offsite storage; data archiving

Fibre Channel (see Storage area network below)

• Advantages: used to transmit data between devices at gigabit speeds; frequently used in storage area networks (SANs); flexible in terms of distance

• Limitations: high cost; management complexity

• Applications: large databases; bandwidth-intensive applications; storage area networks (SANs); offsite storage; mission-critical applications

iSCSI (see Storage area network below)

• Advantages: used to transmit data between devices using the Internet Protocol (IP); frequently used in storage area networks (SANs); more flexible in terms of distance than Fibre Channel (but not as fast)

• Limitations: may not compare favorably with Fibre Channel for large database transfers; management complexity

• Applications: applications involving remotely distributed databases; storage area networks (SANs); offsite storage; mission-critical applications

Magnetic tape

• Advantages: low cost per megabyte; portability; unlimited capacity with multiple tapes

• Limitations: inconvenient for quick recovery of individual files or groups of files

• Applications: data archiving; limited-budget businesses; offsite storage

Network-attached storage (NAS)

• Advantages: fast file access for multiple clients; ease of data sharing; high storage capacity; redundancy; ease of drive mirroring; consolidation of resources

• Limitations: less convenient than a storage area network (SAN) for moving large blocks of data

• Applications: data backup; data archiving; redundant storage

Redundant array of independent disks (RAID)

• Advantages: high speed; high storage capacity; high data availability; high reliability; security; fault tolerance

• Limitations: users may develop a false sense of security; recovery from failure is difficult in some systems; high cost for optimum systems

• Applications: swap files; Internet service providers; redundant storage

Storage area network (SAN)

• Advantages: excellent for moving large blocks of data; exceptional reliability; wide availability; fault tolerance; scalability

• Limitations: high cost; lack of standardization; management complexity

• Applications: large databases; bandwidth-intensive applications; mission-critical applications