HARD DISKS AND OTHER STORAGE DEVICES Jehan-François Pâris Spring 2015



Magnetic disks (I)

Sole part of computer architecture with moving parts:

Data stored on circular tracks of a disk

Spinning speed between 5,400 and 15,000 rotations per minute

Accessed through a read/write head


Magnetic disks (II)

[Figure: main components of a disk drive: platter, R/W head, arm, servo]


Magnetic disks (III)

Data are stored on circular tracks

Tracks are partitioned into a variable number of fixed-size sectors: outside tracks have more sectors than inside tracks

If the disk drive has more than one platter, all tracks corresponding to the same position of the R/W head form a cylinder


Seagate ST4000DM000 (I)

Interface: SATA 6 Gb/s (about 600 MB/s usable after 8b/10b encoding)

Capacity: 4 TB

Cache: 64 MB, multisegmented

Average seek time: read < 8.5 ms, write < 9.5 ms

Average data rate: 146 MB/s (read/write)

Maximum sustained data rate: 180 MB/s


Seagate ST4000DM000 (II)

Number of platters: 4

Number of heads: 8

Bytes per sector: 4,096

Irrecoverable read errors per bit read: 1 in 10^14

Power consumption: operating 7.5 W, idle 5 W, standby and sleep 0.75 W
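As a rough illustration of what these figures mean together, the sketch below estimates how long a full read of the 4 TB drive takes at the 146 MB/s average data rate, and how many irrecoverable read errors such a full read should expect. It assumes decimal units (1 TB = 10^12 bytes) and that bit errors occur independently; neither assumption comes from the specification sheet.

```python
import math

CAPACITY_BYTES = 4 * 10**12      # 4 TB, assuming decimal units
AVG_RATE_BYTES_S = 146 * 10**6   # average data rate, 146 MB/s
URE_PER_BIT = 1e-14              # 1 irrecoverable read error per 10^14 bits read

hours_to_read_all = CAPACITY_BYTES / AVG_RATE_BYTES_S / 3600
bits = CAPACITY_BYTES * 8
expected_errors = bits * URE_PER_BIT
# P(at least one error) = 1 - (1 - p)^bits, computed stably with log1p/expm1;
# assumes independent bit errors (an assumption, not a datasheet figure)
p_any_error = -math.expm1(bits * math.log1p(-URE_PER_BIT))

print(f"full read: {hours_to_read_all:.1f} h")
print(f"expected errors: {expected_errors:.2f}, P(>=1): {p_any_error:.2f}")
```

Under these assumptions a full sequential read takes about 7.6 hours and has roughly a one-in-four chance of hitting at least one irrecoverable error.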


Sectors and blocks

Sectors are the smallest physical storage unit on a disk: fixed-size, traditionally 512 bytes, separated by intersector gaps

Blocks are the smallest transfer unit between the disk and the main memory


Magnetic disks (IV)

Disk spins at a speed varying between 5,400 rpm (laptops) and 15,000 rpm (Seagate Cheetah X15, …)

Accessing data requires:

Positioning the head on the right track: seek time

Waiting for the data to reach the head: on average half a rotation

Transferring the data
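The three components above simply add up. A minimal sketch of that sum, assuming a hypothetical drive that combines the ST4000DM000's 8.5 ms average read seek and 146 MB/s data rate with a 7,200 rpm spindle (the spindle speed and the 4 KB block size are assumptions for illustration):

```python
def access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Estimate one random block access as seek + rotational delay + transfer."""
    rotation_ms = 60_000 / rpm              # one full rotation, in ms
    rotational_delay_ms = rotation_ms / 2   # on average, half a rotation
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms

# Hypothetical drive: 8.5 ms seek, 7,200 rpm, 4 KB block, 146 MB/s
t = access_time_ms(seek_ms=8.5, rpm=7200, block_bytes=4096, transfer_mb_per_s=146)
print(f"{t:.2f} ms")
```

With these numbers the transfer itself takes under 0.03 ms, so seek time and rotational delay account for nearly all of the roughly 12.7 ms.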


Accessing disk contents

Each block on a disk has a unique address, normally a single number

Logical block addressing (LBA): standard since 1996

Older disks used a different scheme, cylinder-head-sector (CHS), which exposed the disk's internal organization

Old CHS triples can still be mapped onto LBA addresses
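The CHS-to-LBA mapping is the classic formula LBA = (C × heads per cylinder + H) × sectors per track + (S − 1), since CHS sector numbers start at 1. A small sketch of both directions; the geometry used in the example (16 heads, 63 sectors per track) is hypothetical:

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Classic CHS -> LBA mapping; CHS sectors are numbered from 1."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: returns (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1

# Hypothetical geometry: 16 heads, 63 sectors per track
print(chs_to_lba(2, 3, 4, heads_per_cylinder=16, sectors_per_track=63))
print(lba_to_chs(2208, heads_per_cylinder=16, sectors_per_track=63))
```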


Disk access times

Dominated by seek time and rotational delay

We try to reduce seek times by placing all data that are likely to be accessed together on nearby tracks or on the same cylinder

We cannot do as much for rotational delay


Seek times (I)

Depend on the distance between the two tracks

Minimal delay for:

Seeks between adjacent tracks: track to track (1-3 ms)

Switching between tracks within the same cylinder

Worst delay for end-to-end seeks


Seek times (II)

[Figure: seek time as a function of seek distance; if a track-to-track seek takes x, an end-to-end seek takes 3 to 5x]


Rotational latency

On average half a rotation, the same for reads and writes

One and a half rotations for write/verify (write the data, then wait one full rotation to read it back)


Average rotational delay

RPM       Average delay (ms)
5,400     5.6
7,200     4.2
10,000    3.0
15,000    2.0
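Each entry in the table is half of one rotation: 0.5 × 60,000 / RPM milliseconds. The sketch below regenerates the table and, for comparison, the one-and-a-half-rotation delay of a write/verify:

```python
def average_rotational_delay_ms(rpm):
    """Average rotational delay = half of one rotation, in milliseconds."""
    return 0.5 * 60_000 / rpm

def write_verify_delay_ms(rpm):
    """Write/verify: half a rotation on average, plus one full rotation to read back."""
    return 1.5 * 60_000 / rpm

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} rpm  {average_rotational_delay_ms(rpm):.1f} ms"
          f"  (write/verify {write_verify_delay_ms(rpm):.1f} ms)")
```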


Transfer rate (I)

Burst rate: observed while transferring a block; highest for blocks on outside tracks, since there are more of them on each track

Sustained transfer rate: observed while reading sequential blocks; lower


Transfer rate (II)

Actual transfer rate


Double buffering (I)

Speeds up handling of sequential files

[Figure: file blocks B0, B1, B2, B3, B4, B5, B6, … and two buffers: the DBMS processes block B1 in one buffer while block B2 is in transfer into the other]


Double buffering (II)

When both tasks are completed, the two buffers swap roles

[Figure: file blocks B0, B1, B2, B3, B4, B5, B6, …; the DBMS now processes block B2 while block B3 is in transfer into the other buffer]
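A minimal sketch of double buffering with Python threads: a reader thread fills one buffer while the main thread processes the other, and exactly two buffers circulate between an "empty" queue and a "full" queue. The callbacks `read_block` (standing in for the disk transfer) and `process_block` (standing in for the DBMS work) are hypothetical names for illustration.

```python
import queue
import threading

def double_buffered_scan(num_blocks, read_block, process_block):
    """Process blocks 0..num_blocks-1 while the next block is being read."""
    empty = queue.Queue()   # buffers ready to receive a transfer
    full = queue.Queue()    # buffers holding a block ready to be processed
    for _ in range(2):      # exactly two buffers
        empty.put({})

    def reader():
        for i in range(num_blocks):
            buf = empty.get()             # wait until a buffer is free
            buf["index"] = i
            buf["data"] = read_block(i)   # "in transfer"
            full.put(buf)
        full.put(None)                    # end-of-file marker

    thread = threading.Thread(target=reader)
    thread.start()
    results = []
    while (buf := full.get()) is not None:
        results.append(process_block(buf["index"], buf["data"]))  # "processed by DBMS"
        empty.put(buf)                    # recycle the buffer for the next transfer
    thread.join()
    return results

# Usage with an in-memory stand-in for the file
file_blocks = [b"B0", b"B1", b"B2", b"B3", b"B4", b"B5", b"B6"]
out = double_buffered_scan(len(file_blocks),
                           lambda i: file_blocks[i],
                           lambda i, data: data.decode().lower())
print(out)
```

Because the reader can never hold more than two buffers, it stays at most one block ahead of the processing, which is exactly the overlap shown in the figures.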


The five minute rule

Jim Gray: keep in memory any data item that will be used during the next five minutes
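In Gray's analysis, the five-minute figure is a break-even interval between the cost of keeping a page in RAM and the cost of the disk access capacity needed to re-read it: interval = (pages per MB of RAM / accesses per second per disk) × (price per disk / price per MB of RAM). The sketch below evaluates that expression; all inputs are hypothetical placeholders, not figures from the slides.

```python
def break_even_interval_s(pages_per_mb_ram, accesses_per_s_per_disk,
                          price_per_disk, price_per_mb_ram):
    """Seconds between references at which caching a page in RAM costs
    the same as the disk-arm capacity needed to re-read it."""
    return (pages_per_mb_ram / accesses_per_s_per_disk) * (price_per_disk / price_per_mb_ram)

# Hypothetical inputs: 8 KB pages (128 per MB), 64 random accesses/s per disk,
# $2,000 per disk, $15 per MB of RAM
seconds = break_even_interval_s(128, 64, 2000, 15)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

Pages referenced more often than once per break-even interval are cheaper to keep in memory; the rule of thumb rounds this interval to five minutes.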


The internal disk controller

Printed circuit board attached to the disk drive, as powerful as the CPU of a personal computer of the early 80's

Functions include:

Speed buffering

Disk scheduling

…
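The slides do not say which scheduling policy a controller uses. As one illustration, here is a sketch of an elevator (LOOK-style) policy, which serves pending requests in the current direction of head movement before reversing, compared with first-come, first-served. The track numbers are hypothetical.

```python
def elevator_order(head, requests, moving_up=True):
    """LOOK-style elevator: sweep in the current direction, then reverse."""
    if moving_up:
        ahead = sorted(t for t in requests if t >= head)
        behind = sorted((t for t in requests if t < head), reverse=True)
    else:
        ahead = sorted((t for t in requests if t <= head), reverse=True)
        behind = sorted(t for t in requests if t > head)
    return ahead + behind

def total_seek_distance(head, order):
    """Number of tracks crossed when serving requests in the given order."""
    distance = 0
    for track in order:
        distance += abs(track - head)
        head = track
    return distance

requests = [82, 17, 43, 140, 24, 16, 190]   # hypothetical pending requests
order = elevator_order(50, requests)
print(order, total_seek_distance(50, order))           # elevator
print(requests, total_seek_distance(50, requests))     # first-come, first-served
```

Here the elevator order crosses 314 tracks versus 518 for first-come, first-served.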


Reliability Issues


Disk failure rates

Failure rates follow a bathtub curve:

High infant mortality

Low failure rate during useful life

Higher failure rates as disks wear out


Disk failure rates (II)

[Figure: bathtub curve of failure rate versus time, showing the infant mortality, useful life, and wearout phases]


Disk failure rates (III)

Infant mortality effect can last for months for disk drives

Cheap SATA disk drives seem to age less gracefully than SCSI drives


The Backblaze study

Reported on the failure rates of more than 25,000 disks at Backblaze

Their disks tend to fail at a rate of:

5.1 percent per year during their first eighteen months

1.4 percent per year during the next eighteen months

11.8 percent per year after that
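A back-of-envelope sketch of what these three rates imply for a fleet: the fraction of drives still working after 48 months. It assumes each annual rate applies uniformly within its phase (a constant hazard compounded over the phase) and cuts the last phase at month 48 to match the chart; this is an illustration, not a figure reported by Backblaze.

```python
# (annual failure rate, duration in years) for each phase; last phase cut at month 48
phases = [(0.051, 1.5), (0.014, 1.5), (0.118, 1.0)]

def survival_fraction(phases):
    """Fraction of drives still working after all phases, treating each
    annual rate as a constant hazard compounded over the phase."""
    surviving = 1.0
    for annual_rate, years in phases:
        surviving *= (1 - annual_rate) ** years
    return surviving

print(f"{survival_fraction(phases):.1%} of drives survive 48 months")
```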


[Figure: Backblaze yearly failure rate (percent) versus time (0 to 48 months): early failure stage, 5.1% failure rate; random failure stage, 1.4% failure rate; wearout failure stage, 11.8% failure rate]


MTTF

Disk manufacturers advertise very high Mean Times To Fail (MTTF) for their products: 500,000 to 1,000,000 hours, that is, 57 to 114 years

This does not mean that a disk will last that long!

It means that disks will fail at an average rate of one failure per 500,000 to 1,000,000 hours during their useful life
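Converting the advertised MTTF into years, and into an expected fraction of disks failing per year, shows why a 57-year MTTF does not mean a 57-year lifetime: it describes a failure rate during useful life. The annual rate below uses the approximation AFR ≈ 8,760 / MTTF (hours), valid while the rate is small.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years

def mttf_years(mttf_hours):
    return mttf_hours / HOURS_PER_YEAR

def annual_failure_rate(mttf_hours):
    """Expected failures per disk-year during useful life (small-rate approximation)."""
    return HOURS_PER_YEAR / mttf_hours

for hours in (500_000, 1_000_000):
    print(f"MTTF {hours:,} h = {mttf_years(hours):.0f} years, "
          f"about {annual_failure_rate(hours):.2%} of disks failing per year")
```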


More MTTF Issues (I)

Manufacturers' claims are not supported by solid experimental evidence

They are obtained by submitting disks to a stress test at high temperature and extrapolating the results to ideal conditions

This procedure raises many issues


More MTTF Issues (II)

Failure rates observed in the field are much higher: they can go up to 8 to 9 percent per year

The corresponding MTTFs are 11 to 12.5 years

If we have 100 disks and an MTTF of 12.5 years, we can expect an average of 8 disk failures per year
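The arithmetic behind these numbers, as a short sketch: an annual failure rate r corresponds to an MTTF of 1/r years, and a population of disks fails at the sum of the individual rates.

```python
def mttf_years_from_annual_rate(annual_rate):
    return 1 / annual_rate

def expected_failures_per_year(num_disks, mttf_years):
    """Each disk contributes 1/MTTF failures per year during useful life."""
    return num_disks / mttf_years

print(f"{mttf_years_from_annual_rate(0.09):.1f} to "
      f"{mttf_years_from_annual_rate(0.08):.1f} years")
print(expected_failures_per_year(100, 12.5))
```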


Flash Drives


What about flash?

Widely used in flash drives, most MP3 players and some small portable computers

Several important limitations:

Limited write bandwidth: we must erase a whole block of data before overwriting any part of it

Limited endurance: 10,000 to 100,000 write cycles
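To see what a 10,000-cycle endurance means in practice, the sketch below estimates device lifetime from capacity, erase cycles, and daily write volume. It assumes perfect wear leveling; the `write_amplification` parameter (extra internal writes caused by erase-before-write) is an added assumption not discussed in the slides, and all device numbers are hypothetical.

```python
def flash_lifetime_years(capacity_gb, erase_cycles, gb_written_per_day,
                         write_amplification=1.0):
    """Years until every block reaches its erase-cycle limit,
    assuming perfect wear leveling."""
    total_writable_gb = capacity_gb * erase_cycles / write_amplification
    return total_writable_gb / gb_written_per_day / 365

# Hypothetical 64 GB device, 10,000 cycles, 20 GB written per day
print(f"{flash_lifetime_years(64, 10_000, 20):.1f} years")
print(f"{flash_lifetime_years(64, 10_000, 20, write_amplification=4):.1f} years")
```

The estimate is only as good as the wear leveling: if writes concentrate on a few blocks, those blocks reach their limit long before the average does.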


Flash drives

Widely used in flash drives, most MP3 players and some small portable computers

Similar technology to EEPROM

Three technologies:

NOR flash

NAND flash

Vertical NAND


NOR Technology

Each cell has one end connected directly to ground and the other end connected directly to a bit line

Longest erase and write times

Allows random access to any memory location

Good choice for storing BIOS code: replaces older ROM chips


NAND Technology

Shorter erase and write times

Requires less chip area per cell

Up to ten times the endurance of NOR flash

Disk-like interface: data must be read on a page-wise basis

Block erasure: older data must be erased one block at a time, a block typically containing 32, 64 or 128 pages


Vertical NAND Technology

Fastest


The flash drive controller

Performs:

Error correction: higher flash densities result in many errors

Load leveling (also called wear leveling): distributes writes among blocks to prevent failures resulting from uneven numbers of erase cycles

Flash drives work best with sequential workloads
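A minimal illustration of why load leveling matters, under the simplifying assumption that every write costs one erase cycle of the block it lands on. The leveled version sends each write to the least-erased block (a simple dynamic policy; real controllers use more elaborate ones); the naive version keeps rewriting the same block in place.

```python
def wear_leveled_writes(num_blocks, num_writes):
    """Dynamic wear leveling sketch: each write (one erase cycle) goes to
    the block with the fewest erases so far."""
    erase_counts = [0] * num_blocks
    for _ in range(num_writes):
        target = min(range(num_blocks), key=erase_counts.__getitem__)
        erase_counts[target] += 1
    return erase_counts

def naive_writes(num_blocks, num_writes, hot_block=0):
    """Without leveling: every rewrite of the same data hits the same block."""
    erase_counts = [0] * num_blocks
    erase_counts[hot_block] = num_writes
    return erase_counts

print(max(wear_leveled_writes(8, 1000)))  # writes spread evenly: 125 per block
print(max(naive_writes(8, 1000)))         # one block absorbs all 1000 erases
```

With eight blocks, leveling reduces the worst-case erase count by a factor of eight, so the device reaches its endurance limit eight times later.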


Performance data

Performance varies widely between models

One random pair of specs: read speed 22 MB/s, write speed 15 MB/s


RAID level 0

No replication

Advantages:

Simple to implement

No overhead

Disadvantage: if the array has n disks, its failure rate is n times the failure rate of a single disk
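A sketch of RAID 0 addressing and of its failure rate. The slides do not fix a stripe-unit size, so this assumes one-block stripe units laid out round-robin across the disks; the 2% annual failure rate per disk is hypothetical.

```python
def raid0_location(logical_block, num_disks):
    """Map a logical block to (disk, block on that disk), assuming
    one-block stripe units laid out round-robin."""
    return logical_block % num_disks, logical_block // num_disks

def raid0_annual_failure_rate(single_disk_rate, num_disks):
    """Any disk failure loses the array, so rates add up (small-rate approximation)."""
    return single_disk_rate * num_disks

print([raid0_location(b, 4) for b in range(6)])
print(raid0_annual_failure_rate(0.02, 4))
```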


RAID levels 0 and 1

[Figure: RAID level 0 compared with RAID level 1 (mirrors)]


RAID level 1

Mirroring: two copies of each disk block

Advantages:

Simple to implement

Fault-tolerant

Disadvantage: requires twice the disk capacity of normal file systems


RAID level 4 (I)

Requires N+1 disk drives: N drives contain data

Individual blocks, not chunks

Blocks with the same disk address form a stripe


RAID level 4 (II)

Parity drive contains the exclusive or (XOR) of the N blocks in the stripe:

$p[k] = b[k] \oplus b[k+1] \oplus \cdots \oplus b[k+N-1]$

Parity block now reflects the contents of several blocks!

Can now do parallel reads/writes
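Because XOR is its own inverse, the parity block lets the array rebuild any single lost block: XOR-ing the surviving blocks of the stripe with the parity block yields the missing one. A minimal sketch with byte strings standing in for disk blocks (the block contents are hypothetical):

```python
def xor_blocks(blocks):
    """Byte-wise exclusive or of equal-sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def parity(stripe):
    """p[k] = b[k] XOR b[k+1] XOR ... XOR b[k+N-1]"""
    return xor_blocks(stripe)

def rebuild(surviving_blocks, parity_block):
    """XOR of the surviving blocks and the parity block yields the lost block."""
    return xor_blocks(surviving_blocks + [parity_block])

stripe = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04"]   # hypothetical N = 3 data blocks
p = parity(stripe)
print(p.hex())
print(rebuild([stripe[0], stripe[2]], p) == stripe[1])  # lost block 1 recovered
```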


RAID levels 4 and 5

[Figure: RAID level 4 compared with RAID level 5; the dedicated parity disk of RAID level 4 is a bottleneck]


RAID level 5

The single parity drive of RAID level 4 is involved in every write, which limits parallelism

RAID 5 distributes the parity blocks among the N+1 drives: much better
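A sketch of where parity lives in each organization. RAID 4 always uses the same disk, so every write queues on it; in RAID 5 consecutive stripes put their parity on different disks, so parity updates for writes to different stripes can proceed in parallel. The slides do not specify a RAID 5 layout; this assumes a simple rotation in which the parity disk index decreases by one per stripe (real arrays use several variants).

```python
def raid4_parity_disk(stripe, num_disks):
    """RAID 4: the last disk always holds parity."""
    return num_disks - 1

def raid5_parity_disk(stripe, num_disks):
    """RAID 5 sketch: parity rotates by one disk per stripe (one simple layout)."""
    return (num_disks - 1 - stripe) % num_disks

N_PLUS_1 = 5  # hypothetical array: N = 4 data blocks per stripe, plus parity
for s in range(5):
    print(f"stripe {s}: RAID 4 parity on disk {raid4_parity_disk(s, N_PLUS_1)}, "
          f"RAID 5 parity on disk {raid5_parity_disk(s, N_PLUS_1)}")
```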


The small write problem

Arises in RAID 5 (as in RAID 4) when we want to update a single block

The block belongs to a stripe: how can we compute the new value of the parity block?

[Figure: a stripe made of blocks b[k], b[k+1], b[k+2], … and its parity block p[k]]


First solution

Read the values of the N-1 other blocks in the stripe

Recompute $p[k] = b[k] \oplus b[k+1] \oplus \cdots \oplus b[k+N-1]$

Solution requires:

N-1 reads

2 writes (new block and new parity block)


Second solution

Assume we want to update block b[m]

Read the old values of b[m] and of the parity block p[k]

Compute $\text{new } p[k] = \text{new } b[m] \oplus \text{old } b[m] \oplus \text{old } p[k]$

Solution requires:

2 reads (old values of the block and of the parity block)

2 writes (new block and new parity block)
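A sketch checking that the two solutions agree: updating the parity from the old block and the old parity gives the same result as recomputing it from the whole stripe, while needing only 2 reads instead of N-1. Block contents are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise exclusive or of equal-sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def full_stripe_parity(stripe):
    """First solution: XOR all N blocks (requires reading the N-1 other blocks)."""
    return xor_blocks(stripe)

def small_write_parity(old_parity, old_block, new_block):
    """Second solution: new p[k] = new b[m] XOR old b[m] XOR old p[k]."""
    return xor_blocks([new_block, old_block, old_parity])

def reads_needed(n, method):
    return n - 1 if method == "recompute" else 2

stripe = [b"\x0f\x01", b"\xf0\x02", b"\x33\x04", b"\x55\x08"]  # hypothetical N = 4
old_parity = full_stripe_parity(stripe)
m, new_block = 1, b"\xaa\xbb"
updated = stripe[:m] + [new_block] + stripe[m + 1:]
same = small_write_parity(old_parity, stripe[m], new_block) == full_stripe_parity(updated)
print(same, reads_needed(4, "recompute"), reads_needed(4, "read-modify-write"))
```

The second solution needs fewer reads whenever N > 3, and its cost does not grow with the stripe width.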


Other RAID organizations (I)

RAID 6:

Two check disks

Tolerates two disk failures

More complex updates


Other RAID organizations (II)

RAID 10:

Also known as RAID 1 + 0

Data are striped (as in RAID 0 or RAID 5) over pairs of mirrored disks (RAID 1)

[Figure: RAID 0 striping over four RAID 1 mirrored pairs]


Other RAID organizations (III)

Two-dimensional RAIDs

Designed for archival storage: data are written once and read maybe (WORM)

Update rate is less important than:

High reliability

Low storage costs


Complete 2D RAID arrays

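Have n parity disks and n(n - 1)/2 data disks

[Figure: complete 2D RAID array with n = 4 parity disks P1 to P4 and six data disks D12, D13, D14, D23, D24, D34; data disk Dij belongs to the parity groups of Pi and Pj]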
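A sketch that generates the layout of a complete 2D RAID array, assuming (as the disk labels suggest) that each data disk Dij belongs to exactly two parity groups, those of Pi and Pj. With n parity disks this gives n(n - 1)/2 data disks, each parity disk protects n - 1 data disks, and parity takes n / (n + n(n - 1)/2) = 2/(n + 1) of the array. The two-digit naming scheme is only unambiguous for n ≤ 9.

```python
from itertools import combinations

def complete_2d_raid(n):
    """Data disk Dij for each pair i < j of the n parity disks; parity disk
    Pp covers every data disk whose pair includes p."""
    pairs = list(combinations(range(1, n + 1), 2))
    data_disks = [f"D{i}{j}" for i, j in pairs]
    groups = {f"P{p}": [f"D{i}{j}" for i, j in pairs if p in (i, j)]
              for p in range(1, n + 1)}
    return data_disks, groups

data, groups = complete_2d_raid(4)
print(data)            # the six data disks of the figure
print(groups["P1"])    # data disks protected by P1
parity_fraction = 4 / (4 + len(data))
print(parity_fraction)
```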


Main advantage

Work in progress