
Friday, March 23 CS 470 Operating Systems - Lecture 29 1

Lecture 29

Reminder: Homework 7 is due on Monday at class time for Exam 2 review; no late work accepted.

Reminder: Exam 2 is on Wednesday. Exam 2 review sheet is posted.

Questions?

Friday, March 23 CS 470 Operating Systems - Lecture 29 2

Outline

Disk systems
Disk scheduling
Disk management
RAID

Friday, March 23 CS 470 Operating Systems - Lecture 29 3

Disk Drives

A disk is viewed logically as a linear array of blocks. How is it mapped onto a circular disk drive?

A disk drive is one or more platters rotating on a spindle. Each side of a platter has a head that reads the data off that side of the platter. Each platter side has concentric rings called tracks. The vertical extent of the same track position on each platter is a cylinder. Each track/cylinder is divided into sectors.

Friday, March 23 CS 470 Operating Systems - Lecture 29 4

Disk Drives

[Figure: disk drive geometry - platters, spindle, heads, tracks, cylinders, sectors; image not extracted]

Friday, March 23 CS 470 Operating Systems - Lecture 29 5

Disk Drives

Generally, block numbers are mapped with Block 0 at cylinder/track 0 (outermost track), head 0, sector 0. The next block is sector 1 until the track is full; then the next block is head 1, sector 0, etc., until the cylinder is full; then the next block is cylinder/track 1, head 0, sector 0, and so forth.

Conceptually, it is possible for OS's to map logical block numbers to <cyl, head, sector> addresses, but this no longer happens; the mapping is handled by the disk controller.
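For illustration, a minimal sketch of this conceptual mapping, assuming a fixed, hypothetical geometry (real drives vary sectors per track by zone, so only the controller knows the true layout):

    #include <stdio.h>

    /* Hypothetical fixed geometry; real drives differ and hide it. */
    #define HEADS             16   /* platter sides */
    #define SECTORS_PER_TRACK 63

    /* Map a logical block number to a <cylinder, head, sector> address
       using the ordering described above: sectors first, then heads,
       then cylinders. */
    void block_to_chs(long block, long *cyl, int *head, int *sector)
    {
        *sector = block % SECTORS_PER_TRACK;
        *head   = (block / SECTORS_PER_TRACK) % HEADS;
        *cyl    = block / (SECTORS_PER_TRACK * HEADS);
    }

    int main(void)
    {
        long cyl; int head, sector;
        block_to_chs(5000, &cyl, &head, &sector);
        printf("block 5000 -> cyl %ld, head %d, sector %d\n",
               cyl, head, sector);   /* cyl 4, head 15, sector 23 */
        return 0;
    }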

Friday, March 23 CS 470 Operating Systems - Lecture 29 6

Disk Drives

One reason the mapping is done in the disk controller is that disks have been getting larger. Density has increased in three dimensions:
# sectors/track (higher rotation speed)
# tracks/platter (shorter seek separation)
# bits/space (vertical writes within a track)

Components of disk performance are
seek time: disk arm movement to the correct cylinder
rotational delay (latency): wait for the correct sector to rotate under the head

Friday, March 23 CS 470 Operating Systems - Lecture 29 7

Disk Drives

Taken together, data access time is determined by
bandwidth (bytes transferred per unit time): buffer to disk, buffer to host
buffer size
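To see how these components combine, a back-of-the-envelope calculation with assumed numbers (a nominal 9 ms seek, 7200 RPM, and 100 MB/s transfer rate; these figures are illustrative, not from the slides):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed example numbers, roughly a 7200 RPM desktop drive. */
        double seek_ms   = 9.0;     /* average seek time */
        double rpm       = 7200.0;
        double xfer_MBps = 100.0;   /* sustained transfer rate */
        double kb        = 4.0;     /* one 4KB block */

        /* Average rotational delay is half a revolution. */
        double rot_ms  = 0.5 * (60000.0 / rpm);               /* ~4.17 ms */
        double xfer_ms = kb / (xfer_MBps * 1000.0) * 1000.0;  /* ~0.04 ms */

        printf("access time = %.2f + %.2f + %.2f = %.2f ms\n",
               seek_ms, rot_ms, xfer_ms, seek_ms + rot_ms + xfer_ms);
        return 0;
    }

Note that seek and rotational delay dominate; the transfer itself is nearly free, which is why scheduling to reduce arm movement pays off.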

Disk drives come in various speeds and sizes optimized for various applications

Friday, March 23 CS 470 Operating Systems - Lecture 29 8

Disk Drives

Disk drive | Application | Sizes | RPM | Cache | Buffer to host | Notes, street price
WD Caviar Blue | Standard, internal desktop | 80GB-1TB | 7200 | 32MB | SATA 6Gb/s | 1TB ~$105
WD Caviar Black | Maximum speed, internal desktop | 500GB-2TB | 7200 | 64MB | SATA 6Gb/s | 2TB ~$210
WD Caviar Green | Maximum capacity, low power, internal desktop | 320GB-3TB | variable | 64MB | SATA 3Gb/s | 3TB ~$200
WD VelociRaptor | Internal, enterprise server | 150-600GB | 10000 | 32MB | SATA 6Gb/s | 600GB ~$270
WD Scorpio Blue | Standard, internal laptop | 80GB-1TB | 5200 | 8MB | SATA 3Gb/s | 1TB ~$135
WD Scorpio Black | Maximum power, internal laptop | 160GB-750GB | 7200 | 16MB | SATA 3Gb/s | 750GB ~$165

Using Western Digital as a prototypical line

Friday, March 23 CS 470 Operating Systems - Lecture 29 9

Disk Drives

Disk drive | Application | Sizes | RPM | Cache | Buffer to host | Notes, street price
WD AV-25 | 24/7 surveillance | 160-500GB | 5400 | 32MB | SATA 3Gb/s | MTBF 1 million hours, 500GB ~$90
WD My Book Essential | External desktop | 1-3TB | - | - | USB 3.0 5Gb/s | 3TB ~$170
WD My Passport Essential | External portable | 500GB-2TB | - | - | USB 3.0 5Gb/s | 1TB ~$130
WD My Book Live Duo | Networked personal cloud storage | 4-6TB | - | - | Ethernet | RAID 1/0 (2 drives in box), 6TB ~$480

Toshiba makes a 240GB, 4200 RPM, 8MB cache disk drive. Why would anyone want to buy this small, slow drive?

Friday, March 23 CS 470 Operating Systems - Lecture 29 10

Disk Drives

What is the limit on the capacity of a disk drive using conventional magnetic media?
Typical drives are ~250Gb/sq.in. The Toshiba drive is ~344Gb/sq.in.
Current limit is ~500Gb/sq.in.
Theoretical limit is ~1Tb/sq.in.; with any smaller grains, heat will change the magnetization of the bits.
Seagate is researching ways of packing more bits, theoretically up to 50Tb/sq.in.

Friday, March 23 CS 470 Operating Systems - Lecture 29 11

Disk Scheduling

As with all resources, best performance is extracted by scheduling disk accesses. This is now mostly done in the disk controller because:
The original IDE interface allows a maximum of 16383 cylinders x 16 heads x 63 sectors = 8.4GB to be reported (see the check below). All disks report this now, and the EIDE interface was added to find the actual geometry using LBA (logical block addressing).
Most disks map out defective sectors to spare ones.
# sectors/track is not constant; there are about 40% more sectors on outer tracks than on inner tracks.
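A quick check of the 8.4GB figure, assuming the standard 512-byte sector:

    #include <stdio.h>

    int main(void)
    {
        /* Maximum geometry reportable through the original IDE interface. */
        long long cylinders = 16383, heads = 16, sectors = 63, bytes = 512;
        long long total = cylinders * heads * sectors * bytes;
        /* 8455200768 bytes, the limit conventionally quoted as 8.4GB. */
        printf("%lld bytes = %.2f GB\n", total, total / 1e9);
        return 0;
    }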

Friday, March 23 CS 470 Operating Systems - Lecture 29 12

Disk Scheduling

OS generally just makes requests to the controller. The controller has a queue and a scheduling algorithm to choose which request is serviced next.

The algorithms are straightforward and have similar properties to other scheduling algorithms that we have studied.
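As one illustrative example (not necessarily what any given controller implements), the classic SCAN or "elevator" algorithm services all queued requests in the current direction of arm travel, then reverses:

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* SCAN: sweep upward from the head position servicing requests
       in order, then reverse and sweep downward. */
    void scan(int head, int req[], int n)
    {
        qsort(req, n, sizeof(int), cmp);
        int i = 0;
        while (i < n && req[i] < head) i++;   /* first request at/above head */
        for (int j = i; j < n; j++)           /* upward sweep */
            printf("service cylinder %d\n", req[j]);
        for (int j = i - 1; j >= 0; j--)      /* downward sweep */
            printf("service cylinder %d\n", req[j]);
    }

    int main(void)
    {
        /* Hypothetical queue; head starts at cylinder 53. */
        int req[] = {98, 183, 37, 122, 14, 124, 65, 67};
        scan(53, req, 8);
        return 0;
    }

With the head at 53, SCAN services 65, 67, 98, 122, 124, 183, then 37 and 14, avoiding the long back-and-forth seeks that first-come-first-served would incur.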

OS's are now more concerned with disk management. I.e., how to make a disk usable to users.

Friday, March 23 CS 470 Operating Systems - Lecture 29 13

Formatting

Low-level, physical formatting is done at the factory, but OS can do this, too.

File system formatting:
Create a partition table that groups cylinders into a virtual disk. Tools include fdisk, sfdisk, PartitionMagic.
Create a file system. In Unix, mkfs allocates inodes (index blocks).
Create swap space.

Friday, March 23 CS 470 Operating Systems - Lecture 29 14

Boot Block

How does a computer find the OS to boot? Cannot require that it be in a particular location on a particular disk, since there may be more than one to choose from.

Bootstrap loader is a program that loads OS's. It could be stored in ROM, but then it would be hard to change. Usually a very small loader stored in ROM knows where the full loader program is in the boot block (aka MBR - master boot record). Example loaders include grub, lilo, the Windows loader, ...

Friday, March 23 CS 470 Operating Systems - Lecture 29 15

Boot Block

Boot loaders know how to initialize the CPU and bring up the file system. They are configured to know where the OS program code resides. E.g., grub knows the kernel images are in the file system, usually in /boot.

Boot loader loads the kernel into memory, then jumps to the first instruction of the OS. Then the OS takes over.

Friday, March 23 CS 470 Operating Systems - Lecture 29 16

Bad Blocks

All disks have bad areas. The factory initially maps out the blocks that would have been allocated to these areas. (Too many of them causes the disk to be rejected.)

Some disk controllers are "smart" (e.g., SCSI) and automatically remap bad blocks when encountered. Spare sectors are reserved on each cylinder for this.

Other controllers rely on the OS to handle bad blocks. E.g., Windows marks FAT entries after a chkdsk scan.
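Conceptually, automatic remapping amounts to consulting a small table before each access. A minimal sketch with hypothetical entries (the table contents and sizes are illustrative only):

    #include <stdio.h>

    /* Hypothetical remap table kept by a "smart" controller. */
    struct remap { long bad, spare; };

    static struct remap table[] = { {1043, 900001}, {52667, 900002} };
    #define NREMAP (sizeof table / sizeof table[0])

    /* Redirect accesses to bad blocks to their spare sectors. */
    long resolve(long block)
    {
        for (size_t i = 0; i < NREMAP; i++)
            if (table[i].bad == block)
                return table[i].spare;   /* remapped to a spare */
        return block;                    /* block is healthy */
    }

    int main(void)
    {
        printf("block 1043 -> %ld\n", resolve(1043));
        printf("block 7    -> %ld\n", resolve(7));
        return 0;
    }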

Friday, March 23 CS 470 Operating Systems - Lecture 29 17

Swap Space

Usage of swap space depends on memory management algorithm and OS.

Some store entire program and data in swap space for duration of execution. Others only store the pages being used.

Friday, March 23 CS 470 Operating Systems - Lecture 29 18

Swap Space

Swap space issues include:
file vs. disk partition - usually a raw partition with a dedicated manager for speed
single vs. multiple spaces
location - if single, usually in the center of the disk; multiple spaces only if there are multiple disks
size - running out means aborting processes, but more real memory means less need to swap

Friday, March 23 CS 470 Operating Systems - Lecture 29 19

RAID

Disks have gotten physically smaller and much cheaper. Want to combine multiple disks into one system to increase read/write performance and to improve reliability.

Initially, RAID was Redundant Arrays of Inexpensive Disks focusing on providing large amounts of storage cheaply. Now focus is on reliability, so now RAID is Redundant Arrays of Independent Disks.

Friday, March 23 CS 470 Operating Systems - Lecture 29 20

RAID

Reliability is characterized by mean time to failure (MTTF). E.g., 100,000 hours for a disk.

For an array of 100 disks, the mean time until some disk fails is 100,000/100 = 1,000 hours = ~41.7 days(!). If only one copy of each piece of data is stored, each failure is costly.

To solve this problem, introduce redundancy. I.e., store extra information that can be used to rebuild lost information.

Friday, March 23 CS 470 Operating Systems - Lecture 29 21

RAID

Simplest redundancy is to mirror a disk. I.e., create a duplicate. Every write goes to both disks and a read can go to either one. The only way to lose data is if the second disk fails during the time to repair the first disk.

MTTF for the system depends on the MTTF of the disks and the mean time to repair (MTTR).

Friday, March 23 CS 470 Operating Systems - Lecture 29 22

RAID

If disk failures are independent and MTTR is 10 hours, the mean time to data loss is

100,000^2 / (2 x 10) hours = 500 x 10^6 hours = ~57,000 years(!)
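The same arithmetic in code form, using the figures above (100,000-hour MTTF, 10-hour MTTR, 100-disk array):

    #include <stdio.h>

    int main(void)
    {
        double mttf = 100000.0;   /* hours, single disk */
        double mttr = 10.0;       /* hours, mean time to repair */
        int    n    = 100;        /* disks in the array */

        /* Mean time until *some* disk in the array fails. */
        printf("array: first failure every %.0f hours\n", mttf / n);

        /* Mirrored pair: data is lost only if the second copy fails
           during the repair window of the first. */
        double loss = (mttf * mttf) / (2.0 * mttr);
        printf("mirror: data loss every %.0f hours (~%.0f years)\n",
               loss, loss / (24.0 * 365.0));
        return 0;
    }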

Of course, many failures are not independent. E.g., power failures, natural disasters, manufacturing defects, etc.

Friday, March 23 CS 470 Operating Systems - Lecture 29 23

RAID

Performance is increased through parallelism. E.g., for a mirrored disk, transfer rate is the same as a single disk, but overall read rate doubles.

Transfer rate can be improved by striping data across multiple disks. E.g., if we have 8 disks, we can write one bit of each byte on each disk simultaneously. The number of accesses per unit time is the same, but each access reads 8 times as much data. Striping in larger units, such as blocks, is common.
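Block striping amounts to a simple round-robin mapping of logical blocks to disks. A sketch, assuming 8 disks:

    #include <stdio.h>

    #define NDISKS 8

    /* Round-robin block striping: logical block b lives on
       disk (b mod NDISKS) at offset (b / NDISKS). */
    void locate(long block, int *disk, long *offset)
    {
        *disk   = block % NDISKS;
        *offset = block / NDISKS;
    }

    int main(void)
    {
        for (long b = 0; b < 10; b++) {
            int d; long off;
            locate(b, &d, &off);
            printf("logical block %ld -> disk %d, block %ld\n", b, d, off);
        }
        return 0;
    }

Consecutive logical blocks land on different disks, so a large sequential read keeps all 8 spindles busy at once.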

Friday, March 23 CS 470 Operating Systems - Lecture 29 24

RAID Levels

Striping does not help with reliability, and mirroring is expensive. Various schemes, called RAID levels, provide both with different tradeoffs.

RAID 0 is simple striping. RAID 1 is simple mirroring. Higher levels are more complicated.

Friday, March 23 CS 470 Operating Systems - Lecture 29 25

RAID 0+1 and RAID 1+0

Can also combine schemes.

RAID 0+1 is a mirrored RAID 0 system.

RAID 1+0 is a RAID 1 system that is striped.
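For concreteness, a sketch of block placement under RAID 1+0, assuming 8 disks arranged as 4 mirrored pairs (the layout details are illustrative, not prescribed by the slides):

    #include <stdio.h>

    #define NPAIRS 4   /* 8 disks arranged as 4 mirrored pairs */

    /* RAID 1+0: stripe across mirrored pairs; every block has two
       physical copies, one on each disk of its pair. */
    void locate10(long block, int *diskA, int *diskB, long *offset)
    {
        int pair = block % NPAIRS;
        *diskA  = 2 * pair;       /* first copy  */
        *diskB  = 2 * pair + 1;   /* mirror copy */
        *offset = block / NPAIRS;
    }

    int main(void)
    {
        int a, b; long off;
        locate10(13, &a, &b, &off);
        printf("block 13 -> disks %d and %d, offset %ld\n", a, b, off);
        return 0;
    }

Because each stripe unit is mirrored within its own pair, a single disk failure only degrades one pair; the stripe as a whole survives.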