
Page 1: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

Chapter 7: IO

CS140 Computer Organization

These slides are derived from those of Null & Lobur + my notes while using previous texts.

Page 2: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

Chapter 7 Objectives

• Understand how I/O systems work, including I/O methods and architectures.

• Become familiar with storage media, and the differences in their respective formats.

• Understand how RAID improves disk performance and reliability, and which RAID systems are most useful today.

• Be familiar with emerging data storage technologies and the barriers that remain to be overcome.

Page 3: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.1 Introduction

• Data storage and retrieval is one of the primary functions of computer systems.

– One could easily make the argument that computers are more useful to us as data storage and retrieval devices than they are as computational machines.

• All computers have I/O devices connected to them, and to achieve good performance I/O should be kept to a minimum!

• In studying I/O, we seek to understand the different types of I/O devices as well as how they work.

Page 4: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.2 I/O and Performance

Page 5: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.3 Amdahl’s Law

• The overall performance of a system is a result of the interaction of all of its components.

• System performance is most effectively improved when the performance of the most heavily used components is improved.

• This idea is quantified by Amdahl’s Law:

S = 1 / ((1 - f) + f/k)

where S is the overall speedup; f is the fraction of work performed by the faster component; and k is the speedup of the faster component.

Page 6: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.3 Amdahl’s Law

EXAMPLE:

• On a large system, suppose we can upgrade a CPU to make it 50% faster for $10,000 or upgrade its disk drives for $7,000 to make them 250% faster.

• Processes spend 70% of their time running in the CPU and 30% of their time waiting for disk service.

• An upgrade of which component would offer the greater benefit for the lesser cost?

Page 7: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.3 Amdahl’s Law

• The processor option offers a speedup of about 1.3 (a 30% overall improvement): S = 1/(0.30 + 0.70/1.5) ≈ 1.30.

• The disk drive option gives a speedup of about 1.22: S = 1/(0.70 + 0.30/2.5) ≈ 1.22.

• Each 1% of improvement for the processor costs about $333, and for the disk a 1% improvement costs about $318.

Should price/performance be your only concern?
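The same arithmetic as a small C program; the dollar figures and time fractions are those of the example above, and because the program keeps the unrounded speedups, its cost-per-percent figures come out a few dollars lower than the rounded slide values.

#include <stdio.h>

/* Amdahl's Law: S = 1 / ((1 - f) + f/k) */
static double speedup(double f, double k) {
    return 1.0 / ((1.0 - f) + f / k);
}

int main(void) {
    double s_cpu  = speedup(0.70, 1.5);   /* CPU: 70% of the time, made 50% faster   */
    double s_disk = speedup(0.30, 2.5);   /* Disk: 30% of the time, made 250% faster */

    /* Cost of each 1% of overall improvement */
    printf("CPU : speedup %.2f, $%.0f per 1%%\n", s_cpu,  10000.0 / ((s_cpu  - 1.0) * 100.0));
    printf("Disk: speedup %.2f, $%.0f per 1%%\n", s_disk,  7000.0 / ((s_disk - 1.0) * 100.0));
    return 0;
}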


Page 8: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures

• We define input/output as a subsystem of components that moves coded data between external devices and a host system.

• I/O subsystems include:
  – Portions of main memory that are devoted to I/O functions.
  – Buses that move data into and out of the system.
  – Software modules (drivers) in the host and in peripheral devices.
  – Hardware interfaces (controllers) to external components such as keyboards and disks.
  – Cabling or communications links between the host system and its peripherals.

Page 9: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures

This is a model I/O configuration.

Page 10: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures - Buses

Goal: Place data from the disk into memory.

Steps in a bus operation:
1. Processor arbitrates for and sends a request to the PCI bus.
2. PCI passes the request to the SCSI controller.
3. SCSI controller arbitrates for the SCSI bus and then passes the request to the disk.
4. Disk arbitrates for the SCSI bus and then sends data back to the SCSI controller.
5. The SCSI controller arbitrates for control of the PCI bus and then makes a request of the bridge controller (the chip set).
6. SCSI controller sends data to memory.

Page 11: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

Small Computer System Interface, or SCSI (pronounced scuzzy), is a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, and electrical and optical interfaces. SCSI is most commonly used for hard disks and tape drives, but it can connect a wide range of other devices, including scanners and CD drives.

Page 12: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures - Buses

Bus characteristics: bandwidth, response time, length, physically standardized?

Bus: FireWire (1394) | USB 2.0
Bus type: I/O | I/O
Basic data bus width (signals): 4 | 2
Clocking: Asynchronous | Asynchronous
Theoretical peak bandwidth: 50 – 100 MB/sec | 0.2 – 60 MB/sec
Hot pluggable: Yes | Yes
Maximum number of devices: 63 | 127
Maximum bus length (copper): 4.5 meters | 5 meters
Name of standard: IEEE 1394 | USB Implementers Forum

Page 13: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures - Buses

Bus characteristics: bandwidth, response time, length, physically standardized?

Page 14: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures - Interfacing

The Operating System has the job of talking to the hardware – together they take responsibility for:

1. Allowing multiple programs using the processor to share the IO,

2. Working with interrupts as a means of communicating between the Processor and the IO,

3. Mechanisms that allow the processor to request IO operations.

Page 15: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures - Interfacing

The Processor communicates with IO devices in one of two ways:

Memory-mapped IO
1. A portion of the virtual memory address space is set aside and NO physical memory is associated with it by the OS.
2. The processor can then read/write to these addresses, and those actions are “magically” translated into requests to the IO devices.

IO Instructions
1. There are unique instructions that are used to read and write to a device.
2. A possible example, “Write to Device Location”, might look like this: movwf PORTC
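For contrast with the PIC instruction above, here is a minimal memory-mapped IO sketch in C; the register name DEVICE_DATA_REG and its address are invented placeholders, not a real device map.

#include <stdint.h>

/* Hypothetical device register mapped into the address space (address is made up). */
#define DEVICE_DATA_REG (*(volatile uint8_t *)0x4000A000u)

void write_byte_to_device(uint8_t b) {
    /* An ordinary store; the bus/chip set routes it to the device, not to RAM. */
    DEVICE_DATA_REG = b;
}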

Page 16: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

I/O can be controlled in four general ways.

• Programmed I/O reserves a register for each I/O device. Each register is continually polled to detect data arrival (see the polling sketch below).

• Interrupt-Driven I/O allows the CPU to do other things until I/O is requested.

• Direct Memory Access (DMA) offloads I/O processing to a special-purpose chip that takes care of the details.

• Channel I/O uses dedicated I/O processors.

7.4 I/O Architectures
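A minimal programmed I/O (polling) sketch of the first bullet above; the register addresses and the READY bit are invented placeholders, not a real device.

#include <stdint.h>

#define DEV_STATUS (*(volatile uint8_t *)0x4000B000u)  /* hypothetical status register */
#define DEV_DATA   (*(volatile uint8_t *)0x4000B004u)  /* hypothetical data register   */
#define READY_BIT  0x01u                               /* assumed "data ready" flag    */

uint8_t poll_read(void) {
    while ((DEV_STATUS & READY_BIT) == 0) {
        /* Busy-wait: the CPU does nothing useful until data arrives. */
    }
    return DEV_DATA;
}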

Page 17: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

This is an idealized I/O subsystem that uses interrupts. Each device connects its interrupt line to the interrupt controller.

7.4 I/O Architectures

The controller signals the CPU when any of the interrupt lines are asserted.

Page 18: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures

The IO device communicates with the Processor using interrupts:
1. The processor goes about its business.
2. When the device wants something, it follows these steps:
   a) Write the result of the action to an agreed-upon location in main memory.
   b) Send an electrical signal to the processor.
   c) The processor goes to a set-aside piece of code called an interrupt handler. This handler is part of the OS.
   d) The interrupt handler looks at the memory data written in a).
   e) The handler may start a new IO request.

Interrupt Priorities: What if a new interrupt happens while an interrupt is being handled? This requires levels of interrupt and masking.
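A sketch of steps a) through e) as an interrupt handler in C; DEVICE_MAILBOX, the success code, and start_next_io() are invented names used only to mirror the steps above, not any particular OS's interface.

#include <stdint.h>

/* Agreed-upon main-memory location the device writes its result to (step a); address is made up. */
#define DEVICE_MAILBOX (*(volatile uint32_t *)0x00001000u)

static void start_next_io(void) { /* hypothetical: queue up the next request (step e) */ }

/* Entered after the device signals the processor (steps b and c); part of the OS. */
void device_interrupt_handler(void) {
    uint32_t result = DEVICE_MAILBOX;   /* step d: look at the data written in a)  */

    if (result == 0) {                  /* assume 0 means "operation completed OK" */
        start_next_io();                /* step e: may start a new IO request      */
    }
    /* On return, the saved processor state is restored and execution resumes. */
}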

Page 19: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• In a system that uses interrupts, the status of the interrupt signal is checked at the top of the fetch-decode-execute cycle.

• The particular code that is executed whenever an interrupt occurs is determined by a set of addresses called interrupt vectors that are stored in low memory. PIC had only one at address = 4; bigger machines have many more.

• The system state is saved before the interrupt service routine is executed and is restored afterward.

7.4 I/O Architectures

Page 20: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

This is a DMA configuration.

Notice that the DMA and the CPU share the bus.

The DMA runs at a higher priority and steals memory cycles from the CPU.

7.4 I/O Architectures

Page 21: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.4 I/O Architectures - Interfacing

DMA: This is a method for the device to move large amounts of data into main memory without interference/interaction by the processor.

Here’s a possible sequence of steps:
1. Processor sets aside memory where the transfer will go.
2. Processor sets up the IO device – tells it, for a disk, the following information:
   a) What block(s) of the disk should be obtained.
   b) What location in memory the data should go to – the set-aside memory.
   c) Oh – and interrupt me when you’re done.
3. This requires a chip set to be able to do the switching.
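A hedged sketch of step 2, programming a hypothetical DMA-capable disk controller in C; every register name and address below is a placeholder, not a real chip set.

#include <stdint.h>

/* Hypothetical disk/DMA controller registers (addresses are made up). */
#define DISK_BLOCK_REG (*(volatile uint32_t *)0x4000C000u)  /* which disk block to read  */
#define DISK_MEM_ADDR  (*(volatile uint32_t *)0x4000C004u)  /* where in memory to put it */
#define DISK_COUNT_REG (*(volatile uint32_t *)0x4000C008u)  /* how many blocks           */
#define DISK_CMD_REG   (*(volatile uint32_t *)0x4000C00Cu)  /* start + "interrupt me"    */

#define CMD_READ        0x1u
#define CMD_IRQ_ON_DONE 0x2u

void start_dma_read(uint32_t block, void *buffer, uint32_t nblocks) {
    DISK_BLOCK_REG = block;                       /* 2a: which block(s) to obtain    */
    DISK_MEM_ADDR  = (uint32_t)(uintptr_t)buffer; /* 2b: the set-aside memory        */
    DISK_COUNT_REG = nblocks;
    DISK_CMD_REG   = CMD_READ | CMD_IRQ_ON_DONE;  /* 2c: go, and interrupt when done */
    /* The processor is now free; the controller steals memory cycles to do the transfer. */
}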

Page 22: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Very large systems employ channel I/O.

• Channel I/O consists of one or more I/O processors (IOPs) that control various channel paths.

• Slower devices such as terminals and printers are combined (multiplexed) into a single faster channel.

• On IBM mainframes, multiplexed channels are called multiplexor channels; the faster ones are called selector channels.

7.4 I/O Architectures

Page 23: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Channel I/O is distinguished from DMA by the intelligence of the IOPs.

• The IOP negotiates protocols, issues device commands, translates storage coding to memory coding, and can transfer entire files or groups of files independent of the host CPU.

• Especially, the IOP can offload much of the interrupt handling from the main processor.

• The host has only to create the program instructions for the I/O operation and tell the IOP where to find them.

• Increasingly, intelligence is cheap – devices are becoming more and more complex.

7.4 I/O Architectures

Page 24: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• This is a channel I/O configuration.

7.4 I/O Architectures

Page 25: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Character I/O devices process one byte (or character) at a time.

– Examples include modems, keyboards, and mice.

– Keyboards are usually connected through an interrupt-driven I/O system.

• Block I/O devices handle bytes in groups.

– Most mass storage devices (disk and tape) are block I/O devices.

– Block I/O systems are most efficiently connected through DMA or channel I/O.

7.4 I/O Architectures

Page 26: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• I/O buses, unlike memory buses, operate asynchronously. Requests for bus access must be arbitrated among the devices involved.

• Bus control lines activate the devices when they are needed, raise signals when errors have occurred, and reset devices when necessary.

• The number of data lines is the width of the bus.

• A bus clock coordinates activities and provides bit cell boundaries.

7.4 I/O Architectures

Page 27: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

This is a generic DMA configuration showing how the DMA circuit connects to a data bus.

7.4 I/O Architectures

Page 28: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

This is how a bus connects to a disk drive.

7.4 I/O Architectures

Page 29: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

Timing diagrams, such as this one, define bus operation in detail.

7.4 I/O Architectures

Page 30: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.5 Data Transmission Modes

• Bytes can be conveyed from one point to another by sending their encoding signals simultaneously using parallel data transmission or by sending them one bit at a time in serial data transmission.

– Parallel data transmission for a printer resembles the signal protocol of a memory bus:

Page 31: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.5 Data Transmission Modes

• In parallel data transmission, the interface requires one conductor for each bit.

• Parallel cables are fatter than serial cables.

• Compared with parallel data interfaces, serial communications interfaces:
  – Require fewer conductors.
  – Are less susceptible to attenuation.
  – Can transmit data farther and faster.

Serial communications interfaces are suitable for time-sensitive (isochronous) data such as voice and video.

Page 32: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Magnetic disks offer large amounts of durable storage that can be accessed quickly.

• Disk drives are called random (or direct) access storage devices, because blocks of data can be accessed according to their location on the disk.

– This term was coined when all other durable storage (e.g., tape) was sequential.

• Magnetic disk organization is shown on the following slide.

7.6 Magnetic Disk Technology

Page 33: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Hard disk platters are mounted on spindles.

• Read/write heads are mounted on a comb that swings radially to read the disk.

• The rotating disk forms a logical cylinder beneath the read/write heads.

• Data blocks are addressed by their cylinder, surface, and sector.

7.6 Magnetic Disk Technology

Page 34: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• There are a number of electromechanical properties of hard disk drives that determine how fast their data can be accessed.

• Seek time is the time that it takes for a disk arm to move into position over the desired cylinder.

• Rotational delay is the time that it takes for the desired sector to move into position beneath the read/write head.

• Seek time + rotational delay = access time.

7.6 Magnetic Disk Technology

Page 35: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.6 Disk Storage

Characteristics of three magnetic disks:

Characteristics: Seagate ST373453 | Seagate ST3200822 | Seagate ST94811A
Disk diameter (inches): 3.50 | 3.50 | 2.5
Formatted disk capacity (GB): 73.4 | 200 | 40.0
Number of disk surfaces (heads): 8 | 4 | 2
Rotation speed (RPM): 15,000 | 7,200 | 5,400
Internal disk cache size (MB): 8 | 8 | 8
External interface, bandwidth (MB/sec): SCSI, 320 | Serial ATA, 150 | ATA, 100
Sustained transfer rate (MB/sec): 57 – 86 | 32 – 58 | 34
Minimum seek, read/write (ms): 0.2/0.4 | 1.0/1.2 | 1.5/2.0
Average seek, read/write (ms): 3.6/3.9 | 8.5/9.5 | 12/14
Mean time to failure, MTTF (hrs): 1,200,000 | 600,000 | 330,000
Warranty (years): 5 | 3 | --
Unrecoverable read errors per bits read: < 1 in 10^15 | < 1 in 10^14 | < 1 in 10^14
Size: dimensions (in), weight (lbs): 1" x 4" x 5.8", 1.9 lb | 1" x 4" x 5.8", 1.4 lb | 0.4" x 2.7" x 3.9", 0.2 lb
Power: operating/idle/standby (watts): 20/12/-- | 12/8/1 | 2.4/1.0/0.4
GB per cubic inch; GB per watt: 3 GB/in^3; 4 GB/watt | 9 GB/in^3; 16 GB/watt | 3 GB/in^3; 4 GB/watt
Price in 2004; $/GB: $400; $5/GB | $100; $0.5/GB | $100; $2.50/GB

Page 36: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.6 Disk Storage

Characteristics of a magnetic disk - 2008

Specifications 500 GB

Model Number ST3500830ACE

Interface Ultra ATA/100

Cache 8 MBytes

Capacity 500 GB

Guaranteed Sectors 976,773,168

PERFORMANCE  

Spindle Speed 7200 rpm

Sustained data transfer rate 72 Mbytes/sec.

Average latency 4.16 msec

Random read seek time <11.0 msec

Random write seek time <12.0 msec

Maximum interface transfer rate 100 Mbytes/sec.

RELIABILITY  

Annual Failure Rate 0.68 %

Unrecoverable read errors 1 in 10^14

POWER  

Average idle power 9.30 watts

Average seek power 8.20 watts

Standby power 0.80 watts

Sleep power 0.807 watts

Page 37: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.6 Disk Storage

Characteristics of another magnetic disk - 2008

Specifications 750 GB

Model Number ST3750640AS-RK

Interface SATA

Cache 16 MBytes

Capacity 750 GB

PERFORMANCE  

Spindle Speed 7200 rpm

ST3750640AS-RK – $200

Seagate Internal 3.5-inch, 750-GB Serial ATA (SATA) Hard Drives offer the best combination of capacity, performance, reliability and value for multiple applications. The internal hard drive features a 3-Gb/s data transfer rate, Native Command Queuing (NCQ) and the most advanced fluid dynamic bearing motor for whisper-quiet operation. SeaTools diagnostic software continuously monitors data safety and drive performance. The Internal 3.5-inch SATA Hard Drive bundle also includes easy-to-use DiscWizard software for simple installation.

Key Features and Benefits:
– SATA 3 Gb/s and NCQ interface for greater throughput and reliability
– No jumper settings and thinner cables
– 16-MB cache buffer
– DiscWizard software for effortless installation
– 7200-RPM spin speed
– Fast performance
– Superb reliability
– Fluid Dynamic Bearing motor for whisper-quiet acoustics
– Built-in self-monitoring technology
– Exceptional value
– 5-year warranty

Page 38: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• On a standard 1.44MB floppy, the FAT (File Allocation Table) is limited to nine 512-byte sectors.
  – There are two copies of the FAT.

• There are 18 sectors per track and 80 tracks on each surface of a floppy, for a total of 2880 sectors on the disk. So each FAT entry needs at least 12 bits (2^11 = 2048 < 2880 < 2^12 = 4096).
  – Thus, FAT entries for disks smaller than 10MB are 12 bits, and the organization is called FAT12.
  – FAT16 is employed for disks larger than 10MB.

7.6 Magnetic Disk Technology – Floppies and FAT

Page 39: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• The disk directory associates logical file names with physical disk locations.

• Directories contain a file name and the file’s first FAT entry.

• If the file spans more than one sector (or cluster), the FAT contains a pointer to the next cluster (and FAT entry) for the file.

• The FAT is read like a linked list until the <EOF> entry is found.

7.6 Magnetic Disk Technology – Floppies and FAT

Page 40: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• A directory entry says that a file we want to read starts at sector 121 in the FAT fragment shown below.

  – Sectors 121, 124, 126, and 122 are read. After each sector is read, its FAT entry is used to find the next sector occupied by the file.

  – At the FAT entry for sector 122, we find the end-of-file marker <EOF>.

How many disk accesses are required to read this file?

7.6 Magnetic Disk Technology – Floppies and FAT
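A small sketch of the linked-list walk just described; the FAT fragment is filled in only for the chain from the example (121 -> 124 -> 126 -> 122 -> <EOF>), read_sector() is a stand-in for a real disk read, and the count covers only the data-sector reads.

#include <stdio.h>
#include <stdint.h>

#define FAT_ENTRIES 2880
#define FAT_EOF     0xFFF            /* 12-bit end-of-file marker used here */

static uint16_t fat[FAT_ENTRIES];    /* one 12-bit entry per sector */

static void read_sector(uint16_t s) { printf("reading sector %u\n", s); }

/* Follow the FAT chain from a file's first sector until <EOF>. */
static int read_file(uint16_t first_sector) {
    int accesses = 0;
    for (uint16_t s = first_sector; ; s = fat[s]) {
        read_sector(s);
        accesses++;
        if (fat[s] == FAT_EOF) break;   /* the last sector's entry holds <EOF> */
    }
    return accesses;
}

int main(void) {
    /* The fragment from the example: 121 -> 124 -> 126 -> 122 -> <EOF> */
    fat[121] = 124; fat[124] = 126; fat[126] = 122; fat[122] = FAT_EOF;
    printf("data-sector reads: %d\n", read_file(121));   /* prints 4 */
    return 0;
}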

Page 41: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Optical disks provide large storage capacities very inexpensively.

• They come in a number of varieties, including CD-ROM and DVD.

• It is estimated that optical disks can endure for a hundred years. Other media are good for only a decade-- at best.

• CD-ROMs were designed by the music industry in the 1980s and later adapted to data storage.

• This history is reflected by the fact that data is recorded in a single spiral track, starting from the center of the disk and spanning outward.

7.7 Optical Disks

Page 42: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Binary ones and zeros are delineated by bumps in the polycarbonate disk substrate. The transitions between pits and lands define binary ones.

• If you could unravel a full CD-ROM track, it would be nearly five miles long!

• The logical data format for a CD-ROM is much more complex than that of a magnetic disk. (See the text for details.)

• Different formats are provided for data and music.

• Two levels of error correction are provided for the data format.

• Because of this, a CD holds at most 650MB of data, but can contain as much as 742MB of music.

7.7 Optical Disks

Page 43: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• DVDs can be thought of as quad-density CDs.
  – Varieties include single-sided single-layer, single-sided double-layer, double-sided single-layer, and double-sided double-layer.

• Where a CD-ROM can hold at most 650MB of data, DVDs can hold as much as 17GB.

• One of the reasons for this is that DVD employs a laser that has a shorter wavelength than the CD’s laser.

• This allows pits and lands to be closer together and the spiral track to be wound tighter.

7.7 Optical Disks

Page 44: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Blue-violet laser disks have been designed for use in the data center.

• The intention is to provide a means for long term data storage and retrieval.

• Two types are now dominant:
  – Sony’s Professional Disk for Data (PDD), which can store 23GB on one disk, and
  – Plasmon’s Ultra Density Optical (UDO), which can hold up to 30GB.

• It is too soon to tell which of these technologies will emerge as the winner.

7.7 Optical Disks

Page 45: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• First-generation magnetic tape was not much more than wide analog recording tape, having capacities under 11MB.

• Data was usually written in nine vertical tracks:

7.8 Magnetic Tape

Page 46: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Today’s tapes are digital, and provide multiple gigabytes of data storage.

• Two dominant recording methods are serpentine and helical scan, which are distinguished by how the read-write head passes over the recording medium.

• Serpentine recording is used in digital linear tape (DLT) and Quarter inch cartridge (QIC) tape systems.

• Digital audio tape (DAT) systems employ helical scan recording.

7.8 Magnetic Tape

These two recording methods are shown on the next slide.

Page 47: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

7.8 Magnetic Tape

Serpentine

Helical Scan

Page 48: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Numerous incompatible tape formats emerged over the years.

– Sometimes even different models of the same manufacturer’s tape drives were incompatible!

• Finally, in 1997, HP, IBM, and Seagate collaboratively invented a best-of-breed tape standard.

• They called this new tape format Linear Tape Open (LTO) because the specification is openly available.

7.8 Magnetic Tape

Page 49: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• LTO, as the name implies, is a linear digital tape format.

• The specification allowed for the refinement of the technology through four “generations.”

• Generation 3 was released in 2004.

– Without compression, the tapes support a transfer rate of 80MB per second and each tape can hold up to 400GB.

• LTO supports several levels of error correction, providing superb reliability.

– Tape has a reputation for being an error-prone medium.

7.8 Magnetic Tape

Page 50: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID, an acronym for Redundant Array of Independent Disks, was invented to address problems of disk reliability, cost, and performance.

• In RAID, data is stored across many disks, with extra disks added to the array to provide error correction (redundancy).

• The inventors of RAID, David Patterson, Garth Gibson, and Randy Katz, provided a RAID taxonomy that has persisted for a quarter of a century, despite many efforts to redefine it.

7.9 RAID

Page 51: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID Level 0, also known as drive spanning, provides improved performance, but no redundancy.
  – Data is written in blocks across the entire array.
  – The disadvantage of RAID 0 is its low reliability.

7.9 RAID

Page 52: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID Level 1, also known as disk mirroring, provides 100% redundancy, and good performance.
  – Two matched sets of disks contain the same data.
  – The disadvantage of RAID 1 is cost.

7.9 RAID

Page 53: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• A RAID Level 2 configuration consists of a set of data drives, and a set of Hamming code drives.
  – Hamming code drives provide error correction for the data drives.
  – RAID 2 performance is poor and the cost is relatively high.

7.9 RAID

Page 54: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID Level 3 stripes bits across a set of data drives and provides a separate disk for parity.
  – Parity is the XOR of the data bits.
  – RAID 3 is not suitable for commercial applications, but is good for personal systems.

7.9 RAID
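A minimal illustration of the parity idea (applied bit-wise by RAID 3 and block-wise by RAID 4 and 5): the parity block is the XOR of the corresponding data blocks, so any one lost block can be rebuilt by XOR-ing the survivors. The drive count and block size below are arbitrary.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define DRIVES 4          /* data drives                              */
#define BLOCK  8          /* bytes per block (tiny, for illustration) */

/* parity[i] = d0[i] ^ d1[i] ^ ... for every byte i */
static void make_parity(uint8_t data[DRIVES][BLOCK], uint8_t parity[BLOCK]) {
    memset(parity, 0, BLOCK);
    for (int d = 0; d < DRIVES; d++)
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild one failed drive by XOR-ing the parity with all surviving drives. */
static void rebuild(uint8_t data[DRIVES][BLOCK], const uint8_t parity[BLOCK], int failed) {
    memcpy(data[failed], parity, BLOCK);
    for (int d = 0; d < DRIVES; d++)
        if (d != failed)
            for (int i = 0; i < BLOCK; i++)
                data[failed][i] ^= data[d][i];
}

int main(void) {
    uint8_t data[DRIVES][BLOCK] = { "drive0", "drive1", "drive2", "drive3" };
    uint8_t parity[BLOCK];

    make_parity(data, parity);
    memset(data[2], 0, BLOCK);                    /* simulate losing drive 2 */
    rebuild(data, parity, 2);
    printf("recovered: %s\n", (char *)data[2]);   /* prints "drive2"         */
    return 0;
}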

Page 55: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID Level 4 is like adding parity disks to RAID 0.
  – Data is written in blocks across the data disks, and a parity block is written to the redundant drive.
  – RAID 4 would be feasible if all record blocks were the same size.

7.9 RAID

Page 56: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID Level 5 is RAID 4 with distributed parity.
  – With distributed parity, some accesses can be serviced concurrently, giving good performance and high reliability.
  – RAID 5 is used in many commercial systems.

7.9 RAID

Page 57: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID Level 6 carries two levels of error protection over striped data: Reed-Solomon and parity.
  – It can tolerate the loss of two disks.
  – RAID 6 is write-intensive, but highly fault-tolerant.

7.9 RAID

Page 58: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Double parity RAID (RAID DP) employs pairs of overlapping parity blocks that provide linearly independent parity functions.

7.9 RAID

Page 59: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Like RAID 6, RAID DP can tolerate the loss of two disks.

• The use of simple parity functions provides RAID DP with better performance than RAID 6.

• Of course, because two parity functions are involved, RAID DP’s performance is somewhat degraded from that of RAID 5.

– RAID DP is also known as EVENODD, diagonal parity RAID, RAID 5DP, advanced data guarding RAID (RAID ADG) and-- erroneously-- RAID 6.

7.9 RAID

Page 60: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Large systems consisting of many drive arrays may employ various RAID levels, depending on the criticality of the data on the drives.
  – A disk array that provides program workspace (say, for file sorting) does not require high fault tolerance.

• Critical, high-throughput files can benefit from combining RAID 0 with RAID 1, called RAID 10.

• Keep in mind that a higher RAID level does not necessarily mean a “better” RAID level. It all depends upon the needs of the applications that use the disks.

7.9 RAID

Page 61: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Advances in technology have defied all efforts to define the ultimate upper limit for magnetic disk storage.
  – In the 1970s, the upper limit was thought to be around 2Mb/in^2.
  – Today’s disks commonly support 20Gb/in^2.

• Improvements have occurred in several different technologies, including:
  – Materials science
  – Magneto-optical recording heads
  – Error correcting codes

7.10 The Future of Data Storage

Page 62: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• As data densities increase, bit cells consist of proportionately fewer magnetic grains.

• There is a point at which there are too few grains to hold a value, and a 1 might spontaneously change to a 0, or vice versa.

• This point is called the superparamagnetic limit.
  – In 2006, the superparamagnetic limit is thought to lie between 150Gb/in^2 and 200Gb/in^2.

• Even if this limit is wrong by a few orders of magnitude, the greatest gains in magnetic storage have probably already been realized.

7.10 The Future of Data Storage

Page 63: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Future exponential gains in data storage most likely will occur through the use of totally new technologies.

• Research into finding suitable replacements for magnetic disks is taking place on several fronts.

• Some of the more interesting technologies include:

– Biological materials

– Holographic systems and

– Micro-electro-mechanical devices.

7.10 The Future of Data Storage

Page 64: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Present day biological data storage systems combine organic compounds such as proteins or oils with inorganic (magnetizable) substances.

• Early prototypes have encouraged the expectation that densities of 1Tb/in^2 are attainable.

• Of course, the ultimate biological data storage medium is DNA.
  – Trillions of messages can be stored in a tiny strand of DNA.

• Practical DNA-based data storage is most likely decades away.

7.10 The Future of Data Storage

Page 65: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Holographic storage uses a pair of laser beams to etch a three-dimensional hologram onto a polymer medium.

7.10 The Future of Data Storage

Page 66: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Data is retrieved by passing the reference beam through the hologram, thereby reproducing the original coded object beam.

7.10 The Future of Data Storage

Page 67: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Because holograms are three-dimensional, tremendous data densities are possible.

• Experimental systems have achieved over 30Gb/in^2, with transfer rates of around 1GBps.

• In addition, holographic storage is content addressable.
  – This means that there is no need for a file directory on the disk. Accordingly, access time is reduced.

• The major challenge is in finding an inexpensive, stable, rewriteable holographic medium.

7.10 The Future of Data Storage

Page 68: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Micro-electro-mechanical storage (MEMS) devices offer another promising approach to mass storage.

• IBM’s Millipede is one such device.

• Prototypes have achieved densities of 100Gb/in^2, with 1Tb/in^2 expected as the technology is refined.

7.10 The Future of Data Storage

A photomicrograph of Millipede is shown on the next slide.

Page 69: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Millipede consists of thousands of cantilevers that record a binary 1 by pressing a heated tip into a polymer substrate.

7.10 The Future of Data Storage

• The tip reads a binary 1 when it dips into the imprint in the polymer.

Photomicrograph courtesy of the IBM Corporation. © 2005 IBM Corporation

Page 70: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• Performance benchmarking is the science of making objective assessments concerning the performance of one system over another.

• Price-performance ratios can be derived from standard benchmarks.

• The troublesome issue is that there is no definitive benchmark that can tell you which system will run your applications the fastest (using the least wall clock time) for the least amount of money.

Page 71: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• Many people erroneously equate CPU speed with performance.

• Measures of CPU speed include clock rate (MHz and GHz) and millions of instructions per second (MIPS).

• Saying that System A is faster than System B because System A runs at 2.4GHz and System B runs at 1.4GHz is valid only when the ISAs of Systems A and B are identical.
  – With different ISAs, it is possible that both of these systems could obtain identical results within the same amount of wall clock time.
  – Even the same external architectures can mask huge differences in the implementation of those architectures – for example, the Pentium 3 and Pentium 4.

Page 72: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• In an effort to describe performance independent of clock speed and ISAs, a number of synthetic benchmarks have been attempted over the years.

• Synthetic benchmarks are programs that serve no purpose except to produce performance numbers.

• The earliest synthetic benchmarks, Whetstone and Dhrystone (to name only a few), were relatively small programs that were easy to optimize – the compiler could outsmart the program.
  – This fact limited their usefulness from the outset.

• These programs are much too small to be useful in evaluating the performance of today’s systems.

Page 73: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• In 1988 the Standard Performance Evaluation Corporation (SPEC) was formed to address the need for objective benchmarks.

• SPEC produces benchmark suites for various classes of computers and computer applications.

• Their most widely known benchmark suite is the SPEC CPU benchmark.

• The SPEC CPU2006 benchmark consists of two parts, CINT2006, which measures integer arithmetic operations, and CFP2006, which measures floating-point processing.

Page 74: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• The SPEC benchmarks consist of a collection of kernel programs.

• These are programs that carry out the core processes involved in solving a particular problem.
  – Activities that do not contribute to solving the problem, such as I/O, are removed.

• CINT2006 consists of 12 applications (11 written in C and one in C++); CFP2006 consists of 14 applications (6 FORTRAN 77, 4 FORTRAN 90, and 4 C).

A list of these programs can be found in Table 10.7 on Pages 467 - 468.

Page 75: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• On most systems, more than two 24 hour days are required to run the SPEC benchmark suite.

• Upon completion, the execution time for each kernel (as reported by the benchmark suite) is divided by the run time for the same kernel on a Sun Ultra 10.

• The final result is the geometric mean of all of the run times.

• Manufacturers may report two sets of numbers: The peak and base numbers are the results with and without compiler optimization flags, respectively.
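A small sketch of the scoring arithmetic: however the per-kernel ratios are formed, the reported score is their geometric mean. The ratios below are invented, not real SPEC results (link with -lm).

#include <stdio.h>
#include <math.h>

/* Geometric mean of n positive ratios: exp((1/n) * sum(log r_i)). */
static double geometric_mean(const double *ratio, int n) {
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(ratio[i]);
    return exp(log_sum / n);
}

int main(void) {
    /* Invented per-kernel ratios relative to the reference machine. */
    double ratios[] = { 8.2, 11.5, 9.7, 14.1, 10.3 };
    int n = sizeof ratios / sizeof ratios[0];
    printf("SPEC-style score: %.2f\n", geometric_mean(ratios, n));
    return 0;
}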

Page 76: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• The SPEC CPU benchmark evaluates only CPU performance.

• When the performance of the entire system under high transaction loads is a greater concern, the Transaction Processing Performance Council (TPC) benchmarks are more suitable.

• The current version of this suite is the TPC-C benchmark.

• TPC-C models the transactions typical of a warehousing and distribution business using terminal emulation software.

Page 77: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• The TPC-C metric is the number of new warehouse order transactions per minute (tpmC), while a mix of other transactions is concurrently running on the system.

• The tpmC result is divided by the total cost of the configuration tested to give a price-performance ratio.

• The price of the system includes all hardware, software, and maintenance fees that the customer would expect to pay.

Page 78: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.4 Benchmarking

• The Transaction Processing Performance Council has also devised benchmarks for decision support systems (used for applications such as data mining) and for Web-based e-commerce systems.

• For all of the TPC benchmarks, the systems tested must be available for general sale at the time of the test and at the prices cited in a full disclosure report.

• Results of the tests are audited by an independent auditing firm that has been certified by the TPC.

Page 79: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Optimal disk performance is critical to system throughput.

• Disk drives are the slowest memory component, with the fastest access times one million times longer than main memory access times.

• A slow disk system can choke transaction processing and drag down the performance of all programs when virtual memory paging is involved.

• Low CPU utilization can actually indicate a problem in the I/O subsystem, because the CPU spends more time waiting than running.

Page 80: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Disk utilization is the measure of the percentage of the time that the disk is busy servicing I/O requests.

• It gives the probability that the disk will be busy when another I/O request arrives in the disk service queue.

• Disk utilization is determined by the speed of the disk and the rate at which requests arrive in the service queue. Stated mathematically:

Utilization = Request Arrival Rate / Disk Service Rate

where the arrival rate is given in requests per second, and the disk service rate is given in I/O operations per second (IOPS).

Page 81: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• The amount of time that a request spends in the queue is directly related to the service time and the probability that the disk is busy, and it is indirectly related to the probability that the disk is idle.

• In formula form, we have:

Time in Queue = (Service Time × Utilization) / (1 – Utilization)

• The important relationship between queue time and utilization (from the formula above) is shown graphically on the next slide.
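A sketch applying the two formulas above with made-up numbers: a disk that can service 100 IOPS receiving 70 requests per second.

#include <stdio.h>

int main(void) {
    double arrival_rate = 70.0;    /* requests per second (assumed)       */
    double service_rate = 100.0;   /* I/O operations per second (assumed) */
    double service_time = 1.0 / service_rate;             /* seconds per I/O */

    double utilization   = arrival_rate / service_rate;
    double time_in_queue = (service_time * utilization) / (1.0 - utilization);

    printf("utilization   = %.0f%%\n", utilization * 100.0);
    printf("time in queue = %.1f ms\n", time_in_queue * 1000.0);
    return 0;
}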

Page 82: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

The “knee” of the curve is around 78%. This is why 80% is the rule-of-thumb upper limit for utilization for most disk drives. Beyond that, queue time quickly becomes excessive.

Page 83: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• How a seek works - what's the average seek time?

• What is locality of reference and how does it affect seek time?

• What is Rotational Latency?

What is the total disk access time? Here’s a sample calculation:

• Is a read time the same as a write time?

• Talk about caches!

11.6 Disk Performance
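One possible sample calculation, assuming an 8 ms average seek, a 7200 RPM spindle, a 100 MB/sec transfer rate, and a 512-byte sector; the numbers are chosen for illustration, not taken from the slide.

#include <stdio.h>

int main(void) {
    /* Assumed drive parameters. */
    double avg_seek_ms   = 8.0;      /* average seek time             */
    double rpm           = 7200.0;   /* spindle speed                 */
    double transfer_mbps = 100.0;    /* sustained transfer rate, MB/s */
    double sector_bytes  = 512.0;

    double rotational_ms = 0.5 * (60000.0 / rpm);   /* half a revolution, on average */
    double transfer_ms   = sector_bytes / (transfer_mbps * 1e6) * 1000.0;

    printf("seek %.2f + rotation %.2f + transfer %.3f = %.2f ms\n",
           avg_seek_ms, rotational_ms, transfer_ms,
           avg_seek_ms + rotational_ms + transfer_ms);
    return 0;
}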

Page 84: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• The manner in which files are organized on a disk greatly affects throughput.

• Disk arm motion is the greatest consumer of service time.

• Disk specifications cite average seek time, which is usually in the range of 5 to 10ms.

• However, a full-stroke seek can take as long as 15 to 20ms.

• Clever disk scheduling algorithms endeavor to minimize seek time.

• The most naïve disk scheduling policy is first-come, first-served (FCFS).

• As its name implies, FCFS services all I/O requests in the order in which they arrive in the queue.

• With this approach, there is no real control over arm motion, so random, wide sweeps across the disk are possible.

Page 85: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

Using FCFS, performance is unpredictable and widely variable.

Page 86: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Arm motion is reduced when requests are ordered so that the disk arm moves only to the track nearest its current location.

• This is the idea employed by the shortest seek time first (SSTF) scheduling algorithm.

• Disk track requests are queued and selected so that the minimum arm motion is involved in servicing the request.

The next slide illustrates the arm motion of SSTF.
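Before the figure, a small sketch contrasting FCFS with SSTF on an invented request queue; it totals the tracks the arm crosses under each policy (the track numbers are arbitrary, not those of the book's figures).

#include <stdio.h>
#include <stdlib.h>

#define N 6

static int fcfs_distance(int start, const int *req, int n) {
    int pos = start, total = 0;
    for (int i = 0; i < n; i++) { total += abs(req[i] - pos); pos = req[i]; }
    return total;
}

static int sstf_distance(int start, const int *req, int n) {
    int pending[N], pos = start, total = 0;
    for (int i = 0; i < n; i++) pending[i] = req[i];
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)        /* pick the nearest pending track */
            if (pending[i] >= 0 && (best < 0 || abs(pending[i] - pos) < abs(pending[best] - pos)))
                best = i;
        total += abs(pending[best] - pos);
        pos = pending[best];
        pending[best] = -1;                /* mark as serviced */
    }
    return total;
}

int main(void) {
    int requests[N] = { 98, 183, 37, 122, 14, 65 };  /* arbitrary example queue     */
    int start = 53;                                  /* arm currently over track 53 */
    printf("FCFS arm travel: %d tracks\n", fcfs_distance(start, requests, N));
    printf("SSTF arm travel: %d tracks\n", sstf_distance(start, requests, N));
    return 0;
}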

Page 87: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

Shortest Seek Time First

Page 88: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• With SSTF, starvation is possible: A track request for a “remote” track could keep getting shoved to the back of the queue as nearer requests are serviced.
  – Interestingly, this problem is at its worst with low disk utilization rates.

• To avoid starvation, fairness can be enforced by having the disk arm continually sweep over the surface of the disk, stopping when it reaches a track for which it has a request.
  – This approach is called an elevator algorithm.

Page 89: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• In the context of disk scheduling, the elevator algorithm is known as SCAN (which is not an acronym).

• While SCAN entails a lot of arm motion, the motion is constant and predictable.

• Moreover, the arm changes direction only twice: At the center and at the outermost edges of the disk.

The next slide illustrates the arm motion of SCAN.

Page 90: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

SCAN Disk Scheduling

Page 91: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• A SCAN variant, called C-SCAN for circular SCAN, treats track zero as if it is adjacent to the highest-numbered track on the disk.

• The arm moves in one direction only, providing a simpler SCAN implementation.

• The following slide illustrates a series of read requests where, after track 75 is read, the arm passes to track 99 and then to track 0, from which it starts reading the lowest-numbered tracks, beginning with track 6.
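A small sketch that prints a C-SCAN service order for an invented queue chosen to mirror that description (arm at track 50, highest track 99).

#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACK 99   /* highest-numbered track, as in the example above */

static int cmp_int(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

/* Print the order in which C-SCAN services the pending requests. */
static void cscan_order(int head, int *req, int n) {
    qsort(req, n, sizeof *req, cmp_int);
    for (int i = 0; i < n; i++)            /* first: everything at or beyond the head */
        if (req[i] >= head) printf("%d ", req[i]);
    printf("(sweep to %d, wrap to 0) ", MAX_TRACK);
    for (int i = 0; i < n; i++)            /* then: the low-numbered tracks */
        if (req[i] < head) printf("%d ", req[i]);
    printf("\n");
}

int main(void) {
    int requests[] = { 75, 6, 40, 12, 91, 60 };   /* invented queue */
    cscan_order(50, requests, sizeof requests / sizeof requests[0]);
    return 0;
}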

Page 92: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

C-SCAN Disk Scheduling

Page 93: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• The disk arm motion of SCAN and C-SCAN can be reduced through the use of the LOOK and C-LOOK algorithms.

• Instead of sweeping the entire disk, the disk arm travels only to the highest- and lowest-numbered tracks for which access requests are pending.

• LOOK and C-LOOK provide the best theoretical throughput, although the circuitry that supports them is the most complex.

Page 94: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• At high utilization rates, SSTF performs slightly better than SCAN or LOOK. But the risk of starvation persists.

• Under very low utilization (under 20%), the performance of any of these algorithms will be acceptable.

• No matter which scheduling algorithm is used, file placement greatly influences performance.

• When possible, the most frequently-used files should reside in the center tracks of the disk, and the disk should be periodically defragmented.

Page 95: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• The best way to reduce disk arm motion is to avoid using the disk as much as possible.

• To this end, many disk drives, or disk drive controllers, are provided with cache memory or a number of main memory pages set aside for the exclusive use of the I/O subsystem.

• Disk cache memory is usually associative.
  – Because associative cache searches are time-consuming, performance can actually be better with smaller disk caches because hit rates are usually low.

Page 96: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Many disk drive-based caches use prefetching techniques to reduce disk accesses.

• When using prefetching, a disk will read a number of sectors subsequent to the one requested with the expectation that one or more of the subsequent sectors will be needed “soon.”

• Empirical studies have shown that over 50% of disk accesses are sequential in nature, and that prefetching increases performance by 40%, on average.

Page 97: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Prefetching is subject to cache pollution, which occurs when the cache is filled with data that no process needs, leaving less room for useful data.

• Various replacement algorithms, LRU, LFU and random, are employed to help keep the cache clean.

• Additionally, because disk caches serve as a staging area for data to be written to the disk, some disk cache management schemes evict all bytes after they have been written to the disk.

Page 98: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• With cached disk writes, we are faced with the problem that cache is volatile memory.

• In the event of a massive system failure, data in the cache will be lost.

• An application believes that the data has been committed to the disk, when it really is in the cache. If the cache fails, the data just disappears.

• To defend against power loss to the cache, some disk controller-based caches are mirrored and supplied with a battery backup.

Page 99: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Another approach to combating cache failure is to employ a write-through cache where a copy of the data is retained in the cache in case it is needed again “soon,” but it is simultaneously written to the disk.

• The operating system is signaled that the I/O is complete only after the data has actually been placed on the disk.

• With a write-through cache, performance is somewhat compromised to provide reliability.

Page 100: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• When throughput is more important than reliability, a system may employ the write back cache policy.

• Some disk drives employ opportunistic writes.

• With this approach, dirty blocks wait in the cache until the arrival of a read request for the same cylinder.

• The write operation is then “piggybacked” onto the read operation.

Page 101: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

11.6 Disk Performance

• Opportunistic writes have the effect of reducing performance on reads, but of improving it for writes.

• The tradeoffs involved in optimizing disk performance can present difficult choices.

• Our first responsibility is to assure data reliability and consistency.

• No matter what its price, upgrading a disk subsystem is always cheaper than replacing lost data.

Page 102: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• I/O systems are critical to the overall performance of a computer system.

• Amdahl’s Law quantifies this assertion.

• I/O systems consist of memory blocks, cabling, control circuitry, interfaces, and media.

• I/O control methods include programmed I/O, interrupt-based I/O, DMA, and channel I/O.

• Buses require control lines, a clock, and data lines. Timing diagrams specify operational details.

Chapter 7 Conclusion

Page 103: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Magnetic disk is the principal form of durable storage.

• Disk performance metrics include seek time, rotational delay, and reliability estimates.

• Optical disks provide long-term storage for large amounts of data, although access is slow.

• Magnetic tape is also an archival medium. Recording methods are track-based, serpentine, and helical scan.

Chapter 7 Conclusion

Page 104: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• RAID gives disk systems improved performance and reliability. RAID 3 and RAID 5 are the most common.

• RAID 6 and RAID DP protect against dual disk failure, but RAID DP offers better performance.

• Any one of several new technologies including biological, holographic, or mechanical may someday replace magnetic disks.

• The hardest part of data storage may end up being locating the data after it’s stored.

Chapter 7 Conclusion

Page 105: Chapter 7: IO CS140 Computer Organization These slides are derived from those of Null & Lobur + my notes while using previous texts

• Most systems are heavily dependent upon I/O subsystems.

• Disk performance can be improved through good scheduling algorithms, appropriate file placement, and caching.

• Caching provides speed, but involves some risk.

• Keeping disks defragmented reduces arm motion and results in faster service time.

Chapter 11 Conclusion