31
Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O Computer Architecture Hebrew University Spring 2001 Chapter 8–Input/Output Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O Big Picture: Where are We Now? I/O Systems Processor Computer Control Datapath Memory Devices Input Output

Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

  • Upload
    volien

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O

Computer Architecture

Hebrew UniversitySpring 2001

Chapter 8–Input/Output

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 2

Big Picture: Where are We Now?

I/O Systems

Processor

Computer

Control

Datapath

Memory Devices

Input

Output

Page 2: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 3

Outline of Chapter Lectures

I/O Systems and PerformanceTypes and Characteristics of I/O Devices– Magnetic Disks– Graphic Displays

– Networks

Buses– Bus Types and Bus Operation

– Bus Arbitration and How to Design a Bus Arbiter» some examples

Interfacing the OS and the I/O Device– Operating System’s Role

– Delegating I/O Responsibility from the CPU

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 4

I/O System Design Issues

Performance– throughput and latency– fast devices and many devices

Expandability/Flexibility (number + variety of devices)– wide performance spectrum

Resilience in the face of failure

Device Behavior Partner Data Rate (KB/sec)Keyboard Input Human 0.01Mouse Input Human 0.02Line Printer Output Human 1.00Laser Printer Output Human 100.00Graphics Output Human 30,000.00Network-LAN Input/Output Machine 200.00Floppy disk Storage Machine 50.00Optical Disk Storage Machine 500.00Magnetic Disk Storage Machine 2,000.00

Page 3: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 5

Typical Desktop I/O System

200 MHz Pentium ProcessorPipeline

Caches2.4GB/sec

Chipset Memory

528MB/sec

PCI132MB/sec

DiskController

Ethernet Controller

USB HubController

1.5Mb/sec

Mouse

Keyboard

Printer

GraphicsController

Disk Disk Graphics

USB

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 6

I/O System Performance

I/O System performance depends on many aspects of the system:– The CPU– The memory system:

» Internal and external caches» Main Memory

– The underlying interconnection (buses)– The I/O controller

– The I/O device– The speed of the I/O software– The efficiency of the software’s use of the I/O devices

Two common performance metrics:– Throughput: I/O bandwidth– Response time: Latency

Page 4: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 7

Producer-Server Model

Throughput:– The number of tasks completed by the server in unit time– Highest possible throughput:

» Server should never be idle» Queue should never be empty

Response time:– Begins when a task is placed in the queue– Ends when it is completed by the server

– To minimize response time:» Queue should be empty» Server will be idle whenever task arrives

Producer ServerQueue

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 8

Throughput versus Response Time

20% 40% 60% 80% 100%

ResponseTime (ms)

100

200

300

Percentage of maximum throughput

Page 5: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 9

Throughput Enhancement

In general throughput can be improved by:– Throwing more hardware at the problem

Response time is much harder to reduce:– Ultimately it is limited by the speed of light

Producer

ServerQueue

QueueServer

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 10

I/O Benchmarks for Magnetic Disks

Supercomputer application:– Large-scale scientific problems

Transaction processing:– Examples: Airline reservations systems and banks

File system:– Example: UNIX file system

Page 6: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 11

Supercomputer I/O

Supercomputer I/O is dominated by:– Access to large files on magnetic disks

Supercomputer I/O consists of – one large read (read in the data)

– Many writes to snapshot the state of the computation– Supercomputer I/O consists of more output than input

Key supercomputer I/O measures = data throughput:– Bytes/second transferred between disk and memory

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 12

Transaction Processing I/O

Transaction processing (TP):– Examples: airline reservations systems, bank ATMs,

inventory system, purchasing system, ...– Many small changes to a large body of shared data

Transaction processing requirements:– Throughput and response time are both important

» response time: for users» throughput: for cost

– Must gracefully handle certain types of failure

Transaction processing chiefly focuses on I/O rate:– The number of disk accesses per second

Each transaction in typical TP system takes:– Between 2 and 10 disk I/Os

– Between 5,000 and 20,000 CPU instructions per disk I/O

Page 7: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Adapted from COD2e by Petterson & Hennessy Chapter 8 I/O 13

File System I/O

Measurements of UNIX file systems in an engineering environment:– 80% of accesses are to files less than 10 KB– 90% of all file accesses are to data with sequential

addresses on the disk

– 67% of the accesses are reads– 27% of the accesses are writes

– 6% of the accesses are read-write accesses

Chapter 8 I/O 14

Magnetic Disk

Purpose:– Long term, nonvolatile storage– Large, inexpensive, and slow– Lowest level in memory hierarchy

Two major types:– Floppy disk

– Hard disk

Both types of disks:– Rely on a rotating platter coated with a magnetic surface

– Use a moveable read/write head to access the disk

Advantages of hard disks over floppy disks:– Platters are rigid ( metal or glass) so they can be larger

– Higher density because it can be controlled more precisely– Higher data rate because it spins faster– Can incorporate more than one platter

Registers

Cache

Mem

ory

Disk

Page 8: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 15

Organization of Hard Disk

Typical numbers (depending disk size):– 500 to 2,000 tracks per surface– 32 to 128 sectors per track

» A sector is the smallest unit that can be read or written

Traditionally all tracks have same number of sectors:– Constant bit density: record more sectors on outer tracks

» Being done in new disks

Platters

Track

Sector

Chapter 8 I/O 16

Magnetic Disk Characteristic

Cylinder: all the tacks under the head at a given point on all surfacesRead/write data is a three-stage process:– Seek time: position the arm over the proper track– Rotational latency: wait for the desired sector

to rotate under the read/write head– Transfer time: transfer a block of bits (sector)

under the read-write headAverage seek time as reported by the industry:– Typically in the range of 12 ms to 20 ms– (Sum of the time for all possible seek) / (total # of

possible seeks)

Due to locality of disk reference, actual average seek time:– Typically only 25%–33% of advertised number

SectorTrack

Cylinder

Head

Platter

Page 9: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 17

Typical Performance of a Disk

Rotational Latency:– Most disks rotate at 3,600–5,400 RPM– Approximately 12–16 ms per revolution– Average latency to the desired

sector is 1/2 around the disk: 6–8 msTransfer Time is a function of :– Transfer size (usually a sector): 1 KB / sector– Rotation speed: 3600 RPM to 5400 RPM– Recording density: typical diameter ranges

from 2 to 14 in– Typical values: 2 to 4 MB per second

SectorTrack

Cylinder

Head

Platter

Chapter 8 I/O 18

Reliability and Availability

I/O systems often have fault tolerant requirements:– must be able to get the data even if there is a failure

» Data is available» Reliable = never any failures

Reliability and Availability often confusedAvailability can be improved by adding hardware:– Example: adding ECC on memory

Reliability can only be improved by:– Bettering environmental conditions

– Building more reliable components– Building with fewer components

» Improve availability may come at the cost of lower reliability

Page 10: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 19

Disk Arrays

Organize disks to take advantage of – Small, inexpensive disks– Arrange them in an array– Increase throughput using many disk drives:

» Data is spread over multiple disk» Multiple accesses are made to several disks

Reliability is lower than a single disk:– But availability can be improved by adding redundant disks:

Lost information can be reconstructed from redundant information

– MTTR: mean time to repair is in the order of hours– MTTF: mean time to failure of disks is three to five years

Chapter 8 I/O 20

N scan lines

M pixels

Graphics Displays

Raster Cathode Ray Tube (CRT) Displays:– Resolution: (M pixels) x (N horizontal scan lines)

Typical Sizes:– Studio Quality: 720 x 480

– High Resolution Workstation Monitor: 1280 x 1024– High Definition TV (proposed standards):

» U.S.A.: 1440 x 960» Others: 1920 x 1080

Page 11: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 21

Graphics Displays and Bit Map

M x N image represented by a bit map in memory:– Black & White: 1bit per pixel

» The pixel is either ON or OFF. No longer widely used.

– Gray scale: 8 bits per pixel» Each pixel can have 256 shades of black white

– Color: 24 bits per pixel» 8 bits for each of the primary color: Red, Green, Blue

N scan lines

M pixels

Chapter 8 I/O 22

Cost of Computer Graphics

Size of Frame Buffer = M x N x P– M: Number of pixels per scan line– N: Number of horizontal scan lines– P: Number of bits per pixel

Color Frame Buffer for High Resolution Workstation:– 1280 x 1024 x 24-bit = 3840 KB ~= 4 MB

The size of the color frame buffer can be reduced:– Most pictures do not need full palette (224) of possible colors– Use a two-level representation

Page 12: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 23

Color Map: Lower Cost of Frame Buffer

Color Map Size is 256-word x 24-bit:– Only 256 different colors can appear on the screen at a time

The color map is loaded by the application program:– Each picture can have its own palette of color to chose from

FrameBuffer

Memory

Color CRTDisplayN

8 bits

M

N

M

256-word x 24-bitColor Map

Word 0

Word 1

W 255

:

:

:

Word P

Word Q

P

Q

y1

y2

x2x1

y1

y2

x2x1

Chapter 8 I/O 24

Bandwidth for Frame Buffer

Assume no color map:– Frame Buffer Size: 1280 x 1024 x 24 = 3840 KB ~= 4 MB– Bandwidth Requirement: 4 MB x 60 Hz = 240 MB / sec

Assume N = 32-bit = 4-byte:– Frame buffer needs to respond:

» 240 / 4 = 60 MHz => 16.6 ns

– This is faster than most of the low cost SRAM

– Another reason to use a color map!

1280 x 1024 x 24-bitHigh Resolution. Color Monitor

Refresh Rate: 60 Hz

Frame Buffer1280 x 1024 x 24

Bandwidth?

N

DA

C

Page 13: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 25

Bandwidth for Frame Buffer (Cont.)

With a 256-word by 24-bit color map:– Frame Buffer Size: 1280 x 1024 x 8 = 1280 KB– Bandwidth Requirement: 1280 KB x 60 Hz = 75 MB / sec

Assume N = 32-bit = 4-byte:– The frame buffer needs to response at: 75/4 =18.75 MHz =>

53.3 ns– DRAM is still too slow and SRAM is too expensive.

– Solution: Video RAM (VRAM)

1280 x 1024 x 24-bitHigh Resolution. Color Monitor

Refresh Rate: 60 Hz

Frame Buffer1280 x 1024 x 8

Bandwidth?

N

256-wordx 24 -bit

Color Map

DA

C

Chapter 8 I/O 26

Video Random Access Memory (VRAM)

A high-speed shift register– Hold one row of DRAM

Random Port:– Access the DRAM array

– Works like regular DRAM(and just as slow)

Serial Port:– Access the shift register

N row

s

N columns

DRAM

ColumnAddress

M-bitRandom Port

M bits

N x M Shift Reg.

Row Address

M-bitSerial Port

Page 14: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 27

ABCs of Networks

Starting Point: Send bits between 2 computersFIFO Queue on each endCan send both ways (“Full Duplex”)Rules for communication “protocol”– May involve variety of control information

» request/response» size of message» type of message

Chapter 8 I/O 28

A Simple Example

What is format of packet?– packet = unit of maximum size that makes up all messages

» routing of packet independent of other packets

– Fixed? Number bytes?» fixed or variable size

Request/Response

Address/Data

1 bit 32 bits0: Please send data from Address1: Data corresponding to request

Page 15: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 29

Example: Ethernet (IEEE 802.3)

Essentially 10Kb/s 1 wire bus with no central control– Collision based protocol with carrier sense

» Listen. If nobody is taking, go ahead and talk.

» If you hear anybody else talking,stop and try again later.Binary exponential backoff

– increase wait time by 2x each collision

Recently: – 100Mbit (and 1Gbit coming)

– switched organization

Transceiver (detects collision)

Cable (50 ohm coax, 10Mbps, 500M)

Computer (or repeater)Transceiver Cable

(50M)

Frame Frame Frame

ContentionInterval

Contention Slot2t = worst-case round trip time

= 512 bit times = 51.2 ms

idle

Frame

ManchesterEncoding

Chapter 8 I/O 30

Buses: Connecting I/O to the System

Bus–a shared communication link– uses one set of wires to connect multiple subsystems– as opposed to a point-to-point link

– connects processor to I/O devices as well as connecting I/O devices to memory

» processor may get data/commands from CPU or memory

Control

Datapath

Memory

Processor

Input

Output

Page 16: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 31

Advantages of Buses

Versatility:– New devices can be added easily– Peripherals can be moved between computer

systems that use the same bus standard

Low Cost:– A single set of wires is shared in multiple ways

MemoryProcessor

I/O Device

I/O Device

I/O Device

Chapter 8 I/O 32

Disadvantages of Buses

Busses create a communication bottleneck– Bandwidth of bus can limit the maximum I/O throughput

The maximum bus speed is largely limited by:– The length of the bus

– The number of devices on the bus– The need to support a range of devices with:

» Widely varying latencies » Widely varying data transfer rates

MemoryProcessor

I/O Device

I/O Device

I/O Device

Page 17: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 33

General Organization of a Bus

Control lines:– Signal requests and acknowledgments– Indicate what type of information is on the data lines

Data lines: carry information between source and destination:– Data and Addresses

– Complex commands

A bus transaction includes two parts:– Sending the address

– Receiving or sending the data

Data LinesControl Lines

Chapter 8 I/O 34

Master versus Slave

A bus transaction includes two parts:– Sending the address– Receiving or sending the data

Master is the one who starts the bus transaction by:– Sending the address

Slave is the one who responds to the address by:– Sending data to the master if the master ask for data

– Receiving data from master if master wants to send data

BusMaster

BusSlave

Master send address

Data can go either way

Page 18: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 35

Output Operation

Output = Processor sending data to the I/O device

Processor

Control (Memory Read Request)

Memory

Step 1: Request Memory

I/O Device (Disk)

Data(Memory Address)

Processor

Control

Memory

Step 2: Read Memory

I/O Device (Disk)

Data

Processor

Control (Device Write Request)

Memory

Step 3: Send Data to I/O Device

I/O Device (Disk)

Data (I/O Device Address

and then Data)

Chapter 8 I/O 36

Input Operation

Processor

Control (Memory Write Request)

Memory

Step 1: Request Memory

I/O Device (Disk)

Data (Memory Address)

Processor

Control (I/O Read Request)

Memory

Step 2: Receive Data

I/O Device (Disk)

Data(I/O Device Address

and then Data)

Input = processor receiving data from the I/O device

Page 19: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 37

Types of Buses

Processor-Memory Bus (design specific)– Short and high speed– Only need to match the memory system

» Maximize memory-to-processor bandwidth (cache transfers)

– Connects directly to the processor

I/O Bus (industry standard)– Usually is lengthy and slower

– Need to match a wide range of I/O devices (cost and speed)– Connects to the processor-memory bus or backplane bus

Backplane Bus (often industry standard)– Backplane: an interconnection structure within the chassis– Allow processors, memory, and I/O devices to coexist– Cost advantage: one single bus for all components

Chapter 8 I/O 38

One Bus System: Backplane Bus

A single bus (the backplane bus) is used for:– Processor to memory communication– Communication between I/O devices and memory

Advantages: Simple and low costDisadvantages: slow and the bus can become a major bottleneckExample: IBM PC: ISA

Processor Memory

I/O Devices

Backplane Bus

Page 20: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 39

A Two-Bus System

I/O buses tap processor-memory bus via adaptors:– Processor-memory bus: mainly processor-memory traffic– I/O buses: provide expansion slots for I/O devices

Apple Macintosh-II– NuBus: Processor, memory, and a few selected I/O devices– SCSI Bus: the rest of the I/O devices

Processor Memory

I/OBus

Processor Memory Bus

BusAdaptor

BusAdaptor

BusAdaptor

I/OBus

I/OBus

Chapter 8 I/O 40

A Three-Bus System

Backplane buses tap into the processor-memory bus– Processor-memory bus used for processor memory traffic– I/O buses are connected to the backplane bus

Advantage: load on processor bus greatly reducedMany new PCs use this organization

Processor MemoryProcessor Memory Bus

BusAdaptor

BusAdaptor

BusAdaptor

I/O BusBackplane Bus

I/O Bus

Page 21: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 41

Synchronous & Asynchronous Bus

Synchronous Bus:– Includes a clock in the control lines– Fixed protocol for communication, relative to the clock– Advantage: involves very little logic and can run very fast

– Disadvantages:» Every device on the bus must run at the same clock rate» To avoid clock skew, bus cannot be long if they are fast

Asynchronous Bus:– It is not clocked– It can accommodate a wide range of devices– It can be lengthened without worrying about clock skew

– It requires a handshaking protocol

Chapter 8 I/O 42

A Handshaking Protocol

Three control lines– ReadReq: indicates a read request for memory

» Address is put on he data lines at same line

– DataRdy: indicates data word is ready on data lines» Data is put on data lines at same time

– Ack: acknowledge ReadReq or DataRdy of other party

ReadReq

AddressData Data

Ack

DataRdy

1 2

2

3

4

4

56

6 7

Page 22: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 43

Increasing Bus Bandwidth

Separate versus multiplexed address and data lines:– Address and data can be transmitted in one bus cycle

if separate address and data lines are available

– Cost: (a) more bus lines, (b) increased complexityData bus width:– Increasing the width of the data bus, transfers of multiple

words require fewer bus cycles

– Cost: more bus linesBlock transfers:

– Transfer multiple words in back-to-back bus cycles– Only one address needs to be sent at the beginning– The bus is not released until the last word is transferred

– Cost: (a) increased complexity (b) decreased response time for request

Split transaction: free up the bus when not transferring

– Cost: complexity, potentially latency

Chapter 8 I/O 44

Obtaining Access to the Bus

One of the most important issues in bus design:– How is the bus reserved by a device that wishes to use it?

Chaos is avoided by a master-slave arrangement:– Only the bus master can control access to the bus:

» It initiates and controls all bus requests

– A slave responds to read and write requests

The simplest system:– Processor is the only bus master

– All bus requests must be controlled by the processor– Major drawback: processor is involved in every transaction

BusMaster

BusSlave

Control: Master initiates requests

Data can go either way

Page 23: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 45

Multiple Masters => Arbitration

Bus arbitration scheme:– Bus master wanting to use bus asserts “bus request”– Bus master cannot use the bus until its request is granted– Bus master must signal to the arbiter after finish using the

bus

Bus arbitration schemes try to balance two factors:– Bus priority: highest priority device should be serviced first

– Fairness: lowest priority devices should never be completely locked out

Bus arbitration schemes–four broad classes:– Distributed arbitration by self-selection: each device

wanting bus places a code indicating its identity on bus.

– Distributed arbitration by collision detection: Ethernet– Daisy chain arbitration: most common in I/O– Centralized, parallel arbitration: centralize it.

Chapter 8 I/O 46

Daisy Chain Arbitration Scheme

Advantage: simpleDisadvantages:– Cannot assure fairness:

» A low-priority device may be locked out indefinitely

– The use of “daisy chain” grant signal limits bus speed

BusArbiter

Device 1(HighestPriority)

Device NLowestPriority

Device 2

Grant Grant GrantRelease

Request

Page 24: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 47

Bus Example: PCI

Clock at 33 MHz (with extension to 66 MHz) [CLK]Central arbitration [REQ#, GNT#]– Overlapped with previous transaction

Multiplexed Address/Data– 32 lines (with extension to 64) [AD]

General Protocol– Transaction type (bus command) [C/BE#]– Address handshake and duration[FRAME#, TRDY#]

– Data width (byte enable) [C/BE#]– Variable-length data block handshake [IRDY#, TRDY#]

Maximum Bandwidth is 132 MB/s

Chapter 8 I/O 48

Example: USB (Universal Serial Bus)

Targeted at low-cost, low-bandwidth peripherals– Software control and hardware logic are complex

» But these are inexpensive for high volume

– Advantages for configurability, expandability, and ease of use» cables are inexpensive and easy to connect

Clock at 1.5 MHz (with option for 12 MHz)Serial bus with a single signal– Differential pair for noise tolerance– Clock encoded with data using Non-Return to Zero, Inverted

» Each signal transition corresponds to a 0 bit» After 6 consecutive 1’s, a 0 is stuffed to force a transition

– Star topology:» connections routed point-to-point and through hubs

Page 25: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 49

More on USB

General Protocol– Transactions are packed into 1 ms frames– Each transaction consists of three phases

» Token specifies device and transaction type» Data transfer up to 1KB» Handshake to ensure reliable transfer

– Software schedules transactions» hardware arbitration is unnecessary

Maximum bandwidth 1.5 Mb/s (with option 12 Mb/s)

Chapter 8 I/O 50

USB System Configuration

Figure from Universal Serial BusSystem Architecture

Page 26: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 51

USB Signaling

Figure from Universal Serial BusSystem Architecture

Data encoded on differential-pair signalline allows clock to be recovered.

Chapter 8 I/O 52

Operating System Tasks

Operating system acts as the interface between:– The I/O hardware and the program that requests I/O

Three characteristics of the I/O systems:– I/O system is shared by multiple program using processor

– I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations.

» Interrupts must be handled by the OS because they cause a transfer to supervisor mode

– The low-level control of an I/O device is complex:» Managing a set of concurrent events» The requirements for correct device control are very detailed

Page 27: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 53

Operating System Requirements

Provide protection to shared I/O resources– Guarantees that a user’s program can only access the

portions of an I/O device to which the user has rights

Provides abstraction for accessing devices:– Supply routines that handle low-level device operation

Handles the interrupts generated by I/O devicesProvide equitable access to the shared I/O resources– All user programs must have equal access to the I/O

resources

Schedule accesses in order to enhance system throughput

Chapter 8 I/O 54

OS–I/O Communication

The Operating System must be able to prevent:– User program from communicating with I/O device directly

If user programs could perform I/O directly:– Protection to shared I/O resources could not be provided

Three types of communication are required:1. OS must be able to give commands to I/O devices2. I/O device must be able to notify OS when I/O device has

completed an operation or has encountered an error

3. Data must be transferred between memory and an I/O device

Page 28: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 55

Giving Commands to I/O Devices

Two methods are used to address the device:– Special I/O instructions– Memory-mapped I/O

Special I/O instructions specify:– Both the device number and the command word

» Device number: the processor communicates this via aset of wires normally included as part of the I/O bus

» Command word: usually sent on the bus’s data lines» Used in Intel x86 and IBM 360

Memory-mapped I/O:– Portions of the address space are assigned to I/O device– Read and writes to those addresses are interpreted

as commands to the I/O devices– User programs prevented issuing I/O operations directly:

» I/O address space is protected by the address translation» Used in MIPS, SPARC, etc.

Chapter 8 I/O 56

I/O Device Notifying the OS

The OS needs to know when:– The I/O device has completed an operation– The I/O operation has encountered an error

This can be accomplished in two different ways:– Polling:

» The I/O device put information in a status register» The OS periodically check the status register–overhead

– I/O Interrupt:» Whenever an I/O device needs attention from the processor,

it interrupts the processor from what it is currently doing.» Interrupt must convey information to OS–what device?

done with either a register or interrupt vector

» I/O interrupt is asynchronousprocessor can wait till end of current instruction before recognizing interrupt

– Interrupts may have different priorities–hardware support

Page 29: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 57

Programmed I/O and Polling

Advantage: – Simple: processor is totally in control and does all the work

Disadvantage:– Polling overhead and data transfer consume CPU time

CPU

IOC

device

Memory

Is thedeviceready?

store data to device

yes no

done? noyes

busy wait loopnot an efficient

way to use the CPUunless the device

is very fast!

processor may be inefficient way to

transfer data

read data from

memory

Example: output operation

Chapter 8 I/O 58

Interrupt Driven Data Transfer

Advantage:– User program progress only

halted during actual transfer

Disadvantage, special hardware needed to:– Cause an interrupt (I/O device)– Detect interrupt (processor)– Save proper states to resume

after interrupt (processor)

addsubandornop

readstore...rti

memory

userprogram(1) I/O

interrupt

(2) save PC

(3) interruptservice addr

interruptserviceroutine(4)

CPU

IOC

device

Memory

:

Page 30: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 59

Programmer’s View

Add

SubDiv

mainprogram

Service the(keyboard)interrupt

Save processorstatus/state

Restore processorstatus/state

(3) get PC

interrupts request (e.g., from keyboard)(1)

(2) Save PC and “branch” to interrupt target address

Chapter 8 I/O 60

Delegating I/O Handling: DMA

Direct Memory Access (DMA):– External to the CPU– Act as a maser on the bus

– Transfer blocks of data to or from memory without CPU intervention

CPU

IOC

device

Memory DMAC

CPU sends a starting address, direction, and length count to DMAC. Then issues "start".

DMAC provides handshakesignals for PeripheralController, and MemoryAddresses and handshakesignals for Memory.

Page 31: Computer Architecture - Hebrew University of Jerusalem€“ P: Number of bits per pixel Color Frame Buffer for High Resolution Workstation: – 1280 x 1024 x 24-bit = 3840 KB ~= 4

Chapter 8 I/O 61

Delegating I/O Handling: IOP

CPU IOP

Mem

D1

D2

Dn

. . .main memory

bus

I/Obus

CPU

IOP

(1) Issuesinstructionto IOP

memory

(2)

(3)

DMA vs. IOP?how is I/O

program stored and accessed?

OP Device Address

target devicewhere commands

IOP looks in memory for commands

OP Addr Cnt Other

whatto do

whereto putdata

howmuch

specialrequests

(4) IOP interrupts CPU when done

Device to/from memory transfers are controlled by the

IOP directly.

IOP steals memory cycles.

Chapter 8 I/O 62

Summary

I/O Performance–many factors– expandability, latency, bandwidth

I/O Devices: wide spectrum– disks, graphics, networks

Buses– different types of buses:

» Processor-memory buses, I/O buses, Backplane buses

– Bus arbitration schemes:» Daisy chain arbitration: cheap, but cannot assure fairness» Centralized parallel arbitration: requires a central arbiter

OS and I/O – communication: polling and interrupts– Handling I/O outside CPU: DMA, I/O processor