Invitation to Computer Science, C++ Version, 6E 2
Objectives
In this chapter, you will learn about:
The components of a computer system
Putting all the pieces together – the Von Neumann architecture
The future: non-Von Neumann architectures
Introduction
Remember that computer science is the study of algorithms, including:
Their formal and mathematical properties (Chapters 1-3)
Their hardware realizations (Chapters 4-5)
Their linguistic realizations
Their applications
Computer organization examines the computer as a collection of interacting "functional units"
Functional units may be built out of the circuits already studied
A higher level of abstraction assists understanding by reducing complexity
The Components of a Computer System
The Von Neumann architecture has four functional units:
Memory
The unit that stores and retrieves instructions and data
Input/Output
Handles communication with the outside world
Arithmetic/Logic Unit
Performs mathematical and logical operations
Control Unit
Repeats the following three tasks:
1. Fetch an instruction from memory
2. Decode the instruction
3. Execute the instruction
The program is stored in memory
Instructions are executed sequentially
Memory and Cache
Information is stored in and fetched from the memory subsystem
Random Access Memory (RAM) maps addresses to memory locations
Cache memory keeps values currently in use in faster memory to speed access times
Memory Hierarchy
Fast, expensive, small, volatile memory (such as RAM) sits at the top of the hierarchy; slow, cheap, large, non-volatile storage sits at the bottom
Memory and Cache (con't)
RAM (Random Access Memory)
Memory made of addressable “cells”
Current standard cell size is 1 byte = 8 bits
All memory cells accessed in equal time
Memory address
The address is an unsigned binary number with N bits
The maximum memory size (or address space) is then 2^N cells
Memory and Cache (con't)
Memory Size
Memory sizes are powers of 2:
2^10 = 1K (1 kilobyte)
2^20 = 1M (1 megabyte)
2^30 = 1G (1 gigabyte)
2^40 = 1T (1 terabyte)
If the MAR is N bits long, the largest address is 2^N - 1
The maximum memory size for a MAR with
N = 16 is 64 K
N = 20 is 1 M
N = 31 is 2 G
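The address-space sizes above follow directly from the number of MAR bits; a minimal Python sketch (function name illustrative):

```python
def address_space(n_bits):
    """Number of addressable cells reachable with an n_bits-wide MAR."""
    return 2 ** n_bits

print(address_space(16))  # 65536 cells = 64 K
print(address_space(20))  # 1048576 cells = 1 M
print(address_space(31))  # 2147483648 cells = 2 G
```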
Memory and Cache (con't)
Memory subsystem
Fetch/store controller
Fetch: retrieve a value from memory
Store: store a value into memory
Memory address register (MAR)
Memory data register (MDR)
Memory cells, with decoder(s) to select individual cells
Decoder Circuit
A decoder is a control circuit with N inputs and 2^N outputs; exactly one output line is 1, the one whose number equals the binary value on the inputs
A 3-to-2^3 (3-to-8) decoder:

a b c | o0 o1 o2 o3 o4 o5 o6 o7
0 0 0 |  1  0  0  0  0  0  0  0
0 0 1 |  0  1  0  0  0  0  0  0
0 1 0 |  0  0  1  0  0  0  0  0
0 1 1 |  0  0  0  1  0  0  0  0
1 0 0 |  0  0  0  0  1  0  0  0
1 0 1 |  0  0  0  0  0  1  0  0
1 1 0 |  0  0  0  0  0  0  1  0
1 1 1 |  0  0  0  0  0  0  0  1
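The decoder's behavior can be sketched in Python; this is a behavioral model of the truth table above, not the gate-level circuit:

```python
def decode(a, b, c):
    """3-to-8 decoder: the inputs, read as a binary number,
    select which single output line is set to 1."""
    index = a * 4 + b * 2 + c
    outputs = [0] * 8
    outputs[index] = 1
    return outputs

print(decode(0, 1, 1))  # line 3 selected: [0, 0, 0, 1, 0, 0, 0, 0]
```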
Memory and Cache (con't)
Fetch operation
1. Load the address into the MAR
The address of the desired memory cell is moved into the MAR
2. Decode the address in the MAR
Fetch/store controller signals a “fetch,” accessing the memory cell.
The memory unit must translate the N-bit address stored in the MAR into the set of signals needed to access that one specific memory cell
A decoder circuit is used for such a purpose
3. Copy the content of the memory location into the MDR
The value at the MAR’s location flows into the MDR
Memory and Cache (con't)
Store operation
1. Load the address into the MAR
The address of the cell where the value should go is placed in the MAR
2. Load the value into the MDR
The new value is placed in the MDR
3. Decode the address in the MAR and store the content of the MDR into that memory location
Fetch/store controller signals a “store,” copying the MDR’s value into the desired cell
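The fetch and store sequences can be sketched together as a small Python model of the memory subsystem (class and attribute names are illustrative; the decoder is modeled by plain list indexing):

```python
class Memory:
    def __init__(self, n_bits):
        self.cells = [0] * (2 ** n_bits)  # 2^N addressable cells
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def fetch(self, address):
        self.mar = address               # 1. load the address into the MAR
        self.mdr = self.cells[self.mar]  # 2-3. "decode" the address, copy the cell into the MDR
        return self.mdr

    def store(self, address, value):
        self.mar = address               # 1. load the address into the MAR
        self.mdr = value                 # 2. load the value into the MDR
        self.cells[self.mar] = self.mdr  # 3. copy the MDR into the selected cell

mem = Memory(4)      # a 16-cell memory
mem.store(9, 42)
print(mem.fetch(9))  # 42
```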
Memory and Decoding Logic
Suffers from a scalability problem!
Cache Memory
Memory access is much slower than processing time
Faster memory is too expensive to use for all memory cells
Locality principle
Once an instruction (or value) is used, it is likely to be used again
Once an instruction (or value) is used, its neighbors are likely to be used soon
A small amount of fast memory holding just the values currently in use speeds up computing
=> Cache
Memory Fetch Operation
Three major steps:
1. Look first in cache memory
2. If the desired information is not in the cache, then access it from RAM
3. Copy the data along with the k immediately following memory locations into cache
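The three steps of a cached fetch can be sketched in Python; a dictionary stands in for the cache here (real caches use tags and fixed-size lines):

```python
def cached_fetch(address, cache, ram, k=4):
    """Fetch with a cache: 1) try the cache, 2) on a miss go to RAM,
    3) copy the value plus its k following neighbors into the cache."""
    if address in cache:
        return cache[address]
    value = ram[address]
    for a in range(address, min(address + k + 1, len(ram))):
        cache[a] = ram[a]
    return value

ram = list(range(100))
cache = {}
print(cached_fetch(10, cache, ram))  # 10 (miss, loaded from RAM)
print(cached_fetch(12, cache, ram))  # 12 (hit: 10..14 were copied in)
```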
Assume
Information we need is in cache 70% of the time and in RAM 30% of the time
Cache access costs 5 nsec and memory access costs 20 nsec
Average access time = (0.7 × 5) + 0.3 × (5 + 20) = 11.0 nsec
A 45% reduction compared with 20 nsec for memory alone!
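The same calculation generalizes to any hit rate and access costs; a minimal Python version:

```python
def average_access_time(hit_rate, cache_time, memory_time):
    """Expected access time: hits cost cache_time; misses cost
    the cache probe plus the memory access."""
    return hit_rate * cache_time + (1 - hit_rate) * (cache_time + memory_time)

print(average_access_time(0.7, 5, 20))  # about 11.0 nsec, vs. 20 nsec uncached
```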
Input/Output and Mass Storage
Communication with outside world and external data storage
Human interfaces
monitor, keyboard, mouse, printer
Archival storage:
Machine-readable and not dependent on constant power: hard disks, floppy disks, CD-ROMs
External devices vary tremendously from each other
Input/Output and Mass Storage (con't)
Mass storage devices
Direct access storage device
Hard drive, CD-ROM, DVD, etc.
Uses its own addressing scheme to access data
Sequential access storage device
Tape drive, etc.
Stores data sequentially
Used for backup storage these days
Mass storage is nonvolatile, unlike volatile storage such as RAM
Magnetic Disks
A read/write head travels across a spinning magnetic disk, retrieving or recording data
Figure 5.8 The organization of a magnetic disk
Input/Output and Mass Storage (con't)
Direct access storage devices
Data stored on a spinning disk
Disk divided into concentric tracks
Each track is composed of sectors
Read/write head moves from one track to another while the disk spins
Access time depends on:
Time to move the head to the correct track
Time for the correct sector to spin under the head
Access time (times in msec)
               Best   Worst   Average
Seek time      0      19.98   10.00
Latency        0       8.33    4.17
Transfer time  0.13    0.13    0.13
Total          0.13   28.44   14.30
Seek time: time to position the read/write head over the correct track
Latency: time needed for the correct sector to rotate under the read/write head
Transfer time: time to read or write the data
On average, the head crosses half the tracks (seek) and the disk makes half a revolution (latency)
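Putting the three components together, with the average-case values from the table above (times assumed to be in msec):

```python
def disk_access_time(seek, latency, transfer):
    """Total time = seek + rotational latency + transfer."""
    return seek + latency + transfer

print(disk_access_time(10.00, 4.17, 0.13))  # average case, about 14.30 msec
```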
Input/Output and Mass Storage (con't)
I/O controller
Intermediary between central processor and I/O devices
Processor sends request and data, then goes on with its work
I/O controller interrupts processor when request is complete
Figure 5.18 The Organization of a Von Neumann Computer
The Arithmetic/Logic Unit
The ALU is made up of three parts:
Registers
Interconnections between components (the bus)
ALU circuitry, where the actual computations are performed
Primitive operation circuits: arithmetic (ADD, etc.), comparison (CE, etc.), logic (AND, etc.)
Data inputs and results are stored in registers
A multiplexor selects the desired output
A typical ALU has 16, 32, or 64 registers
In an arithmetic operation such as A + B, A and B are the operands and + is the operator
ALU Organization
Multiplexor
A multiplexor is a control circuit that selects one of its input lines and passes that line's value to the output.
To do so, it uses selector lines to indicate which input line to select.
A multiplexor has 2^N input lines, N selector lines, and one output. The N selector lines are set to 0s and 1s. When the values of the N selector lines are interpreted as a binary number, they give the number of the input line that must be selected.
With N selector lines you can represent numbers between 0 and 2^N - 1.
The single output is the value on the input line whose number the selector lines form.
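The selection rule can be sketched in Python (a behavioral model, not the gate-level circuit):

```python
def multiplexor(inputs, selectors):
    """2^N-to-1 multiplexor: the N selector bits, read as a binary
    number, pick which input line reaches the single output."""
    index = 0
    for bit in selectors:
        index = index * 2 + bit
    return inputs[index]

print(multiplexor([7, 8, 9, 10], [0, 1]))  # selector 01 -> input line 1 -> 8
```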
Multiplexor Circuit
A multiplexor circuit has 2^N input lines (numbered 0 to 2^N - 1), N selector lines, and 1 output
Example: if the selector lines read 00...01, which is the number 1, the output is the value on input line 1
Figure 5.12 Using a Multiplexor Circuit to Select the Proper ALU Result
RECALL: THE ARITHMETIC/LOGIC UNIT USES A MULTIPLEXOR
The ALU circuits (AL1, AL2, ...) feed their results into a multiplexor; the selector lines choose which result goes to the output and into register R. The GT, EQ, and LT bits live in the condition code register, alongside register R and the other registers.
The Arithmetic/Logic Unit (con't)
ALU process
1. Values for the operation are copied into the ALU's input registers
2. All circuits compute results for those inputs
3. The multiplexor selects the one desired result from all the computed values
4. The result is copied into the desired result register
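The four-step ALU process can be sketched as follows (operation names are illustrative; in real hardware every circuit computes in parallel and the multiplexor discards all but one result):

```python
def alu(a, b, op_select):
    """All circuits compute; the multiplexor keeps one result."""
    results = {
        "ADD": a + b,        # arithmetic circuit
        "SUB": a - b,
        "AND": a & b,        # logic circuit
        "CE": int(a == b),   # compare-equal circuit
    }
    return results[op_select]  # the multiplexor's selection

print(alu(6, 3, "ADD"))  # 9
print(alu(6, 3, "CE"))   # 0
```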
The Control Unit
A control unit comprises
Links to other subsystems
Instruction decoder circuit
Two special registers:
Program Counter (PC)
Stores the memory address of the next instruction to be executed
Instruction Register (IR)
Stores the code for the current instruction
The Control Unit (con’t)
Manages the execution of a stored program
Task
While not a HALT instruction or a fatal error
1. Fetch the next instruction to be executed from memory
2. Decode it: determine what is to be done
3. Execute it: issue appropriate command to ALU, memory, and I/O controllers
End of the while loop
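The loop above can be sketched as a toy Python simulator (the opcodes, memory layout, and single register are illustrative, not the book's exact machine):

```python
LOAD, ADD, STORE, HALT = 1, 2, 3, 0  # illustrative op codes

def run(memory_image):
    pc, r = 0, 0                           # program counter and one register
    while True:
        opcode, address = memory_image[pc] # 1. fetch the next instruction
        pc += 1
        if opcode == HALT:                 # 2. decode it ...
            break
        elif opcode == LOAD:               # 3. ... and execute it
            r = memory_image[address]
        elif opcode == ADD:
            r += memory_image[address]
        elif opcode == STORE:
            memory_image[address] = r
    return memory_image

# Compute b + c: cell 101 holds b, cell 102 holds c, result goes to cell 100
memory_image = {0: (LOAD, 101), 1: (ADD, 102), 2: (STORE, 100), 3: (HALT, 0),
                100: 0, 101: 4, 102: 5}
print(run(memory_image)[100])  # 9
```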
Machine Language Instructions
Instructions that can be decoded and executed by the control unit
An instruction
Operation code (op code)
Unique unsigned-integer code assigned to each machine language operation, such as +, -, *, /, cmp, jump, …
Address field(s)
Memory addresses of the values on which the operation will work
Figure 5.14 Typical Machine Language Instruction Format
00001001 0000000001100011 0000000001100100
ADD X Y
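Decoding that bit pattern is a matter of slicing fixed-width fields; field widths here are taken from the figure (an 8-bit op code and two 16-bit address fields):

```python
word = "00001001" + "0000000001100011" + "0000000001100100"
opcode = int(word[0:8], 2)  # 8-bit op code: ADD
x = int(word[8:24], 2)      # 16-bit address field X
y = int(word[24:40], 2)     # 16-bit address field Y
print(opcode, x, y)  # 9 99 100
```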
Operations of Machine Language
Data transfer
Move values to and from memory and registers
Arithmetic/logic
Perform ALU operations that produce numeric values
Compares
Set bits of the compare (condition code) register to hold the result
Branches
Jump to a new memory address to continue processing
Operations Available
Data transfer: load, store, move, clear
Arithmetic operations: add, increment, subtract, decrement
I/O operations: in, out
Compare: compare
Branch: jump, jumpgt, jumpeq, jumplt, jumpneq, halt
There is a different operation code (op code) for each operation
Putting All the Pieces Together—the Von Neumann Architecture
Subsystems connected by a bus
Bus: wires that permit data transfer among them
At this level, ignore the details of circuits that perform these tasks: Abstraction!
Computer repeats fetch-decode-execute cycle indefinitely
Figure 5.18 The Organization of a Von Neumann Computer
LOAD 101
The control unit places the address 101 in the MAR and signals a fetch; the value b in memory cell 101 is copied into register R
ADD 102
The value c in memory cell 102 is fetched; the ALU adds it to the value b from register R, and the result b+c is placed back in register R
STORE 100
The address 100 goes into the MAR and the value b+c from register R goes into the MDR; after the store signal, memory cell 100 holds b+c
Are All Architectures the von Neumann Architecture? No.
One of the bottlenecks in the von Neumann architecture is the fetch-decode-execute cycle.
With only one processor, that cycle is difficult to speed up.
I/O has been done in parallel for many years.
Why have the CPU wait for the transfer of data between memory and the I/O devices?
Most computers today also multitask: they make it appear that multiple tasks are being performed in parallel (when in reality they aren't, as we'll see when we look at operating systems).
But some computers do allow multiple processors.
Comparing Various Types of Architecture
Typically, synchronous computers have fairly simple processors, so there can be many of them, into the thousands. One built by Paracel (GeneMatcher) has over one million processors; it was used by Celera in completing the sequencing of the human genome.
Pipelined computers are often used for high-speed arithmetic calculations, as these pipeline easily.
Shared-memory computers configure independent computers to work on one task. Typically something like 8, 16, or at most 64 such computers are configured together.
Some recent parallel computers used for gaming, such as the PlayStation, are partially based on this architecture.
Synchronous processing
One approach to parallelism is to have multiple processors apply the same program to multiple data sets
Figure 5.6 Processors in a synchronous computing environment
Pipelining
Arranges processors in tandem, where each processor contributes one part to an overall computation
Figure 5.7 Processors in a pipeline
Shared-Memory
Four processors, each with its own local memory (Local Memory 1 through Local Memory 4), all connected to a single shared memory
Different processors do different things to different data.
A shared-memory area is used for communication.
Von Neumann Bottleneck
First-generation machines: 10,000 instructions per second
Second-generation machines: 1 million instructions per second (MIPS)
Today's processors: about 1,000~5,000 MIPS
Tens of billions of transistors, separated by distances of less than 0.000001 cm
Speed of light (approximately 3 × 10^8 m/s) => about 3 nsec to travel 1 meter
A real-time computer animation: 30 × 3000 × 3000 × 100 = 27 billion instructions per second (27,000 MIPS)
Beyond the ability of current processors!
What is the fastest computer in the world?
Visit the site
http://www.top500.org/list/2005/06/
to find out!
Non-Von Neumann Architectures
Physical limitations on speed of Von Neumann computers
Non-Von Neumann architectures explored to bypass these limitations
Parallel computing architectures can provide improvements: multiple operations occur at the same time
Single Instruction Stream/Multiple Data Stream (SIMD)
Multiple Instruction Stream/Multiple Data Stream (MIMD)
SIMD Parallel Processing Architecture
Multiple processors running in parallel
All processors execute same operation at one time
Each processor operates on its own data
Suitable for “vector” operations
Ex: V + 1 (add 1 to every element of vector V); this was the architecture of the first supercomputers, circa 1980
Multiple processors running in parallel
Each processor performs its own operations on its own data
Processors communicate with each other
MIMD Parallel Processing Architecture
High scalability!
Cluster computing
Grid Computing
Cloud Computing
The Key to Parallel Computing
To effectively utilize the large number of processors, we need parallel algorithms
Summary of Level 2
Focus on how to design and build computer systems
Chapter 4
Binary codes
Transistors
Gates
Circuits
Summary of Level 2 (con't)
Chapter 5
Von Neumann architecture
Shortcomings of the sequential model of computing
Parallel computers
Summary
Computer organization examines different subsystems of a computer: memory, input/output, arithmetic/logic unit, and control unit
Machine language gives codes for each primitive instruction the computer can perform, and its arguments
Von Neumann machine: sequential execution of stored program
Parallel computers improve speed by doing multiple tasks at one time
Connecting I/O devices
I/O devices cannot be connected directly to the buses that connect the CPU and memory
I/O devices are electromechanical, magnetic, or optical devices
CPU and memory are electronic devices.
I/O devices also operate at a much slower speed
Input/output devices are therefore attached to the buses through input/output controllers or interfaces.
Figure 5.9 Organization of an I/O Controller
Small Computer System Interface (SCSI)
First developed for Macintosh computer in 1984
Parallel interface with 8, 16, or 32 connections
Daisy-chained connection interface
Each device has a unique address
Both ends of the chain must be connected to a special device (terminator)
FireWire
IEEE standard 1394; Apple calls it FireWire, Sony calls it i.Link
Transfer rate up to 50 MB/sec, or double that
Connects up to 63 devices in a daisy-chain or tree connection interface
USB
Universal Serial Bus (USB)
A serial controller that connects both low- and high-speed devices
Supports hot swapping
Uses a cable with 4 wires:
Two (+5 volts and ground) provide power for low-power devices
The other two carry data, address, and control signals
Data is transferred in packets (device ID, control part, data part)
All devices receive the same packet and filter out packets not addressed to them using the device ID
USB 2: up to 127 devices connected in a tree topology with hubs; supported transfer rates are 1.5 Mbps, 12 Mbps, and 480 Mbps
Hot swapping describes replacing components without significant interruption to the system
Hot plugging describes adding components that expand the system without significant interruption to its operation
Types of Main Memory
There are two types of main memory
Random Access Memory (RAM)
holds its data as long as the computer is switched on
All data in RAM is lost when the computer is switched off
Described as being volatile
It is direct access, as it can be both written to and read from in any order
Its purpose is to temporarily hold programs and data for processing. In modern computers it also holds the operating system
Types of Main Memory (con’t)
Read Only Memory (ROM)
ROM holds programs and data permanently even when computer is switched off
Data can be read by the CPU in any order so ROM is also direct access
The contents of ROM are fixed at the time of manufacture
Stores a program called the bootstrap loader that helps start up the computer
Access time of between 10 and 50 ns
Types of RAM
1. Dynamic Random Access Memory (DRAM)
Contents are constantly refreshed, 1000 times per second
Access time 60-70 nanoseconds
2. Synchronous Dynamic Random Access Memory (SDRAM)
Quicker than DRAM; access time less than 60 nanoseconds
3. Direct Rambus Dynamic Random Access Memory (DRDRAM)
A newer RAM architecture; access time 20 times faster than DRAM; more expensive
Types of RAM (con’t)
4. Static Random Access Memory (SRAM)
Doesn’t need refreshing
Retains contents as long as power applied to the chip
Access time around 10 nanoseconds
Used for cache memory
Also used for date and time settings, as it is powered by a small battery
The Operation of Cache Memory
1. Cache fetches data from addresses next to the current address in main memory
2. CPU checks to see whether the next instruction it requires is in cache
3. If it is, the instruction is fetched from the cache, a very fast access
4. If not, the CPU has to fetch the next instruction from main memory (DRAM), a much slower process
The CPU, the cache memory (SRAM), and main memory (DRAM) are linked by bus connections
Others
Cache memory
Small amount of memory, typically 256 or 512 kilobytes
Temporary store for often-used instructions
Level 1 cache is built within the CPU (internal)
Level 2 cache may be on chip or nearby (external)
Faster for the CPU to access than main memory
Video Random Access Memory (VRAM)
Holds data to be displayed on the computer screen
Has two data paths, allowing a READ and a WRITE to occur at the same time
A system's amount of VRAM relates to the number of colours and the resolution
A graphics card may have its own VRAM chip on board
Others (con’t)
Virtual memory
Uses backing storage, e.g. the hard disk, as a temporary location for programs and data when insufficient RAM is available
Swaps programs and data between the hard-disk and RAM as the CPU requires them for processing
A cheap method of running large or many programs on a computer system
Cost is speed: the CPU can access RAM in nanoseconds but hard-disk in milliseconds (Note: a millisecond is a thousandth of a second)
Virtual memory is much slower than RAM
Paging
Allows a process to be composed of a number of fixed-size blocks, called pages
A virtual address is a page number and an offset within the page
Each page may be located anywhere in main memory
Its real address, or physical address, is its location in main memory
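Splitting a virtual address into page number and offset can be sketched in Python (the 4 KB page size and the frame-table contents are illustrative assumptions):

```python
PAGE_SIZE = 4096  # assumed page size: 4 KB = 2^12 bytes

def translate(virtual_address, page_table):
    page = virtual_address // PAGE_SIZE   # page number
    offset = virtual_address % PAGE_SIZE  # offset within the page
    frame = page_table[page]              # the page can live in any frame of RAM
    return frame * PAGE_SIZE + offset

print(translate(5000, {0: 7, 1: 2}))  # page 1, offset 904 -> 2*4096 + 904 = 9096
```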
Address Translation
The CPU emits virtual addresses; the MMU translates them into physical addresses
Address space: a group of memory addresses usable by something; each program (process) and the kernel has a potentially different address space
Address translation: translating the virtual addresses emitted by the CPU into the physical addresses of memory
The mapping is often performed in hardware by a Memory Management Unit (MMU)
Example of Address Translation
Prog 1 and Prog 2 each have their own virtual address space (code, data, heap, stack); Translation Map 1 and Translation Map 2 place those pieces, together with the OS code, data, heap, and stacks, at different locations in the single physical address space
Types of ROM
1. Programmable Read Only Memory (PROM)
Empty of data when manufactured
May be permanently programmed by the user
2. Erasable Programmable Read Only Memory (EPROM)
Can be programmed, erased and reprogrammed
The EPROM chip has a small window on top allowing it to be erased by shining ultra-violet light on it
After reprogramming the window is covered to prevent new contents being erased
Access time is around 45 – 90 nanoseconds
Types of ROM (con’t)
3. Electrically Erasable Programmable Read Only Memory (EEPROM)
Reprogrammed electrically without using ultraviolet light
Must be removed from the computer and placed in a special machine to do this
Access times between 45 and 200 nanoseconds
4. Flash ROM
Similar to EEPROM
However, can be reprogrammed while still in the computer
Easier to upgrade programs stored in Flash ROM
Used to store programs in devices e.g. modems
Access time is around 45 – 90 nanoseconds
Types of ROM (con’t)
5. ROM cartridges
Commonly used in games machines
Prevents software from being easily copied
Examples of instructions: LOAD X
The address X is placed in the MAR and a fetch is signaled; the value D in memory cell X is copied into register R
ADD X
The value D in memory cell X is fetched into one ALU input (ALU2) while register R supplies E to the other (ALU1); the ALU computes E+D, and the sum is placed back in register R
COMPARE X (assume D > E)
The value D in memory cell X and the value E in register R go to the ALU inputs (ALU1 and ALU2); since D > E, the condition codes GT EQ LT are set to 1 0 0
JUMPLT X
With condition codes GT EQ LT = 1 0 0, the LT bit is 0, so JUMPLT X does not jump; similarly for a condition code of 0 1 0