Computer Organization and Technology Computer Memory Systemmit.wu.ac.th/mit/images/editor/files/COA-Memory-31102013.pdf · Overview-Cache Memory Block diagrams below depict the difference


Computer Organization and Technology Computer Memory System

Assoc. Prof. Dr. Wattanapong Kurdthongmee
Division of Computer Engineering, School of Engineering and Resources, Walailak University


Introduction

Computer memory is simple in concept, yet it exhibits the widest range of type, technology, organization, performance and cost.

No single technology is optimal in satisfying all the requirements.

Some memories are internal to the computer, while others are external. They are arranged in a hierarchy.


Overview

Memory systems become less complex to study if we classify them according to their key characteristics:

Location • Processor (registers) • Internal (main) • External (secondary)
Capacity • Word size • Number of words
Unit of transfer • Word • Block
Access method • Sequential • Direct • Random • Associative
Performance • Access time • Cycle time • Transfer rate
Physical characteristics • Volatile/nonvolatile • Erasable/nonerasable
Physical type • Semiconductor • Magnetic • Optical • Magneto-optical
Organization


Overview-Access Method

Sequential access:
Memory is organized into records, and access must be made in a specific linear sequence, so the time to access an arbitrary record is highly variable. Example: tape unit.

Direct access:
Individual blocks or records have a unique address based on physical location. Access is accomplished by directly reaching a general vicinity, followed by sequential searching, counting, or waiting to reach the final location. Access time is also variable. Example: disk unit.


Overview-Access Method

Random access:
Each addressable location has a unique, physically wired-in addressing mechanism.
The access time is independent of the sequence of prior accesses and is constant.
Any location can be selected at random and directly addressed and accessed.
Example: main memory.

Associative:
A random-access type of memory that enables one to compare desired bit locations within a word for a specified match, and to do this for all words simultaneously.
A word is retrieved based on a portion of its contents rather than its address.
Each location has its own addressing mechanism, and the access time is constant.
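To make "retrieval by contents" concrete, here is a minimal sketch of an associative lookup. The function name `associative_search` and the key/mask interface are illustrative assumptions: the mask selects the "desired bit locations" and the key gives the pattern to match. Real associative hardware compares every word in parallel; the Python loop below only models the result sequentially.

```python
def associative_search(words, key, mask):
    """Return the indices of all words whose masked bits equal the masked key.

    Hypothetical interface: `mask` marks the bit positions to compare.
    Hardware would compare all words simultaneously; this loop is a
    sequential model of the same result.
    """
    return [i for i, w in enumerate(words) if (w & mask) == (key & mask)]


memory = [0b1010_0001, 0b1011_0010, 0b1010_1111, 0b0000_0001]

# Match on the upper 4 bits only: find every word whose high nibble is 1010.
print(associative_search(memory, key=0b1010_0000, mask=0b1111_0000))  # [0, 2]
```

Note that no address is supplied: the words are located purely by the contents of the selected bit positions.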


Overview-Performance

Performance parameters:

Access time:
For random-access memory, the time to perform a read or write operation, measured from the instant an address is presented until the data are available.
For non-random-access memory, the time it takes to position the read/write mechanism at the desired location.

Memory cycle time: Applies to random-access memory. It equals the access time plus any additional time required before a second access can commence.

Transfer rate: The rate at which data can be transferred into or out of a memory unit. For random-access memory it is 1/(cycle time). For non-random-access memory:

$$T_N = T_A + \frac{N}{R}$$

where $T_N$ is the average time to read or write N bits, $T_A$ is the average access time, $N$ is the number of bits, and $R$ is the transfer rate in bits per second (bps).
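The two transfer-rate formulas can be evaluated directly. This is a small sketch; the function names and the device figures in the example (10 ms average access time, 1 Mbit/s transfer rate, 8000 bits) are hypothetical values chosen for illustration, not taken from the slides.

```python
def random_access_rate(cycle_time_s):
    """Transfer rate of random-access memory: 1 / (cycle time), in accesses per second."""
    return 1.0 / cycle_time_s


def transfer_time(t_a, n_bits, rate_bps):
    """T_N = T_A + N / R for non-random-access memory.

    t_a:      average access time T_A, in seconds
    n_bits:   number of bits N to read or write
    rate_bps: transfer rate R, in bits per second
    """
    return t_a + n_bits / rate_bps


# Hypothetical device: 10 ms average access time, 1 Mbit/s transfer rate.
t = transfer_time(t_a=0.010, n_bits=8_000, rate_bps=1_000_000)
print(f"{t * 1000:.1f} ms")  # 18.0 ms
```

For this device the positioning time $T_A$ (10 ms) dominates the actual data transfer (8 ms), which is typical of non-random-access media.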


Overview-Memory Hierarchy

[Figure: memory hierarchy]
Inboard memory: registers, cache, main memory
Outboard storage: magnetic disk, CD-ROM, CD-RW, ...
Off-line storage: magnetic tape, MO, WORM
Going down the hierarchy, cost per bit and frequency of access decrease, while capacity and access time increase.

Design constraints: How much? How fast? How expensive?
Faster access time means greater cost per bit.
Greater capacity means smaller cost per bit.
Greater capacity means slower access time.

Solution: do not rely on a single memory component or technology; employ a memory hierarchy.


Overview-Memory Hierarchy

The reduced frequency of access to the lower levels of memory follows from the principle of locality of reference: during the course of execution of a program, memory references by the CPU, for both instructions and data, tend to cluster.

Over a long period of time, the clusters in use change, but over a short period of time, the CPU works primarily with fixed clusters of memory references.

To optimize computer performance, data are organized across the hierarchy so that the percentage of accesses to each successively lower level is substantially less than that of the level above.

[Figure: clusters of memory references at time T = 0 and T = n]
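Why does it pay to make accesses to lower levels rare? A common textbook model of a two-level memory (an assumption here, not stated on the slides) charges $T_1$ for a hit in the upper level and $T_1 + T_2$ for a miss, giving an average access time of $H \cdot T_1 + (1 - H)(T_1 + T_2)$ for hit ratio $H$. The sketch below evaluates that model with hypothetical timings.

```python
def average_access_time(hit_ratio, t1, t2):
    """Average access time of a two-level memory (simplified model).

    Assumption: a hit costs t1; a miss costs t1 + t2, because the word is
    first looked for in level 1 and then fetched from level 2.
    """
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)


# Hypothetical timings: level 1 = 0.01 us, level 2 = 0.1 us.
for h in (0.50, 0.90, 0.95):
    print(h, round(average_access_time(h, 0.01, 0.1), 4))
```

With a 95% hit ratio the average is 0.015 us, much closer to the fast level than to the slow one, which is exactly what locality of reference makes achievable.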


Overview-Cache Memory

Cache memory principle: The cache is intended to give memory speed approaching that of the fastest memories available, while also providing a large memory size at the price of less expensive types of semiconductor memory. The average fetch time is reduced by exploiting locality of reference: "when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that future references will be to other words in the block."

[Figure: the CPU exchanges words with the cache (word transfer); the cache exchanges blocks with main memory (block transfer)]


Overview-Cache Memory

[Figure: typical cache organization: processor, cache controller and cache memory, address buffer, data buffer, and system bus (address, control, data)]


Overview-Cache Memory

Cache: Principle of Operation
1. Initialize the cache by copying main memory to the cache.
2. The processor reads a word.
3. Is it in the cache?
   Yes: deliver the word to the processor.
   No: copy a portion of main memory to the cache.

[Figure: example with instruction addresses 200, 201, 202, 203, an instruction JUMP 204, and addresses 800 to 804]
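The read flow above can be modelled in a few lines. In this sketch the class name `SimpleCache` is hypothetical, the cache is assumed to have unlimited capacity (no replacement), and after a miss the copied block is used to deliver the word. The point is that copying a whole K-word block on a miss lets locality of reference turn later reads into hits.

```python
class SimpleCache:
    """Sketch of the cache read flow: on a miss, copy the whole K-word block.

    Assumption: unlimited capacity (no replacement), to keep the focus on
    block transfer and locality of reference.
    """

    def __init__(self, main_memory, block_size):
        self.mem = main_memory
        self.k = block_size
        self.blocks = {}                    # block number -> copied words
        self.hits = self.misses = 0

    def read(self, address):
        block, offset = divmod(address, self.k)
        if block in self.blocks:            # Is it in the cache? Yes.
            self.hits += 1
        else:                               # No: copy a block from main memory.
            self.misses += 1
            start = block * self.k
            self.blocks[block] = self.mem[start:start + self.k]
        return self.blocks[block][offset]   # Deliver the word to the processor.


main_memory = list(range(1000, 1016))       # 16 hypothetical words
cache = SimpleCache(main_memory, block_size=4)
for addr in range(8):                       # sequential reads: good locality
    cache.read(addr)
print(cache.hits, cache.misses)             # 6 2
```

Eight sequential reads cost only two block transfers; the other six words are found in blocks already copied.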


Overview-Cache Memory

The block diagrams below depict the difference in structure between main memory and cache.

[Figure: main memory and cache structure]
Main memory: word addresses 0, 1, 2, …, $2^n - 1$. Each word has a unique n-bit address, and every K words are grouped into a block.
Cache: C lines numbered 0, 1, 2, …, C - 1. Each line holds a tag and a block of K words (block length K words). The tag is used to identify which particular block is currently being stored.


Overview-Cache Memory

Cache memory principle: For the given cache structure, the number of blocks is $M = 2^n / K$. The line size is the number of words in a line. The number of lines is considerably less than the number of main memory blocks: $C \ll M$. At any time, some subset of the blocks of memory resides in lines of the cache. The tag, usually a portion of the main memory address, is used to identify which particular block is currently stored in a line.


Overview-Cache Memory

Elements of Cache Design:

Cache size: It should be small enough that the overall cost per bit is close to that of main memory alone, and large enough that the overall average access time is close to that of the cache alone.

Mapping function: The algorithm that maps the larger main memory onto the fewer cache lines. It also determines which main memory block currently occupies a cache line. Three techniques are used: direct, associative, and set associative.

Example parameters for the mapping functions:
Cache size: 64 KB; block size: 4 bytes → 16K ($2^{14}$) lines of 4 bytes each.
Main memory size: 16 MB, addressable by a 24-bit address bus → 4M ($2^{22}$) blocks of 4 bytes.
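The example figures follow from simple power-of-two arithmetic. This sketch derives them; the helper `log2_exact` and the constant names are illustrative, and memory is assumed to be byte-addressable, as the 24-bit address for 16 MB implies.

```python
def log2_exact(x):
    """Return k such that 2**k == x (x must be a power of two)."""
    assert x > 0 and x & (x - 1) == 0, "x must be a power of two"
    return x.bit_length() - 1


CACHE_BYTES = 64 * 1024           # 64 KB cache
BLOCK_BYTES = 4                   # 4-byte blocks (= line size)
MEMORY_BYTES = 16 * 1024 * 1024   # 16 MB main memory, byte-addressable

lines = CACHE_BYTES // BLOCK_BYTES       # 16K cache lines
blocks = MEMORY_BYTES // BLOCK_BYTES     # 4M main memory blocks
address_bits = log2_exact(MEMORY_BYTES)  # 24-bit address bus

print(lines, log2_exact(lines))          # 16384 14
print(blocks, log2_exact(blocks))        # 4194304 22
print(address_bits)                      # 24
```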


Overview-Cache Memory

Elements of Cache Design: Direct mapping function:

The simplest technique, which maps each block of main memory into only one possible cache line. The mapping is expressed as:

$$i = j \bmod m$$

where $i$ is the cache line number, $j$ is the main memory block number, and $m$ is the number of lines in the cache.
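The formula $i = j \bmod m$ can be checked on toy sizes. In this sketch the function names and the sizes ($m = 4$ lines, 16 memory blocks) are hypothetical; it lists which blocks compete for a given line.

```python
def cache_line(j, m):
    """Direct mapping: main memory block j goes to cache line i = j mod m."""
    return j % m


def blocks_for_line(i, m, num_blocks):
    """All main memory blocks that compete for cache line i."""
    return [j for j in range(num_blocks) if cache_line(j, m) == i]


# Toy sizes (hypothetical): m = 4 cache lines, 2**s = 16 memory blocks.
m, num_blocks = 4, 16
print(blocks_for_line(0, m, num_blocks))  # [0, 4, 8, 12]
print(blocks_for_line(3, m, num_blocks))  # [3, 7, 11, 15]
```

Line 0 receives blocks $0, m, 2m, \ldots, 2^s - m$ and line $m - 1$ receives blocks $m - 1, 2m - 1, \ldots, 2^s - 1$.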


Direct mapping function (cont.): For purposes of cache access, each main memory address can be viewed as consisting of three fields:
Word (w bits, least significant): identifies a unique word or byte within a block of main memory.
Block (s bits, most significant): specifies one of the $2^s$ blocks of main memory. The cache logic interprets these s bits as a tag of s - r bits and a line field of r bits, which selects one of the $m = 2^r$ lines of the cache.


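[Figure: direct-mapping cache organization. The (s + w)-bit memory address is split into tag (s - r bits), line (r bits) and word (w bits). The line field selects one of the cache lines L0 … Lm-1; the stored tag is compared with the address tag, giving a hit in cache on a match and a miss in cache otherwise, in which case the block (B0 … Bi, words W0, W1, …, W4j … W(4j+3)) is fetched from main memory.]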
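Splitting an address into its tag, line and word fields is a matter of shifts and masks. The sketch below (function name hypothetical) takes r and w as parameters; the example uses the 64 KB cache, 4-byte block, 24-bit address system described earlier (r = 14, w = 2, so the tag is 8 bits).

```python
def split_address(address, r, w):
    """Split a direct-mapped address into (tag, line, word) fields.

    w:   bits selecting a word/byte within a block (least significant)
    r:   bits selecting one of the m = 2**r cache lines
    tag: the remaining s - r high-order bits
    """
    word = address & ((1 << w) - 1)
    line = (address >> w) & ((1 << r) - 1)
    tag = address >> (w + r)
    return tag, line, word


# Example system: 24-bit address, r = 14, w = 2, so the tag has 8 bits.
tag, line, word = split_address(0x16339C, r=14, w=2)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=0
```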


Overview-Cache Memory

With this mapping, blocks of main memory are assigned to lines of the cache as follows:

The tag is used to distinguish the block of data in each line from the other blocks that can fit into that line.

Cache line | Main memory blocks assigned
0 | $0, m, 2m, \ldots, 2^s - m$
1 | $1, m + 1, 2m + 1, \ldots, 2^s - m + 1$
⋮ | ⋮
m - 1 | $m - 1, 2m - 1, 3m - 1, \ldots, 2^s - 1$


Overview-Cache Memory

Consider the example system: $m = 16K = 2^{14}$ and $i = j \bmod 2^{14}$, so the mapping becomes:

Cache line | Starting memory addresses of blocks
0 | 000000, 010000, …, FF0000
1 | 000004, 010004, …, FF0004
⋮ | ⋮
m - 1 | 00FFFC, 01FFFC, …, FFFFFC
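The hexadecimal starting addresses in this table can be regenerated from the mapping. In the sketch below (constant and function names are illustrative), each of the 256 possible 8-bit tags gives one block that maps to a given line; its starting byte address is (tag · m + line) · 4.

```python
LINES = 2**14        # m = 16K lines
BLOCK_BYTES = 4
TAG_VALUES = 2**8    # 8-bit tag: 256 blocks share each line


def start_addresses(line):
    """24-bit starting byte addresses of all blocks that map to `line`."""
    return [(tag * LINES + line) * BLOCK_BYTES for tag in range(TAG_VALUES)]


for line in (0, 1, LINES - 1):
    addrs = start_addresses(line)
    print(line, f"{addrs[0]:06X}", f"{addrs[1]:06X}", "...", f"{addrs[-1]:06X}")
```

The three printed rows reproduce the table: 000000, 010000, …, FF0000 for line 0 and 00FFFC, 01FFFC, …, FFFFFC for line m - 1. Consecutive blocks on the same line are 64 KB (one cache size) apart.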



[Figure: direct mapping example with a 16K-word cache and a 16-MByte main memory. Main memory is shown grouped by tag (00, 16, FF) with 16-bit offsets such as 0000, 0004, 339C, FFF8 and FFFC.]

Cache contents shown in the example:

Line number | Tag | Data
0000 | 00 | 13579246
0001 | 16 | 11235813
0CE7 | 16 | FEDCBA98
3FFE | FF | 11223344
3FFF | 16 | 12345678

Overview-Cache Memory

Elements of Cache Design: Direct mapping function (cont.):

During a read (fetch) operation, the cache system is presented with a 24-bit address.

The 14-bit line number is used as an index into the cache to access a particular line.

If the 8-bit tag matches the tag currently stored in that line, the 2-bit word number is used to select one of the 4 bytes in that line.

Otherwise, the 22-bit tag-plus-line field is used to fetch a block from main memory.

This mapping function is simple and inexpensive to implement, but if a program repeatedly references two blocks of main memory that map to the same line, the blocks will be continually swapped in and out of the cache.
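The read procedure, and its weakness, can be simulated with a tags-only model. The class `DirectMappedCache` is a hypothetical sketch (no data storage, no write handling). Addresses 000000 and 010000 both map to line 0 in the example system, so alternating between them misses every time.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache model (tags only, no data); a sketch.

    The cache has 2**r lines and blocks of 2**w bytes.
    """

    def __init__(self, r, w):
        self.r, self.w = r, w
        self.tags = [None] * (1 << r)   # None marks an empty (invalid) line
        self.hits = self.misses = 0

    def access(self, address):
        line = (address >> self.w) & ((1 << self.r) - 1)  # index into the cache
        tag = address >> (self.r + self.w)
        if self.tags[line] == tag:      # tags match: hit
            self.hits += 1
            return True
        self.misses += 1                # miss: fetch block, replacing the old one
        self.tags[line] = tag
        return False


# Example system: r = 14, w = 2. Addresses 0x000000 and 0x010000 share line 0.
cache = DirectMappedCache(r=14, w=2)
for _ in range(5):
    cache.access(0x000000)
    cache.access(0x010000)
print(cache.hits, cache.misses)  # 0 10
```

Replacing the second address with 0x000004 (line 1) would give 8 hits after the first two misses, since the two blocks then no longer evict each other.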


Overview-Cache Memory

Elements of Cache Design: Direct mapping function (cont.): Cache behaviour on reads is fairly consistent across different implementations. Writes, however, can be handled in one of three ways:

No-write: Modification of cache contents is not supported, so the cache line needs to be reloaded after a write. Slow!

Write-through: Supports modification of cache contents, and every write also goes to main memory, so the cache and main memory never become incoherent. Still slow!

Write-back: Allows writes to valid cache lines without immediately writing to main memory. This makes the cache lines and main memory incoherent, which is handled by adding a status bit to each cache line to indicate whether the line is "dirty" (modified) or "clean". A dirty line is written back to main memory when it is replaced.
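The cost difference between write-through and write-back shows up in the number of main memory writes. This is a simplified sketch: the function name is hypothetical, and it assumes a direct-mapped cache, write-allocate on a write miss, and a final flush of dirty lines at the end of the trace. It counts block write-backs and word writes alike as one memory write each.

```python
def count_memory_writes(writes, num_lines, block_size, policy):
    """Count main memory write operations for a sequence of write addresses.

    Sketch assumptions: direct-mapped cache, write-allocate on a write miss,
    and a final flush of dirty lines at the end of the trace.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    tags = [None] * num_lines
    dirty = [False] * num_lines
    mem_writes = 0
    for addr in writes:
        block = addr // block_size
        line, tag = block % num_lines, block // num_lines
        if tags[line] != tag:               # write miss: replace the line
            if policy == "write-back" and dirty[line]:
                mem_writes += 1             # copy the dirty block back first
            tags[line], dirty[line] = tag, False
        if policy == "write-through":
            mem_writes += 1                 # main memory updated on every write
        else:
            dirty[line] = True              # only mark the line as dirty
    if policy == "write-back":
        mem_writes += sum(dirty)            # flush remaining dirty lines
    return mem_writes


trace = [0, 0, 0, 0, 4, 4, 4, 4]   # repeated stores to words in blocks 0 and 1
wt = count_memory_writes(trace, num_lines=4, block_size=4, policy="write-through")
wb = count_memory_writes(trace, num_lines=4, block_size=4, policy="write-back")
print(wt, wb)  # 8 2
```

Repeated stores to the same block cost one memory write per store under write-through but only one write-back per dirty block under write-back.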


Semiconductor Memory

A memory device should possess these basic properties:
Two well-defined states that can be used to store binary information.
The ability to switch from one state to another (i.e., reading and writing a 0 or 1).
A fast switching time.
A low cost per bit of storage.

Since RAM needs to be fast, address decoding is done entirely electronically (without physical movement of the storage medium). In nonrandom-access media, either the storage medium or the read/write mechanism is moved to find the data.



• There are several forms of memory:

[Figure: memory types]
RAM (volatile), also called read/write memory (RWM): SRAM; DRAM, including SDRAM and EDO RAM
ROM (nonvolatile): PROM, EPROM, EEPROM, FLASH



In a RAM, any addressable location in memory can be accessed in a random manner: "the process of reading from or writing into a location in a RAM is the same and takes an equal amount of time, independent of the physical location in the memory."

Read/write memory (RWM): Each memory location of the RWM has an address associated with it. Data are input into (written to) and output from (read from) a memory location by accessing the location using its address. Within the RWM, the memory address register (MAR) holds the address being accessed. With n bits in the MAR, $2^n$ locations can be addressed, numbered from 0 through $2^n - 1$.



Read/write memory (RWM): Data are usually transferred into or out of the RWM in units of a set of bits known as a memory word (typical word sizes are 8, 16, and 32 bits). Each of the $2^n$ words in the memory has m bits; therefore, this is a $(2^n \times m)$-bit memory.
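The size of a $(2^n \times m)$-bit memory follows directly from the MAR width n and the word size m. The sketch below (function name and example values n = 10, m = 8 are hypothetical) returns the number of addressable words, the address range, and the total capacity in bits.

```python
def rwm_geometry(n, m):
    """(2**n x m)-bit memory: n MAR bits, m bits per word."""
    words = 2 ** n                       # addressable locations
    return words, (0, words - 1), words * m


words, address_range, total_bits = rwm_geometry(n=10, m=8)
print(words, address_range, total_bits)  # 1024 (0, 1023) 8192
```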

[Figure: RAM socket]


RWM: Read/write memory: Two types of semiconductor RAM are now available: static and dynamic.

Each memory cell in a static RAM (SRAM) is built out of a flip-flop. The content of the memory cell (either 1 or 0) remains intact as long as the power is on. SRAMs are used in speed-critical applications.

A dynamic RAM (DRAM) cell is built out of a capacitor. The charge level of the capacitor determines the 1 or 0 state of the cell. As the charge decays with time, these memory cells must be refreshed to retain their contents.


[Figure: SRAM and DRAM bit-cell structures]
An SRAM bit requires 6 transistors.
A DRAM bit requires only 1 transistor and 1 capacitor.

RWM: Read/write memory: DRAMs require a complex refresh circuit and, because of the refresh time needed, they are also slower than SRAMs. Because more dynamic memory cells than static memory cells can be fabricated on the same area of silicon, DRAMs are an attractive choice when large memories are needed and speed is not a critical design parameter.


[Figure: SRAM and DRAM]

Read/write memory (RWM): DRAM can be either asynchronous or synchronous:

Asynchronous: The processor must wait idly for the DRAM to complete its internal operations (∼60 ns).

Synchronous: The DRAM latches information from the processor under the control of the system clock.

Asynchronous fast-page-mode (FPM) DRAMs run at speeds between 80 and 100 ns.

Extended-data-out (EDO) DRAMs improved speed by about 20%. Both FPM and EDO DRAMs dragged effective speeds down by forcing CPUs on 66 MHz mainboards to wait to receive data from memory.
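A quick calculation shows why asynchronous DRAM forced the CPU to wait: at 66 MHz one bus clock lasts about 15.2 ns, so a single access spans several clocks. This sketch assumes the processor waits a whole number of bus cycles and ignores any setup or bus-protocol overhead; the function name is hypothetical.

```python
import math


def bus_cycles(access_ns, bus_mhz):
    """Whole bus clock cycles spanned by one memory access (simplified).

    access_ns / (1000 / bus_mhz) is rewritten as access_ns * bus_mhz / 1000.
    """
    return math.ceil(access_ns * bus_mhz / 1000)


for t in (60, 80, 100):
    print(t, "ns ->", bus_cycles(t, 66), "cycles at 66 MHz")
```

Even the ∼60 ns internal operation occupies 4 bus cycles, and the 80 to 100 ns FPM range occupies 6 to 7, during which an asynchronous design leaves the processor idle.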



Read/write memory (RWM): While SDRAM uses only one edge of the clock signal to transfer data, DDR (Double Data Rate) SDRAM uses both edges, effectively doubling the data transmission rate.

Unlike 168-pin SDRAM modules, DDR SDRAM modules use a 184-pin connector.
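The doubling can be quantified as a peak transfer rate: clock rate × transfers per clock × bus width. The sketch assumes a 64-bit module data bus and a 100 MHz clock for both memory types (illustrative values), and uses 1 MB = 10^6 bytes.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bits, transfers_per_clock):
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bus_bits // 8


# Assumed 64-bit data bus and 100 MHz clock for both types.
print(peak_bandwidth_mb_s(100, 64, 1))  # SDRAM, one edge: 800
print(peak_bandwidth_mb_s(100, 64, 2))  # DDR SDRAM, both edges: 1600
```

These are theoretical peaks; sustained rates are lower because of access latency and refresh.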



Read only memory (ROM): A ROM is also a random-access memory, except that data can only be read. Data are usually written into a ROM either by the memory manufacturer or by the user in an off-line mode (using a special device programmer).

ROM is also part of main memory and contains data and programs that are not usually altered in real time during operation.

During operation, the data at the selected address are available on the output lines of a ROM as long as the memory is enabled.



Read only memory (ROM): Two types of ROM are commercially available: mask-programmed ROMs (MROMs) and user-programmed ROMs. User-programmed ROMs, or programmable ROMs (PROMs), include EPROM, EEPROM and FLASH.

Mask-programmed ROMs are used when a large number of ROM units containing a particular program and/or data are required. The IC manufacturer can be asked to "burn" the program and data into the ROM unit. The user supplies the program, and the IC manufacturer prepares a mask and uses it to fabricate the program and data into the ROM as the last step in fabrication.

Therefore, the contents of these ROMs are unalterable. Mask-programmed ROMs are not cost-effective unless the application requires a large number of units.



Read only memory (ROM): A user-programmed ROM is fabricated with either all 0s or all 1s stored in it. The user burns the required program with a special device called a PROM programmer, which sends the proper current through each link.

The content of this type of ROM cannot be altered after initial programming, which is why it is sometimes called OTP (one-time programmable).

EPROMs (erasable PROMs) are also available. Ultraviolet light is used to restore the content of an EPROM to its initial value; it can then be reprogrammed using a PROM programmer.
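The difference between an OTP PROM and an EPROM can be captured in a toy model. The class `PROM` is hypothetical, and it assumes a blank device holding all 1s (one of the two fabrication options above) in which programming can only change a 1 to a 0, because a burned link cannot be restored. Only the erasable variant can return to the blank state, modelling UV erasure.

```python
class PROM:
    """Toy model of a user-programmed ROM with 8-bit words (a sketch).

    Assumption: blank cells hold all 1s, and programming can only change
    a 1 to a 0 (a burned link cannot be restored).
    """

    BLANK = 0xFF

    def __init__(self, size, erasable=False):
        self.cells = [self.BLANK] * size
        self.erasable = erasable            # True models a UV-erasable EPROM

    def program(self, address, value):
        current = self.cells[address]
        if value & ~current & 0xFF:         # some bit would need a 0 -> 1 change
            raise ValueError("cannot restore a burned bit without erasing")
        self.cells[address] = value

    def erase(self):
        if not self.erasable:
            raise ValueError("OTP PROM cannot be erased")
        self.cells = [self.BLANK] * len(self.cells)  # back to the blank state


rom = PROM(4)
rom.program(0, 0x3C)            # allowed: only clears bits of 0xFF
eprom = PROM(4, erasable=True)
eprom.program(0, 0x00)
eprom.erase()                   # UV erase restores 0xFF
eprom.program(0, 0xA5)          # so it can be reprogrammed
```

Calling `rom.program(0, 0xFF)` afterwards raises `ValueError`, mirroring the one-time programmability of an OTP device.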

