Computer Organization and Technology
Computer Memory System
Assoc. Prof. Dr. Wattanapong Kurdthongmee
Division of Computer Engineering, School of Engineering and Resources, Walailak University
2
Introduction
Computer memory is simple in concept but exhibits the widest range of:
• type,
• technology,
• organization,
• performance and cost.
No single technology is optimal in satisfying all the requirements.
Some memories are internal, while others are external. They are arranged in a hierarchy.
3
Overview
Memory systems are less complex to study if we classify them according to key characteristics:
• Location: processor (register), internal (main), external (secondary)
• Capacity: word size, number of words
• Unit of transfer: word, block
• Access method: sequential, direct, random, associative
• Performance: access time, cycle time, transfer rate
• Physical characteristics: volatile/nonvolatile, erasable/nonerasable
• Physical type: semiconductor, magnetic, optical, magneto-optical
• Organization
4
Overview-Access Method
Sequential access:
• Memory is organized into records.
• Access must be made in a specific linear sequence.
• The time to access an arbitrary record is highly variable.
• Example: tape unit.
Direct access:
• Individual blocks/records have a unique address based on physical location.
• Access is accomplished by direct access to reach a general vicinity, plus sequential searching, counting, or waiting to reach the final location.
• Access time is also variable.
• Example: disk unit.
5
Overview-Access Method
Random access:
• Each addressable location has a unique, physically wired-in addressing mechanism.
• Access time is constant and independent of the sequence of prior accesses.
• Any location can be selected at random and directly addressed and accessed.
• Example: main memory.
Associative:
• A random-access type of memory.
• Enables a comparison of desired bit locations within a word for a specified match, and does this for all words simultaneously.
• A word is retrieved based on a portion of its contents rather than its address.
• Each location has its own addressing mechanism, with constant access time.
6
Overview-Performance
Performance parameters:
Access time:
• For random access: the time to perform a read/write operation, measured from the instant the address is presented until the data is available.
• For non-random access: the time it takes to position the read-write mechanism at the desired location.
Memory cycle time:
• Applies to random-access memory.
• Equal to the access time plus any additional time required before a second access can commence.
Transfer rate:
• The rate at which data can be transferred into or out of a memory unit.
• For random access: 1/(cycle time).
• For non-random access:
  T_N = T_A + N/R
  where T_N is the average time to read/write N bits, T_A is the average access time, N is the number of bits, and R is the transfer rate (bps).
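The non-random-access relation T_N = T_A + N/R can be checked with a short calculation. The following is a minimal sketch; the device figures (10 ms average access time, 1 Mbit/s transfer rate, 4096 bits) are hypothetical values chosen only for illustration.

```python
def transfer_time(access_time_s: float, n_bits: int, rate_bps: float) -> float:
    """Average time to read or write n_bits: T_N = T_A + N / R."""
    return access_time_s + n_bits / rate_bps


# Hypothetical device (not from the slides): T_A = 10 ms, R = 1 Mbit/s.
t_n = transfer_time(access_time_s=0.010, n_bits=4096, rate_bps=1_000_000)
print(f"T_N = {t_n * 1000:.3f} ms")
```

For small transfers the access time T_A dominates; the N/R term only matters once N becomes large relative to R × T_A.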
7
Overview-Memory Hierarchy
[Figure: the memory hierarchy. From top to bottom: inboard memory (registers, cache, main memory); outboard storage (magnetic disk, CD-ROM, CD-RW, ...); off-line storage (magnetic tape, MO, WORM). Moving down the hierarchy, cost per bit and frequency of access decrease, while capacity and access time increase.]
Design constraints: How much? How fast? How expensive?
• Faster access time, greater cost per bit.
• Greater capacity, smaller cost per bit.
• Greater capacity, slower access time.
Solution: do not rely on a single memory component or technology; employ a memory hierarchy.
8
Overview-Memory Hierarchy
The reduction in frequency of access to lower levels relies on the “locality of reference” principle: during the course of execution of a program, memory references by the CPU, for both instructions and data, tend to cluster.
Over a long period of time, the clusters in use change, but over a short period of time, the CPU works primarily with fixed clusters of memory references.
To optimize performance, data are organized across the hierarchy so that the percentage of accesses to each successively lower level is substantially less than that to the level above.
[Figure: regions of memory referenced at time T = 0 and at time T = n.]
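The benefit of keeping most accesses in the upper level can be illustrated with a two-level model. This is a sketch under stated assumptions that are not in the slides: a hit costs the upper-level access time, a miss costs the upper-level plus the lower-level access time, and the timing values are hypothetical.

```python
def average_access_time(hit_ratio: float, t_upper: float, t_lower: float) -> float:
    """Two-level average access time.

    Assumed model: a hit costs t_upper; a miss costs t_upper + t_lower
    because the upper level is checked first.
    """
    return hit_ratio * t_upper + (1.0 - hit_ratio) * (t_upper + t_lower)


# Hypothetical timings in microseconds: upper level 0.01, lower level 0.1.
for h in (0.50, 0.95, 1.00):
    print(f"hit ratio {h:.2f}: {average_access_time(h, 0.01, 0.1):.4f} us")
```

With a hit ratio of 0.95 the average comes out at 0.015 µs, close to the upper-level time, which is the effect that locality of reference makes possible.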
9
Overview-Cache Memory
Cache memory principle:
• Cache memory is intended to give memory speed approaching that of the fastest memories available.
• It also provides a large memory size at the price of less expensive types of semiconductor memory.
• The fetch cycle is shortened by exploiting locality of reference: “when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that future references will be to others in the block.”
[Figure: CPU, cache, and main memory. Words are transferred between the CPU and the cache; blocks are transferred between the cache and main memory.]
10
Overview-Cache Memory
[Figure: typical cache organization. Blocks: processor, cache controller and cache memory, address buffer, data buffer, system bus. Signals: address, control, data.]
11
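Overview-Cache Memory
Cache: Principle of Operation
• Initialize the cache by copying main memory to the cache.
• The processor reads a word.
• Is the word in the cache?
  Yes: deliver the word to the processor.
  No: copy a portion of main memory to the cache, then deliver the word to the processor.
[Figure: flowchart of the read operation, illustrated with memory addresses 200 to 204 and 800 to 804 and a JUMP instruction.]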
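The read flow can be sketched in a few lines. This is an illustrative model only: the block size of 4 words, the dummy memory contents, the unbounded dictionary used as the cache, and the fact that the cache starts empty are all assumptions, not details from the slides.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)

main_memory = list(range(100, 132))  # 32 words of dummy data
cache = {}                           # block number -> copy of that block
misses = 0


def read_word(address: int) -> int:
    """Return the word at `address`, filling the cache on a miss."""
    global misses
    block_no, offset = divmod(address, BLOCK_SIZE)
    if block_no not in cache:
        # Miss: copy the whole block containing the word from main memory.
        misses += 1
        start = block_no * BLOCK_SIZE
        cache[block_no] = main_memory[start:start + BLOCK_SIZE]
    # Hit (or freshly filled): deliver the word to the processor.
    return cache[block_no][offset]


values = [read_word(a) for a in (8, 9, 10, 11, 20)]
print(values, "misses:", misses)
```

The four neighbouring addresses 8 to 11 cost a single miss because they share one block, which is the locality effect the cache relies on.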
12
Overview-Cache Memory
The block diagrams below depict the different structures of main memory and cache.
[Figure: main memory consists of 2^n words at memory addresses 0, 1, 2, ..., 2^n - 1, grouped into blocks of K words. The cache consists of C lines, numbered 0, 1, 2, ..., C - 1, each holding a tag and a block of K words.]
• Each word has a unique n-bit address.
• Every K words are grouped into a block; the cache has C lines of K words.
• The tag identifies which particular block is currently stored in a line.
13
Overview-Cache Memory
Cache memory principle:
• For the given cache structure, the number of main memory blocks is M = 2^n / K.
• Line size: the number of words in a line.
• The number of lines is considerably less than the number of main memory blocks: C << M.
• At any time, some subset of the blocks of memory resides in lines in the cache.
• The “tag” is usually a portion of the main memory address and identifies which particular block is currently stored in a line.
14
Overview-Cache Memory
Elements of Cache Design:
Cache size:
• Should be small enough that the overall cost per bit is close to that of main memory, and
• large enough that the overall average access time is close to that of the cache alone.
Mapping function:
• The algorithm that maps the larger main memory onto the fewer cache lines.
• Also determines which main memory block currently occupies a cache line.
• Three techniques: direct, associative, set associative.
Example used for the mapping functions:
• Cache size: 64 KB; block size: 4 bytes → 16K (2^14) lines of 4 bytes each.
• Main memory size: 16 MB, addressed by a 24-bit address bus → 4M (2^22) blocks of 4 bytes.
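The line and block counts of the example system follow directly from the stated sizes. A minimal sketch of the arithmetic (the constant names are illustrative):

```python
CACHE_BYTES = 64 * 1024   # 64 KB cache
BLOCK_BYTES = 4           # 4 bytes per block (and per cache line)
ADDRESS_BITS = 24         # width of the address bus

cache_lines = CACHE_BYTES // BLOCK_BYTES      # C
memory_bytes = 2 ** ADDRESS_BITS              # bytes addressable
memory_blocks = memory_bytes // BLOCK_BYTES   # M

print(f"C = {cache_lines} lines (2^{cache_lines.bit_length() - 1})")
print(f"M = {memory_blocks} blocks (2^{memory_blocks.bit_length() - 1})")
print(f"blocks competing for each line: {memory_blocks // cache_lines}")
```

Each cache line is therefore shared by 2^22 / 2^14 = 256 main memory blocks, which is why a tag is needed to tell them apart.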
15
Overview-Cache Memory
Elements of Cache Design: Direct Mapping function:
• The simplest technique: each block of main memory maps into only one possible cache line.
• The mapping is expressed as:
  i = j modulo m
  where i is the cache line number, j is the main memory block number, and m is the number of lines in the cache.
16
Overview-Cache Memory
Direct Mapping function (cont):
For purposes of cache access, each main memory address of s + w bits can be viewed as consisting of three fields:
• The least significant w bits identify a unique word/byte within a block of main memory (the address of the data byte).
• The remaining s bits specify one of the 2^s blocks of main memory.
• The cache logic interprets these s bits as a tag of s - r bits and a line field of r bits, which identifies one of the m = 2^r lines of the cache.
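Splitting an address into its tag, line, and word fields is a matter of shifting and masking. The sketch below assumes the field widths of the example system (w = 2, r = 14, so the tag has s - r = 8 bits of a 24-bit address); the function name and the sample address are illustrative.

```python
WORD_BITS = 2    # w: byte within a 4-byte block
LINE_BITS = 14   # r: one of 2^14 cache lines
# The remaining high-order bits form the tag (s - r = 8 bits for 24-bit addresses).


def split_address(address: int) -> tuple[int, int, int]:
    """Return (tag, line, word) for a main memory address."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_address(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")
```

Because the line field is just the low r bits of the block number, extracting it is equivalent to computing i = j modulo m with m = 2^r.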
17
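Overview-Cache Memory
[Figure: direct-mapping cache organization. The (s + w)-bit memory address is split into tag (s - r bits), line (r bits), and word (w bits) fields. The line field selects one of the cache lines L0, ..., Li, ..., Lm-1, and the tag stored there is compared with the address tag. On a match (hit in cache) the word field selects the word within the line; otherwise (miss in cache) the block is fetched from main memory, which holds blocks B0, ..., Bi, ... of four words each (W0 to W3, ..., W4j to W(4j+3)).]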
18
Overview-Cache Memory
With this mapping, blocks of main memory are assigned to lines of the cache as follows:

Cache line    Main memory blocks assigned
0             0, m, 2m, …, 2^s - m
1             1, m+1, 2m+1, …, 2^s - m + 1
:             :
m-1           m-1, 2m-1, 3m-1, …, 2^s - 1

The tag is used to distinguish the block currently held in a line from the other blocks that can fit into that line.
19
Overview-Cache Memory
For the example system, m = 16K = 2^14 and i = j modulo 2^14, so the mapping becomes:

Cache line    Starting memory addresses of blocks (hex)
0             000000, 010000, …, FF0000
1             000004, 010004, …, FF0004
:             :
m-1           00FFFC, 01FFFC, …, FFFFFC

Consider the following example:
20
Overview-Cache Memory
[Figure: direct mapping example with a 16 MByte main memory and a 16KWord cache. Main memory is shown in regions with tags 00, 16, and FF, each covering offsets 0000 to FFFC. The cache lines shown in the figure are:]

Line number    Tag    Data
0000           00     13579246
0001           16     11235813
0CE7           16     FEDCBA98
3FFE           FF     11223344
3FFF           16     12345678
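The starting addresses that compete for a given cache line in the example system can be generated mechanically. The sketch below assumes the 8/14/2-bit tag/line/word split of the example; the helper name is illustrative, and only three tags (00, 01, FF) are listed rather than all 256.

```python
WORD_BITS = 2
LINE_BITS = 14
NUM_LINES = 1 << LINE_BITS  # m = 2^14


def starting_addresses(line: int, tags) -> list[str]:
    """Hex starting addresses of the blocks with the given tags that map to `line`."""
    addresses = []
    for tag in tags:
        address = (tag << (LINE_BITS + WORD_BITS)) | (line << WORD_BITS)
        block_number = address >> WORD_BITS
        # Sanity check of the direct-mapping rule i = j modulo m.
        assert block_number % NUM_LINES == line
        addresses.append(f"{address:06X}")
    return addresses


for line in (0, 1, NUM_LINES - 1):
    print(f"line {line:04X}:", starting_addresses(line, tags=(0x00, 0x01, 0xFF)))
```

The same arithmetic run in reverse places address 16339C on line 0CE7 with tag 16.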
21
Overview-Cache Memory
Elements of Cache Design: Direct Mapping function (cont):
• During a read (fetch) operation, the cache system is presented with a 24-bit address.
• The 14-bit line number is used as an index into the cache to access a particular line.
• If the 8-bit tag matches the tag currently stored in that line, the 2-bit word number is used to select one of the 4 bytes in that line.
• Otherwise, the 22-bit tag-plus-line field is used to fetch a block from main memory.
• This mapping function is simple and inexpensive to implement, but if two blocks of main memory that map to the same line are referenced repeatedly in alternation, the blocks will continually be swapped in the cache.
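The continual swapping of two blocks that share a line can be demonstrated with a small simulation. This is a sketch, not a model of any particular hardware: it tracks only tags (no data), uses a dictionary as a sparse stand-in for the tag array, and assumes the 8/14/2-bit address split of the example system.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (data storage omitted)."""

    def __init__(self, line_bits: int = 14, word_bits: int = 2):
        self.line_bits = line_bits
        self.word_bits = word_bits
        self.tags = {}      # line number -> tag currently stored there
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Look up `address`; return True on a hit, False on a miss."""
        line = (address >> self.word_bits) & ((1 << self.line_bits) - 1)
        tag = address >> (self.word_bits + self.line_bits)
        if self.tags.get(line) == tag:
            self.hits += 1
            return True
        # Miss: the fetched block replaces whatever occupied the line.
        self.tags[line] = tag
        self.misses += 1
        return False


# Addresses 000004 and 160004 both map to line 1 but carry different tags.
thrashing = DirectMappedCache()
for _ in range(4):
    thrashing.access(0x000004)
    thrashing.access(0x160004)

# Repeated access to a single block, for comparison.
friendly = DirectMappedCache()
for _ in range(8):
    friendly.access(0x000004)

print("alternating:", thrashing.hits, "hits,", thrashing.misses, "misses")
print("single block:", friendly.hits, "hits,", friendly.misses, "misses")
```

In the alternating pattern every access evicts the block that the next access needs, so the hit count stays at zero even though only two blocks are in use.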
22
Overview-Cache Memory
Elements of Cache Design: Direct Mapping function (cont):
Cache behaviour on reads is fairly consistent across different implementations. Writes, however, can be handled in one of three ways:
• No-write: modification of cache contents is not supported. This is slow, and the cache line needs to be reloaded.
• Write-through: cache contents can be modified, but every write is also made to main memory so that the cache and main memory never become incoherent. Still slow.
• Write-back: a write to a valid cache line does not immediately cause a write to main memory. The cache line and main memory can therefore become incoherent, which is handled by adding a status bit to each cache line to indicate whether the line is “dirty” or “clean”.
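The difference between write-through and write-back can be made concrete by counting main memory writes. The classes and function names below are hypothetical, and the sketch models a single cache line only; it assumes a dirty line is written to memory when it is evicted.

```python
class MainMemory:
    def __init__(self):
        self.data = {}
        self.writes = 0   # number of writes that reached main memory

    def write(self, address, value):
        self.data[address] = value
        self.writes += 1


class CacheLine:
    def __init__(self, address, value):
        self.address = address
        self.value = value
        self.dirty = False   # status bit: True means memory is stale


def write_through(line, value, memory):
    """Update the cache line and main memory together."""
    line.value = value
    memory.write(line.address, value)


def write_back(line, value):
    """Update only the cache line and mark it dirty."""
    line.value = value
    line.dirty = True


def evict(line, memory):
    """On replacement, copy a dirty line to main memory (assumed policy)."""
    if line.dirty:
        memory.write(line.address, line.value)
        line.dirty = False


wt_mem, wt_line = MainMemory(), CacheLine(0x10, 0)
for v in (1, 2, 3):
    write_through(wt_line, v, wt_mem)

wb_mem, wb_line = MainMemory(), CacheLine(0x10, 0)
for v in (1, 2, 3):
    write_back(wb_line, v)
stale = wb_mem.data.get(0x10)   # main memory has not seen any write yet
evict(wb_line, wb_mem)

print("write-through memory writes:", wt_mem.writes)
print("write-back memory writes:", wb_mem.writes)
```

Write-back trades a window of incoherency (tracked by the dirty bit) for fewer main memory writes.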
23
Semiconductor Memory
The basic properties that a memory device should possess are:
• two well-defined states that can be used for the storage of binary information,
• the ability to switch from one state to another (i.e. reading and writing a 0 or 1),
• a fast switching time,
• a low cost per bit of storage.
Since RAM needs to be fast, address decoding is done entirely electronically (without physical movement of the storage medium). In nonrandom-access memory, either the storage medium or the read/write mechanism is moved to find the data.
24
Semiconductor Memory
25
Semiconductor Memory
• There are several forms of memory.
[Figure: memory types. ROM (nonvolatile): PROM, EPROM, EEPROM, FLASH. RAM, i.e. read/write memory (RWM) (volatile): SRAM, DRAM, SDRAM, EDORAM.]
26
Semiconductor Memory
In a RAM, any addressable location in memory can be accessed in a random manner: “the process of reading from/writing into a location in a RAM is the same and takes an equal amount of time (independent of the physical location in the memory)”.
Read/write memory (RWM):
• Each memory location of the RWM has an address associated with it.
• Data are input into (written to) and output from (read from) a memory location by accessing the location using its “address”.
• Within the RWM, the memory address register (MAR) is responsible for storing the address being accessed.
• With n bits in the MAR, 2^n locations can be addressed, and they are numbered from 0 through 2^n - 1.
27
Semiconductor Memory
Read/write memory (RWM):
• Transfer of data into and out of the RWM is usually in terms of a set of bits known as a memory word (typical word sizes are 8, 16, and 32 bits).
• Each of the 2^n words in the memory has m bits. Therefore, this is a (2^n × m)-bit memory.
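The capacity of a (2^n × m)-bit memory can be computed directly. A minimal sketch; the 16 address bits and 8-bit word size in the usage line are hypothetical values chosen for illustration.

```python
def memory_capacity_bits(address_bits: int, word_bits: int) -> int:
    """Capacity of a memory with 2^n words of m bits each."""
    return (2 ** address_bits) * word_bits


# Hypothetical memory: n = 16 address bits in the MAR, m = 8-bit words.
n, m = 16, 8
locations = 2 ** n
capacity = memory_capacity_bits(n, m)
print(f"{locations} locations (0 through {locations - 1}), "
      f"{capacity} bits = {capacity // 8 // 1024} KB")
```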
28
Semiconductor Memory
RWM: Read/write memory:
• Two types of semiconductor RAM are now available: static and dynamic.
• Each memory cell in a static RAM (SRAM) is built out of a flip-flop. The content of the memory cell (either 1 or 0) remains intact as long as the power is on. SRAMs are used in speed-critical applications.
• Each memory cell in a dynamic RAM (DRAM) is built out of a capacitor. The charge level of the capacitor determines the 1 or 0 state of the cell. Because the charge decays with time, these memory cells must be refreshed to retain their content.
[Figure: RAM socket.]
29
Semiconductor Memory
[Figure: an SRAM bit cell, which requires 6 transistors, and a DRAM bit cell, which requires only 1 transistor and 1 capacitor.]
30
Semiconductor Memory
RWM: Read/write memory:
• DRAMs require a complex refresh circuit and, because of the time needed for refresh, they are also slower than SRAMs.
• More dynamic memory cells than static memory cells can be fabricated on the same area of silicon. This makes DRAMs a suitable choice when large memories are needed and speed is not a critical design parameter.
[Figure: SRAM and DRAM.]
31
Semiconductor Memory
Read/write memory (RWM):
• DRAM can be either asynchronous or synchronous:
  Asynchronous: the processor must wait idly for the DRAM to complete its internal operations (∼60 ns).
  Synchronous: the DRAM latches information from the processor under the control of the system clock.
• Asynchronous fast-page-mode (FPM) DRAMs have access times between 80 and 100 ns.
• Extended-data-out (EDO) DRAMs improved speed by about 20%.
• Both FPM and EDO DRAMs dragged effective speeds down by forcing CPUs, on a 66 MHz mainboard, to wait to receive data from memory.
32
Semiconductor Memory
Read/write memory (RWM):
• While SDRAM transfers data on only one edge of the clock signal, DDR (Double Data Rate) SDRAM transfers data on both edges, effectively doubling the data transmission rate.
• Unlike 168-pin SDRAM, DDR SDRAM uses a 184-pin connector.
33
Semiconductor Memory
Read only memory (ROM):
• Is also a random-access memory, except that data can only be read.
• Data are usually written into a ROM either by the memory manufacturer or by the user in an off-line mode (using a special device programmer).
• ROM is also part of main memory and contains data and programs that are not usually altered in real time during operation.
• During operation, the data at the selected address are available on the output lines of a ROM as long as the memory is enabled.
34
Semiconductor Memory
35
Semiconductor Memory
Read only memory (ROM):
• Two types of ROM are commercially available: mask-programmed ROMs (MROMs) and user-programmed ROMs.
• User-programmed ROMs are programmable ROMs (PROMs); the family includes EPROM, EEPROM, and FLASH.
• Mask-programmed ROMs are used when a large number of ROM units containing a particular program and/or data is required.
• The IC manufacturer can be asked to “burn” the program and data into the ROM unit. The user supplies the program, and the IC manufacturer prepares a mask and uses it to fabricate the program and data into the ROM as the last step in fabrication.
• Therefore, the contents of these ROMs are unalterable.
• Mask-programmed ROMs are not cost effective unless the application requires a large number of units.
36
Semiconductor Memory
Read only memory (ROM):
• A user-programmed ROM is fabricated with either all 0s or all 1s stored in it.
• The user employs a special device called a PROM programmer to burn in the required program by sending the proper current through each link.
• The content of this type of ROM cannot be altered after initial programming, so it is sometimes called OTP (One-Time Programmable).
• EPROMs (Erasable PROMs) are also available. Ultraviolet light is used to restore the content of an EPROM to its initial value, after which it can be reprogrammed using a PROM programmer.