Caching I
Andreas Klappenecker, CPSC 321 Computer Architecture
Memory
Current memory is largely implemented in CMOS technology. Two alternatives:
SRAM: fast, but not area efficient; the stored value is held in a pair of inverting gates
DRAM: slower, but more area efficient; the value is stored as charge on a capacitor (and must be refreshed)
Static RAM
Dynamic RAM
Memory
Users want large and fast memories
SRAM is too expensive for main memory; DRAM is too slow for many purposes
Compromise: build a memory hierarchy
[Figure: levels in the memory hierarchy, from Level 1 nearest the CPU down to Level n; access time and the size of the memory at each level increase with distance from the CPU]
Locality
If an item is referenced, then
it will be referenced again soon (temporal locality)
nearby data will be referenced soon (spatial locality)
Why does code have locality?
Memory Hierarchy
The memory is organized as a hierarchy
each level closer to the processor is a subset of any level further away
the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time
initially, we focus on two levels
Two-Level Hierarchy
Upper level: smaller and faster
Lower level: slower
A unit of information that is either present or absent within a level is called a block
If data requested by the processor is in the upper level, this is called a hit; otherwise it is called a miss
If a miss occurs, the data is retrieved from the lower level; typically, an entire block is transferred
Cache
A cache represents some level of memory between the CPU and main memory
[More general definitions are often used]
A Toy Example
Assumptions: each processor request is one word, and each block consists of one word
Example:
Before the request: C = [X1, X2, ..., Xn-1]
The processor requests Xn, which is not contained in C
Xn is brought from memory into the cache
After the request: C = [X1, X2, ..., Xn-1, Xn]
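The request sequence above can be sketched as a tiny fully associative cache of one-word blocks, searched linearly; all names here (toy_cache, access, CAP) are illustrative, not from the lecture:

```c
#define CAP 16

/* Toy cache: an unordered collection of one-word blocks. */
typedef struct { unsigned words[CAP]; int n; } toy_cache;

/* Returns 1 on a hit; on a miss, brings the word into the cache.
 * Assumes the cache is not yet full (see the next slide). */
int access(toy_cache *c, unsigned x) {
    for (int i = 0; i < c->n; i++)
        if (c->words[i] == x) return 1;   /* hit */
    c->words[c->n++] = x;                 /* miss: bring Xn into C */
    return 0;
}
```

The first access to a word misses and loads it; a repeated access hits, which is temporal locality at work.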
Issues
What happens if the cache is full?
Issues
How do we know whether a data item is in the cache? If it is, how do we find it?
Simple strategy: direct-mapped cache
there is exactly one location where the data might be in the cache
Mapping: address modulo the number of blocks in the cache, x -> x mod B
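The mapping can be sketched in C; B = 8 is an illustrative choice matching an eight-entry cache:

```c
/* Direct-mapped placement: block address x goes to index x mod B.
 * When B is a power of two, the modulo reduces to masking off the
 * low-order log2(B) bits of the address. */
enum { B = 8 };

unsigned cache_index(unsigned x) {
    return x % B;              /* same as x & (B - 1) when B = 8 */
}
```

For example, addresses 00001, 01001, and 11001 (1, 9, 25) all compete for index 001.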
Direct Mapped Cache
[Figure: eight-entry direct-mapped cache with indices 000 through 111; memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 map to cache indices 001 and 101 via address mod 8]
Cache with 1024 = 2^10 words
the tag from the cache is compared against the upper portion of the address
if the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
What kind of locality are we taking advantage of?
Direct Mapped Cache
[Figure: hardware for the 1024-entry direct-mapped cache; the 32-bit address (bits 31-0) is split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0); the index selects one of the 1024 entries (0 to 1023), the stored tag is compared against the address tag, and the comparison is combined with the valid bit to produce the hit signal; the data output is 32 bits]
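The 20/10/2 field split shown in the figure can be expressed as bit manipulation; the function names are illustrative:

```c
/* Splitting a 32-bit byte address for the 1024-word direct-mapped
 * cache: bits 1-0 are the byte offset, bits 11-2 the 10-bit index,
 * and bits 31-12 the 20-bit tag. */
unsigned tag_of(unsigned addr)   { return addr >> 12; }
unsigned index_of(unsigned addr) { return (addr >> 2) & 0x3FF; }
```

On a lookup, index_of selects the cache entry and tag_of is compared against the stored 20-bit tag.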
Direct Mapped Cache Example
Taking advantage of spatial locality:
Direct Mapped Cache
[Figure: direct-mapped cache with 4K entries and four-word (128-bit) blocks; the 32-bit address is split into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset (bits 1-0); the block offset drives a multiplexor that selects one of the four 32-bit words from the 128-bit block; the hit signal comes from the tag comparison and the valid bit as before]
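The wider split for four-word blocks can be sketched the same way; the names are again illustrative:

```c
/* Field split for a 4K-entry cache with four-word (128-bit) blocks:
 * bits 1-0 byte offset, bits 3-2 block offset (selects one of the
 * four words, like the mux in the figure), bits 15-4 the 12-bit
 * index, bits 31-16 the 16-bit tag. */
unsigned tag16(unsigned addr)     { return addr >> 16; }
unsigned index12(unsigned addr)   { return (addr >> 4) & 0xFFF; }
unsigned block_off(unsigned addr) { return (addr >> 2) & 0x3; }
```

Larger blocks exploit spatial locality: one miss brings in four adjacent words, and nearby accesses then hit.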
Hits vs. Misses
Read hits:
this is what we want!
Read misses:
stall the CPU, fetch the block from memory, deliver it to the cache, restart the instruction
Write hits:
write the data into both the cache and memory (write-through)
write the data only into the cache, and write it back to memory later (write-back)
Write misses:
read the entire block into the cache, then write the word
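The read-hit and read-miss behavior above can be sketched as a minimal direct-mapped simulator; one-word blocks and reads only, with the write policies omitted for brevity, and all names illustrative:

```c
enum { NBLOCKS = 8 };

/* Minimal direct-mapped cache: valid bit and tag per entry,
 * plus hit/miss counters. */
typedef struct {
    int valid[NBLOCKS];
    unsigned tag[NBLOCKS];
    int hits, misses;
} cache;

void access_word(cache *c, unsigned block_addr) {
    unsigned idx = block_addr % NBLOCKS;   /* which entry */
    unsigned tag = block_addr / NBLOCKS;   /* upper address bits */
    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;                         /* read hit */
    } else {
        c->misses++;                       /* miss: fetch the block */
        c->valid[idx] = 1;
        c->tag[idx] = tag;
    }
}
```

Replaying an address trace through access_word shows how conflicting addresses that share an index evict each other.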
Hits vs. Misses Example