The Memory Hierarchy II CPSC 321
Andreas Klappenecker
Today’s Menu
• Cache
• Virtual Memory
• Translation Lookaside Buffer
Caches
Why? How?
Memory
• Users want large and fast memories
• SRAM is too expensive for main memory
• DRAM is too slow for many purposes
• Compromise: build a memory hierarchy
[Figure: the memory hierarchy — Level 1 closest to the CPU down to Level n; access time and the size of the memory at each level increase with distance from the CPU]
Locality
• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)
• Why does code have locality?
Direct Mapped Cache

• Mapping: address modulo the number of blocks in the cache, x -> x mod B
[Figure: a direct-mapped cache with 8 entries (indices 000–111); memory words map to cache entries by address mod 8, e.g., addresses 00001, 01001, 10001, and 11001 all map to cache index 001]
• Cache with 1024 = 2^10 words
• The tag from the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
What kind of locality are we taking advantage of?
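The hit check above can be sketched in a few lines. The field widths (2-bit byte offset, 10-bit index, 20-bit tag) follow the slides; the cache representation itself is a hypothetical illustration:

```python
# Sketch of a lookup in a direct-mapped cache of 1024 words
# (2-bit byte offset, 10-bit index, 20-bit tag), per the slides.
# The list-of-tuples cache model is a hypothetical illustration.

def lookup(cache, address):
    index = (address >> 2) & 0x3FF   # bits 11..2 select one of 1024 entries
    tag = address >> 12              # upper 20 bits of the address
    valid, stored_tag, data = cache[index]
    if valid and stored_tag == tag:
        return True, data            # cache hit
    return False, None               # cache miss

cache = [(False, 0, None)] * 1024
cache[5] = (True, 0x12345, "word")   # pretend this word was cached earlier

hit, data = lookup(cache, (0x12345 << 12) | (5 << 2))
print(hit, data)   # True word
```

Only one entry has to be inspected per access, which is why a direct-mapped lookup needs a single tag comparison.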
Direct Mapped Cache
[Figure: a direct-mapped cache with 1024 one-word entries — the 32-bit address splits into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset; the index (address mod 1024) selects an entry, the stored tag is compared with the address tag, and hit = valid AND tags equal]
• Taking advantage of spatial locality:
Direct Mapped Cache
[Figure: a direct-mapped cache with 4K entries and four-word (128-bit) blocks — the 32-bit address splits into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset; a multiplexor selects one of the four 32-bit words in the block]
Bits in a Cache
How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
• 16 KB = 4K words = 2^12 words
• Block size of 4 words => 2^10 blocks
• Each block has 4 x 32 = 128 bits of data + tag + valid bit
• tag + valid bit = (32 - 10 - 2 - 2) + 1 = 19 bits
• Total cache size = 2^10 x (128 + 19) = 2^10 x 147 bits
Therefore, 147 Kbits (about 18.4 KB) are needed for the cache
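The arithmetic above can be checked with a short script; the parameters are exactly those of the example:

```python
# Recompute the direct-mapped cache sizing example from the slides:
# 16 KB of data, 4-word blocks, 32-bit addresses.
ADDRESS_BITS = 32
DATA_BYTES = 16 * 1024
WORDS_PER_BLOCK = 4
WORD_BYTES = 4

blocks = DATA_BYTES // (WORDS_PER_BLOCK * WORD_BYTES)    # 1024 = 2^10 blocks
index_bits = blocks.bit_length() - 1                     # 10
block_offset_bits = WORDS_PER_BLOCK.bit_length() - 1     # 2
byte_offset_bits = WORD_BYTES.bit_length() - 1           # 2
tag_bits = ADDRESS_BITS - index_bits - block_offset_bits - byte_offset_bits

bits_per_block = WORDS_PER_BLOCK * WORD_BYTES * 8 + tag_bits + 1  # 128 + 18 + 1
total_bits = blocks * bits_per_block

print(tag_bits, bits_per_block, total_bits)  # 18 147 150528
```

150528 bits is 147 Kbits, i.e. about 18.4 KB of SRAM for 16 KB of data — roughly 15% overhead for tags and valid bits.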
Cache Block Mapping
• Direct mapped cache
  • a block goes in exactly one place in the cache
• Fully associative cache
  • a block can go anywhere in the cache
  • it is difficult to find a block
  • parallel comparison to speed up the search
• Set associative cache
  • a block can go to a (small) number of places
  • compromise between the two extremes above
Cache Types
Set Associative Caches
• Each block maps to a unique set
• The block can be placed into any element of that set
• The position is given by
  (Block number) modulo (# of sets in cache)
• If each set contains n elements, then the cache is called n-way set associative
Summary: Where can a Block be Placed?
Name               Number of sets                       Blocks per set
Direct mapped      # blocks in cache                    1
Set associative    (# blocks in cache)/associativity    Associativity (typically 2-8)
Fully associative  1                                    Number of blocks in cache
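The placement rule in the table above can be sketched as follows; the cache geometry (64 blocks) is a hypothetical example, and direct mapped and fully associative fall out as the two extreme associativities:

```python
# Set placement: a block may go into any way of exactly one set,
# where set = block_number mod number_of_sets.
# The 64-block cache geometry is a hypothetical example.
NUM_BLOCKS = 64

def candidate_slots(block_number, associativity):
    num_sets = NUM_BLOCKS // associativity
    s = block_number % num_sets
    # The block may occupy any of the `associativity` ways of set s.
    return [s * associativity + way for way in range(associativity)]

print(candidate_slots(12, 1))        # direct mapped: exactly one slot, [12]
print(candidate_slots(12, 4))        # 4-way: the four ways of set 12
print(len(candidate_slots(12, 64)))  # fully associative: all 64 slots
```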
Summary: How is a Block Found?
Associativity      How the block is found                     # Comparisons
Direct mapped      Index                                      1
Set associative    Index the set, search among its elements   Degree of associativity
Fully associative  Search all cache entries                   Size of the cache
                   Separate lookup table                      0
Virtual Memory
• The processor generates virtual addresses
• Memory is accessed using physical addresses
• Virtual and physical memory are broken into blocks of memory, called pages
• A virtual page may be
  • absent from main memory, residing on the disk
  • or mapped to a physical page
Virtual Memory
• Main memory can act as a cache for the secondary storage (disk)
• A virtual address generated by the processor undergoes address translation to yield a physical address
• Advantages:
  • illusion of having more physical memory
  • program relocation
  • protection
[Figure: virtual addresses are mapped by address translation either to physical addresses in main memory or to disk addresses]
Pages: virtual memory blocks
• Page faults: if data is not in memory, retrieve it from disk
  • huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
  • reducing page faults is important (LRU is worth the price)
  • the faults can be handled in software instead of hardware
  • write-through takes too long, so we use write-back
• Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
[Figure: translation of a 32-bit virtual address (20-bit virtual page number, bits 31–12, plus 12-bit page offset) into a 30-bit physical address (18-bit physical page number plus 12-bit page offset); the page offset passes through unchanged]
Page Faults
• Incredibly high penalty for a page fault
• Reduce the number of page faults by optimizing page placement
• Use fully associative placement
  • a full search of pages is impractical
  • pages are located by a full table that indexes the memory, called the page table
  • the page table resides within the memory
Page Tables
[Figure: a page table indexed by virtual page number; each entry holds a valid bit and either a physical page number or a disk address. The page table maps each page to either a page in main memory or to a page stored on disk.]
Page Tables
[Figure: the page table register points to the page table in memory; the 20-bit virtual page number indexes the table, whose entry supplies a valid bit (if 0, the page is not present in memory) and an 18-bit physical page number; the 12-bit page offset is appended unchanged to form the physical address]
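The translation step can be sketched with the slide's field widths (20-bit virtual page number, 12-bit page offset, 18-bit physical page number); the dictionary-based page table is a hypothetical illustration:

```python
# Page-table translation sketch: 20-bit virtual page number,
# 12-bit page offset, 18-bit physical page number (per the slides).
# The dictionary representation of the table is a hypothetical illustration.
PAGE_OFFSET_BITS = 12

def translate(page_table, virtual_address):
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        # Valid bit 0: the page is not present in memory -> page fault
        raise LookupError("page fault: VPN %#x not in memory" % vpn)
    return (ppn << PAGE_OFFSET_BITS) | offset

page_table = {0x00042: (True, 0x12345)}  # VPN 0x42 mapped to PPN 0x12345

pa = translate(page_table, (0x00042 << 12) | 0x123)
print(hex(pa))  # 0x12345123
```

Note that the offset never changes: only the page number is translated, which is what makes page-sized blocks convenient.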
Making Memory Access Fast
• Page tables slow us down
• Memory access now takes at least twice as long:
  • access the page table in memory
  • access the page itself
• What can we do?

Memory access is local => use a cache that keeps track of recently used address translations, called the translation lookaside buffer (TLB)
Making Address Translation Fast
A cache for address translations: translation lookaside buffer
[Figure: the TLB holds a small number of recently used page table entries (valid bit, tag, physical page address); on a TLB miss, the full page table in memory is consulted, which maps each virtual page either to physical memory or to disk storage]
Translation Lookaside Buffer
• Some typical values for a TLB
  • TLB size: 32-4096 entries
  • Block size: 1-2 page table entries (4-8 bytes each)
  • Hit time: 0.5-1 clock cycle
  • Miss penalty: 10-30 clock cycles
  • Miss rate: 0.01%-1%
TLBs and Caches
[Flowchart: processing a memory reference with combined TLB and cache access]
• The processor issues a virtual address and accesses the TLB
• TLB miss => TLB miss exception; TLB hit => the physical address is obtained
• Read access: try to read the data from the cache; on a cache hit, deliver the data to the CPU; on a cache miss, stall
• Write access: if the write access bit is off => write protection exception; otherwise write the data into the cache, update the tag, and put the data and the address into the write buffer
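The decision flow above can be sketched as follows; this is a minimal model, and the TLB, cache, and write-buffer structures are hypothetical illustrations of the decision logic only:

```python
# Sketch of the TLB-then-cache flow from the flowchart.
# All data structures here are hypothetical illustrations.
def memory_reference(tlb, cache, write_buffer, vaddr, write=False, data=None):
    vpn, offset = vaddr >> 12, vaddr & 0xFFF
    if vpn not in tlb:
        return "TLB miss exception"        # handled by refilling from the page table
    ppn, write_allowed = tlb[vpn]
    paddr = (ppn << 12) | offset           # TLB hit: physical address obtained
    if not write:
        if paddr in cache:
            return cache[paddr]            # cache hit: deliver data to the CPU
        return "cache miss stall"
    if not write_allowed:
        return "write protection exception"
    cache[paddr] = data                    # write into the cache, update the tag
    write_buffer.append((paddr, data))     # queue the write for main memory
    return "write done"

tlb = {0x1: (0x80, True)}                  # VPN 1 -> PPN 0x80, writable
cache, wb = {}, []
print(memory_reference(tlb, cache, wb, (0x1 << 12) | 4, write=True, data=42))
print(memory_reference(tlb, cache, wb, (0x1 << 12) | 4))   # 42
print(memory_reference(tlb, cache, wb, (0x2 << 12) | 0))   # TLB miss exception
```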
More Modern Systems
• Very complicated memory systems:

Characteristic       Intel Pentium Pro                   PowerPC 604
Virtual address      32 bits                             52 bits
Physical address     32 bits                             32 bits
Page size            4 KB, 4 MB                          4 KB, selectable, and 256 MB
TLB organization     A TLB for instructions and          A TLB for instructions and
                     a TLB for data                      a TLB for data
                     Both four-way set associative       Both two-way set associative
                     Pseudo-LRU replacement              LRU replacement
                     Instruction TLB: 32 entries         Instruction TLB: 128 entries
                     Data TLB: 64 entries                Data TLB: 128 entries
                     TLB misses handled in hardware      TLB misses handled in hardware

Characteristic       Intel Pentium Pro                   PowerPC 604
Cache organization   Split instruction and data caches   Split instruction and data caches
Cache size           8 KB each for instructions/data     16 KB each for instructions/data
Cache associativity  Four-way set associative            Four-way set associative
Replacement          Approximated LRU replacement        LRU replacement
Block size           32 bytes                            32 bytes
Write policy         Write-back                          Write-back or write-through
Some Issues

• Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
• Design challenge: dealing with this growing disparity
• Trends:
  • synchronous SRAMs (provide a burst of data)
  • redesign DRAM chips to provide higher bandwidth or processing
  • restructure code to increase locality
  • use prefetching (make the cache visible to the ISA)