The Memory Hierarchy
1. Overview of the memory hierarchy
2. Basics of Cache Memories
3. Virtual Memory
4. Satisfying a memory request
5. Performance and some cache memory optimizations
The Memory Hierarchy
Basic Problem: How to design a reasonable-cost memory system that can deliver data at speeds close to the CPU's consumption rate.
Current Answer: Construct a memory hierarchy with slow (inexpensive, large size) components at the higher levels and with fast (most expensive, smallest) components at the lowest level.
Migration: As information is referenced, migrate the data into and out of the lowest-level memories.

[Figure: the memory hierarchy — a CPU/core with split I-Cache and D-Cache, backed by an L2 cache, an L3 cache, main memory, and disk.]
How does this help?
- Programs are well behaved and tend to follow the observed "principles of locality" (see the loop sketch after this list):
  - Principle of temporal locality: a referenced data object is likely to be referenced again in the near future.
  - Principle of spatial locality: if data at location x is referenced at time t, then a nearby location x + Δx is likely to be referenced at a nearby time t + Δt.
- Also consider the 90/10 rule: a program executes about 90% of its instructions from about 10% of its code space.
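A minimal C sketch of the kind of loop these principles describe (not from the slides; the array name and size are illustrative):

#include <stdio.h>

#define N 1024                     /* assumed array size */

int main(void)
{
    static int a[N];
    long sum = 0;

    /* Spatial locality: a[i] is read sequentially, so consecutive
     * references fall in the same cache block.  Temporal locality:
     * sum and i are reused on every iteration. */
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("%ld\n", sum);
    return 0;
}

The loop is also where such a program spends nearly all of its time, which is the behavior the 90/10 rule summarizes.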
Cache Memories
- Similar to paged virtual memory, with fixed-size blocks mapped into the cache (from the [physical or virtual] memory space).
- Due to speed considerations, all operations are implemented in hardware.
- 4 types of mapping policies:
  - fully associative
  - direct mapped
  - set associative
  - sector mapped (not discussed further)
Fully Associative Cache
Any block in (physical or virtual) memory can be mapped into any block in the cache (just like paged virtual memory). Memory addresses are formed as [ n | d ], where n is the memory tag and d is the address (offset) within the block.
[Figure: fully associative cache — any memory block (Block 0 .. Block M) may occupy any of the N cache block frames; each cache frame stores the tag of the memory block it currently holds.]
Lookup Algorithm
if ∃α : 0 ≤ α < N ∧ cache[α].tag = n
then return cache[α].word[d]
else cache-miss
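A rough C model of the same lookup (not from the slides; the frame count, block size, and names are assumptions; real hardware compares all tags in parallel rather than looping):

#include <stdbool.h>
#include <stdint.h>

#define N_BLOCKS 64               /* assumed number of cache block frames */
#define WORDS_PER_BLOCK 16        /* assumed block size in words */

struct fa_block {
    bool     valid;
    uint32_t tag;
    uint32_t word[WORDS_PER_BLOCK];
};

static struct fa_block cache[N_BLOCKS];

/* Fully associative lookup: compare the memory tag n against every
 * frame; on a hit, return the word at offset d within the block. */
bool fa_lookup(uint32_t n, uint32_t d, uint32_t *out)
{
    for (uint32_t a = 0; a < N_BLOCKS; a++) {
        if (cache[a].valid && cache[a].tag == n) {
            *out = cache[a].word[d];
            return true;          /* cache hit */
        }
    }
    return false;                 /* cache miss */
}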
Direct Mapped Cache
If the cache is partitioned into N fixed-size blocks, then cache block k will contain only the (physical or virtual) memory blocks k + nN, where n = 0, 1, .... Memory addresses are formed as [ n | k | d ], where n is the memory tag, k is the cache block index, and d is the address (offset) within the block.
[Figure: direct mapped cache — memory block i may occupy only cache block frame i mod N (so memory blocks 0, N, 2N, ... all compete for frame 0); each frame stores a tag.]
Lookup Algorithm
if cache[k].tag = n
then return cache[k].word[d]
else cache-miss
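A hedged C sketch of the same lookup with the [ n | k | d ] address split worked out explicitly (frame count and block size are assumptions; a word-addressed memory is assumed):

#include <stdbool.h>
#include <stdint.h>

#define N_BLOCKS 64               /* assumed number of cache blocks */
#define WORDS_PER_BLOCK 16        /* assumed block size in words */

struct dm_block {
    bool     valid;
    uint32_t tag;
    uint32_t word[WORDS_PER_BLOCK];
};

static struct dm_block cache[N_BLOCKS];

/* Direct-mapped lookup: the block index k selects exactly one frame,
 * so a single tag comparison decides hit or miss. */
bool dm_lookup(uint32_t addr, uint32_t *out)
{
    uint32_t d = addr % WORDS_PER_BLOCK;              /* offset within block */
    uint32_t k = (addr / WORDS_PER_BLOCK) % N_BLOCKS; /* cache block index   */
    uint32_t n = addr / (WORDS_PER_BLOCK * N_BLOCKS); /* memory tag          */

    if (cache[k].valid && cache[k].tag == n) {
        *out = cache[k].word[d];
        return true;              /* cache hit */
    }
    return false;                 /* cache miss */
}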
Set Associative Cache
Midpoint between direct mapped and fully associative (at the extremes it degenerates to each). Basic idea: divide the cache into S sets with E = N/S block frames per set (N total blocks in the cache). Memory addresses are formed as [ n | k | d ], where n is the memory tag, k is the cache set index, and d is the address (offset) within the block.
[Figure: set associative cache — the cache is divided into sets 0 .. S−1, each holding E tagged block frames; memory block i maps to set i mod S and may occupy any frame within that set.]
Lookup Algorithm
if ∃α : 0 ≤ α < E ∧ cache[k].block[α].tag = n
then return cache[k].block[α].word[d]
else cache-miss
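And a corresponding C sketch for the set associative case (associativity, set count, and block size are assumptions; hardware compares the E tags of the selected set in parallel):

#include <stdbool.h>
#include <stdint.h>

#define E 4                        /* assumed block frames per set (ways) */
#define S 16                       /* assumed number of sets              */
#define WORDS_PER_BLOCK 16         /* assumed block size in words         */

struct sa_frame {
    bool     valid;
    uint32_t tag;
    uint32_t word[WORDS_PER_BLOCK];
};

static struct sa_frame cache[S][E];   /* S sets of E frames each */

/* Set associative lookup: the set index k selects one set; the tag n
 * is then compared against each of the E frames in that set. */
bool sa_lookup(uint32_t addr, uint32_t *out)
{
    uint32_t d = addr % WORDS_PER_BLOCK;         /* offset within block */
    uint32_t k = (addr / WORDS_PER_BLOCK) % S;   /* cache set index     */
    uint32_t n = addr / (WORDS_PER_BLOCK * S);   /* memory tag          */

    for (uint32_t a = 0; a < E; a++) {
        if (cache[k][a].valid && cache[k][a].tag == n) {
            *out = cache[k][a].word[d];
            return true;           /* cache hit */
        }
    }
    return false;                  /* cache miss */
}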
Cache Policies
- Read/Write Policies
  - Read through
  - Critical word first
  - Write through
  - Write back
  - Write allocate / no-allocate
- Structure
  - Unified or Split Caches
- Contents across Cache Levels
  - Multilevel inclusion (Pentium 4)
  - Multilevel exclusion (Opteron)
Lecture break; continue next class
Virtual Memory
Addresses used by programs do not correspond to the actual addresses of the program/data locations in main memory. Instead, a translation mechanism between the CPU and memory associates the CPU's (virtual) addresses with their actual (physical) addresses in memory.
[Figure: the CPU issues a virtual address, the address-translation mechanism converts it to a physical address, and the physical address accesses main memory.]
The two most important types:
- Paged virtual memory
- Segmented virtual memory
Demand Paged Virtual Memory
- Logically subdivide the virtual and physical spaces into fixed-size units called pages.
- Keep the virtual space on disk (swap space for working/dirty pages).
- As referenced (on demand), bring pages into main memory (and into the memory hierarchy) and update the page table.
- Virtual pages can be mapped into any available physical page.
- Requires some page replacement algorithm: random, FIFO, LRU, LFU.
Paged Virtual Memory: Structure
[Figure: paged virtual memory — the pages of the virtual program space (page 0 .. page N−1) are mapped onto available page frames of physical main memory (page 0 .. page M−1).]
Paged Virtual Memory: Address Translation
[Figure: the virtual address splits into (page number, page offset); the page number indexes the page table, whose entries hold a valid bit, a dirty bit/other info, and a physical page number; that physical page number is concatenated with the unchanged page offset to form the physical address.]
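A small C sketch of the translation the figure describes (page size, table size, and field names are assumptions; the translation buffer is ignored):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE  4096u           /* assumed 4 KiB pages        */
#define NUM_VPAGES 1024u           /* assumed # of virtual pages */

struct pte {                       /* page table entry */
    bool     valid;                /* page resident in main memory? */
    bool     dirty;                /* page modified since load?     */
    uint32_t frame;                /* physical page number          */
};

static struct pte page_table[NUM_VPAGES];

/* Index the page table with the virtual page number and concatenate
 * the physical frame with the unchanged page offset. A non-valid
 * entry corresponds to a page fault, handled by the O/S. */
bool translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr / PAGE_SIZE;    /* virtual page number */
    uint32_t offset = vaddr % PAGE_SIZE;    /* page offset         */

    if (vpn >= NUM_VPAGES || !page_table[vpn].valid)
        return false;                       /* page fault */

    *paddr = page_table[vpn].frame * PAGE_SIZE + offset;
    return true;
}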
Segmented Virtual Memory
- Identify segments in the virtual space by the logical structure of the program. Using that logical structure, we can attach access control to each segment.
- Dynamically map segments into the physical space as segments are referenced (on demand); update the segment table accordingly.
- Keep the virtual space on disk (swap space for working/dirty segments).
- Needs a segment placement algorithm: best fit, worst fit, first fit.
- Needs a segment replacement algorithm.
Segmented Virtual Memory: Structure
[Figure: segmented virtual memory — the variable-size segments of the virtual program space (segment 0 .. segment m) are placed at arbitrary locations in physical main memory.]
Seg. Virt. Memory: Address Translation
[Figure: the virtual address splits into (segment number, segment offset); the segment number indexes the segment table, whose entries hold a valid bit, a dirty bit/length/other info, and a segment base; the base is added (+) to the segment offset to form the physical address.]
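A hedged C sketch of that translation (table size and field names are assumptions); note that, unlike paging, the base is added to the offset and the offset is checked against the segment length:

#include <stdbool.h>
#include <stdint.h>

#define NUM_SEGMENTS 64u           /* assumed segment table size */

struct ste {                       /* segment table entry */
    bool     valid;                /* segment resident in main memory? */
    uint32_t base;                 /* physical base address            */
    uint32_t length;               /* segment length in bytes          */
};

static struct ste seg_table[NUM_SEGMENTS];

/* Add the segment base to the offset, after checking the offset
 * against the segment length. Returns false on a segment fault or an
 * out-of-bounds reference. */
bool seg_translate(uint32_t segno, uint32_t offset, uint32_t *paddr)
{
    if (segno >= NUM_SEGMENTS || !seg_table[segno].valid)
        return false;              /* segment fault */
    if (offset >= seg_table[segno].length)
        return false;              /* offset exceeds segment length */

    *paddr = seg_table[segno].base + offset;
    return true;
}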
Segmented-Paged Virtual Memory
- Page the segments and map the pages into physical memory, considerably improving and simplifying physical memory use/management.
- Adds an additional translation step.
- Drastically shrinks page table sizes.
- Retains the protection capabilities of segments.
- The mechanism used in most contemporary systems (OK, virtualization adds yet another level of indirection... we will discuss this later).
Segmented-Paged: Address Translation
[Figure: the virtual address splits into (seg #, page #, page offset); the segment number indexes the segment table to locate that segment's page table, the page number indexes that page table to obtain the physical frame, and the frame is combined with the page offset to form the physical address.]
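A hedged C sketch of the two-step translation (all sizes and field names are assumptions; length/protection checks are omitted):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE    4096u         /* assumed 4 KiB pages        */
#define NUM_SEGMENTS 64u           /* assumed segment table size */

struct pte {                       /* page table entry */
    bool     valid;
    uint32_t frame;                /* physical page number */
};

struct ste {                       /* segment table entry */
    bool        valid;
    uint32_t    num_pages;         /* pages in this segment     */
    struct pte *page_table;        /* this segment's page table */
};

static struct ste seg_table[NUM_SEGMENTS];

/* The segment number selects a per-segment page table, the page
 * number indexes it, and the frame is concatenated with the page
 * offset. Returns false on a segment or page fault. */
bool sp_translate(uint32_t segno, uint32_t pageno, uint32_t offset,
                  uint32_t *paddr)
{
    if (segno >= NUM_SEGMENTS || !seg_table[segno].valid)
        return false;              /* segment fault */
    if (pageno >= seg_table[segno].num_pages ||
        !seg_table[segno].page_table[pageno].valid)
        return false;              /* page fault */

    *paddr = seg_table[segno].page_table[pageno].frame * PAGE_SIZE + offset;
    return true;
}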
Hardware Implications of Virtual Memory
- Virtual memory requires careful coordination between the hardware and the O/S.
- Virtual memory translation will be performed (at least mostly) in hardware.
- When the hardware cannot translate a virtual address, it will trap to the O/S.
- The hardware structures for translation limit the type of virtual memory that an O/S can deploy.
Lecture break; continue next class
The Memory Hierarchy: Who does what?
- MMU: Memory Management Unit
- TB: (address) Translation Buffer
- H/W: handles the search down to main memory
- O/S: handles page faults (invoking DMA operations)
  - if the page being replaced is dirty, copy it out first (DMA)
  - move the new page to main memory (DMA)
- O/S: context swaps on page fault
- DMA: operates concurrently with other tasks
[Figure: the memory hierarchy again — I-Cache/D-Cache, L2, L3, main memory, and disk — with the MMU and its TB sitting between the CPU and the hierarchy.]
Notes
- Memory Management Unit
  - Manages the memory subsystems down to main memory
  - Translates addresses, searches caches, migrates data (to/from main memory, out/in)
- Translation Buffer
  - Small cache to assist the virtual → physical address translation process
  - Generally small (e.g., 64 entries); its size does not need to correspond to cache sizes
- Simplifying Assumptions
  - L1 & L2 use physical addresses
  - paged virtual memory
  - ignoring details of TB misses
Satisfying A Memory Request
Satisfied in L1 cache:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, success (sometimes simultaneously with translation)
3. MMU: read/write information to/from CPU
Satisfying A Memory Request
Satisfied in L2 cache:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, failure
3. MMU: search L2 cache, success
4. MMU: move information between L1 & L2, critical word first?
5. MMU: read/write information to/from CPU
Satisfying A Memory Request
Satisfied in main memory:
1. MMU: translate address
2. MMU: search I or D cache as indicated by CPU, failure
3. MMU: search L2 cache, failure
4. MMU: move information between memory & L2
5. MMU: move information between L1 & L2, critical word first?
6. MMU: read/write information to/from CPU
Satisfying A Memory Request
Not in main memory:
1. MMU: translate address, failure; trap to O/S
2. O/S: page fault, block task for page fault
   - O/S: if the page being replaced is dirty:
     - O/S: initiate DMA transfer to copy the page to swap
     - O/S: block task for DMA interrupt
     - O/S: invoke task scheduler
     - O/S: on interrupt, continue
   - O/S: initiate DMA transfer to copy the new page to main memory
   - O/S: block task for DMA interrupt
   - O/S: invoke task scheduler
   - O/S: on interrupt:
     - update page table
     - return task to ready-to-run list
Memory Performance Impact: Naive
CPU execution time
  = (CPU clock cycles + Memory stall cycles) × Clock cycle time

Memory stall cycles (doesn't consider hit times ≠ 1)
  = Number of misses × Miss penalty
  = IC × (Misses / Instruction) × Miss penalty
  = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty
  = IC × (Memory reads / Instruction) × Read miss rate × Read miss penalty
    + IC × (Memory writes / Instruction) × Write miss rate × Write miss penalty
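A worked example with assumed numbers (purely illustrative): for IC = 10^9 instructions, 1.5 memory accesses per instruction, a 2% miss rate, and a 100-cycle miss penalty, Memory stall cycles = 10^9 × 1.5 × 0.02 × 100 = 3 × 10^9 cycles, which are added to the CPU clock cycles before multiplying by the clock cycle time.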
Average Memory Access Time
Average memory access time
  = Hit time + Miss rate × Miss penalty
  = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
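For example, with assumed values (not from the slides): Hit time_L1 = 1 cycle, Miss rate_L1 = 5%, Hit time_L2 = 10 cycles, Miss rate_L2 = 20%, and Miss penalty_L2 = 100 cycles give 1 + 0.05 × (10 + 0.20 × 100) = 1 + 0.05 × 30 = 2.5 cycles on average.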
Accounting for out-of-order execution (discussion probably premature at this time):

Memory stall cycles / Instruction
  = Misses / Instruction × (Total miss latency − Overlapped miss latency)
Cache Misses
- Miss Rate
  Local miss rate = (# misses in a cache) / (total # of memory accesses to this cache)

  Global miss rate = (# misses in a cache) / (total # of memory accesses generated by the CPU)

  (A worked example follows this list.)
- Categories of Cache Misses
  - Compulsory: cold-start or first-reference misses.
  - Capacity: not all blocks needed fit into the cache.
  - Conflict: some needed blocks map to the same cache block.
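A worked example of the miss-rate definitions with assumed numbers (not from the slides): if the CPU issues 1000 memory accesses, 40 of them miss in L1 and go to L2, and 8 of those miss in L2, then L2's local miss rate is 8/40 = 20%, while its global miss rate is 8/1000 = 0.8%.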
Basic Cache Memory Optimizations
- Larger block sizes to reduce miss rate
- Larger caches to reduce miss rate
- Higher associativity to reduce miss rate
- Multi-level caches to reduce miss penalty
- Giving priority to read misses over writes to reduce miss penalty
- Avoiding address translation to reduce hit time