
CSCE 212, Chapter 7

Memory Hierarchy

Instructor: Jason D. Bakos


Memory Hierarchy

• Programmers want both more memory and faster memory
• Problems:
– Denser memories require longer access times
  • Example: papers on your desk vs. papers in your filing cabinet
– Fast memories are extremely expensive per unit capacity
  • Examples:
  – SRAM: 0.5–5 ns access time, $1K/GB
  – DRAM: 50–70 ns access time, $100/GB
  – Magnetic disk: 5–20 ms access time, $0.10/GB


Locality

• Goal:
– Achieve the access time of smaller memories with the effective capacity of larger memories
• Solution: exploit two kinds of locality
– Temporal locality
  • A memory location that has been accessed is likely to be accessed again soon
– Spatial locality
  • When a memory location is accessed, there's a good chance a nearby location will be accessed in the near future
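A minimal C sketch (mine, not from the slides) of how locality shows up in ordinary code: traversing a row-major matrix row by row touches consecutive addresses (spatial locality), while column-by-column traversal jumps a whole row ahead on every access; the running sum s is reused on every iteration (temporal locality).

  #include <stdio.h>

  #define N 1024

  static double a[N][N];   /* C stores this row-major: a[i][0..N-1] are contiguous */

  /* Good spatial locality: consecutive iterations touch adjacent addresses,
     so most accesses hit in a cache line that was just brought in. */
  double sum_row_major(void) {
      double s = 0.0;      /* s is reused every iteration: temporal locality */
      for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++)
              s += a[i][j];
      return s;
  }

  /* Poor spatial locality: consecutive iterations are N*sizeof(double) bytes
     apart, so each access may land in a different cache line. */
  double sum_col_major(void) {
      double s = 0.0;
      for (int j = 0; j < N; j++)
          for (int i = 0; i < N; i++)
              s += a[i][j];
      return s;
  }

  int main(void) {
      printf("%f %f\n", sum_row_major(), sum_col_major());
      return 0;
  }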


Memory Hierarchy

• Each level of the hierarchy stores a subset of the level below it
• Each level can only communicate with the level below it
• For now, assume a 2-level hierarchy
– CPU – cache – main memory
– The cache is usually on-chip
• Sometimes the data we need is not in the cache
– Hit rate: the fraction of accesses that are found in the cache
• Block (or line): the unit of data moved between levels
– Larger blocks exploit spatial locality
• Miss penalty: the time required to move a line to the top of the hierarchy (may vary)


Caches

• Questions:

1. How do we know if the requested location is in the cache?

2. How do we find it?


Cache Organization

• Fully associative
– A line can be placed anywhere in the cache, so the address has no index field
– With n-word lines, each entry's tag is address(31 downto log2(n) + 2); the lower bits select the word and byte within the line
– Every access must compare the address against every tag in parallel – too many tags to compare!


Direct Mapped Cache

• Direct mapped – each memory location maps to exactly one location in the cache
• Example (figure): a direct-mapped cache with eight 8-word lines
– addr(7:5) is the index that selects one of the eight lines (000–111)
– addr(31:8) is compared against the tag stored with that line
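A minimal C sketch (not from the slides) of the hit check for this hypothetical 8-line, 8-words-per-line direct-mapped cache: the index bits pick one line, and a hit requires that line to be valid with a matching tag.

  #include <stdint.h>
  #include <stdbool.h>

  #define NUM_LINES      8      /* indexed by addr(7:5)  */
  #define WORDS_PER_LINE 8      /* selected by addr(4:2) */

  struct line {
      bool     valid;
      uint32_t tag;             /* addr(31:8) */
      uint32_t data[WORDS_PER_LINE];
  };

  static struct line cache[NUM_LINES];

  /* Returns true on a hit and copies the requested word into *word. */
  bool cache_lookup(uint32_t addr, uint32_t *word) {
      uint32_t index  = (addr >> 5) & 0x7;   /* addr(7:5)  */
      uint32_t offset = (addr >> 2) & 0x7;   /* addr(4:2)  */
      uint32_t tag    = addr >> 8;           /* addr(31:8) */

      struct line *l = &cache[index];
      if (l->valid && l->tag == tag) {
          *word = l->data[offset];
          return true;                       /* hit  */
      }
      return false;                          /* miss: fetch the line from memory */
  }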


Addresses

• The memory address can be partitioned into four fields:
– tag: the remaining upper bits, compared against the tag stored with the line
– index: log2(#lines) bits – which line (or set) in the cache?
– word offset: log2(words per line) bits – which word in the line?
– byte offset: 2 bits – which byte in the word?
• Example: 128 lines, 16-word lines:
– byte offset = addr(1:0), word offset = addr(5:2), index = addr(12:6), tag = addr(31:13)
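A small C sketch (my own, following the partition above) that pulls the four fields out of a 32-bit address for the 128-line, 16-word-line example:

  #include <stdint.h>
  #include <stdio.h>

  /* Field widths for the example: 128 lines, 16 words per line. */
  #define BYTE_OFFSET_BITS 2    /* 4 bytes per word  */
  #define WORD_OFFSET_BITS 4    /* 16 words per line */
  #define INDEX_BITS       7    /* 128 lines         */

  int main(void) {
      uint32_t addr = 0x1234ABCD;   /* arbitrary example address */

      uint32_t byte_offset = addr & ((1u << BYTE_OFFSET_BITS) - 1);
      uint32_t word_offset = (addr >> BYTE_OFFSET_BITS) & ((1u << WORD_OFFSET_BITS) - 1);
      uint32_t index = (addr >> (BYTE_OFFSET_BITS + WORD_OFFSET_BITS)) & ((1u << INDEX_BITS) - 1);
      uint32_t tag   = addr >> (BYTE_OFFSET_BITS + WORD_OFFSET_BITS + INDEX_BITS);

      /* byte offset = addr(1:0), word offset = addr(5:2),
         index = addr(12:6), tag = addr(31:13). */
      printf("tag=0x%x index=0x%x word=0x%x byte=0x%x\n",
             (unsigned)tag, (unsigned)index, (unsigned)word_offset, (unsigned)byte_offset);
      return 0;
  }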


The Three C’s

• Three different kinds of misses:
– Compulsory (cold-start) misses
  • The first access to a block can never hit
– Capacity misses
  • A replaced block is needed again because the cache isn't large enough to hold all the blocks the program needs
– Conflict (collision) misses
  • Multiple blocks compete for the same set, even though the cache as a whole may still have room


Associativity

• 2-way set associative:
– Two choices of where to store a given line
– A replacement policy (e.g., LRU) decides which of the two to evict on a miss
• Example (figure): two ways, each holding eight 8-word lines
– addr(7:5) selects one of the eight sets (000–111)
– addr(31:8) is compared against the tags of both ways (tags 0 and tags 1) in parallel
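A hedged C sketch (not from the slides) of a lookup in this hypothetical 2-way set-associative cache, with a single LRU indicator per set choosing the victim on a miss:

  #include <stdint.h>
  #include <stdbool.h>

  #define NUM_SETS 8
  #define NUM_WAYS 2

  struct way {
      bool     valid;
      uint32_t tag;            /* addr(31:8) */
      uint32_t data[8];        /* 8-word line */
  };

  struct set {
      struct way ways[NUM_WAYS];
      int lru;                 /* index of the least recently used way */
  };

  static struct set cache[NUM_SETS];

  bool cache_lookup(uint32_t addr, uint32_t *word) {
      uint32_t index  = (addr >> 5) & 0x7;   /* addr(7:5)  */
      uint32_t offset = (addr >> 2) & 0x7;   /* addr(4:2)  */
      uint32_t tag    = addr >> 8;           /* addr(31:8) */
      struct set *s = &cache[index];

      for (int w = 0; w < NUM_WAYS; w++) {
          if (s->ways[w].valid && s->ways[w].tag == tag) {
              *word = s->ways[w].data[offset];
              s->lru = 1 - w;               /* the other way becomes LRU */
              return true;                  /* hit */
          }
      }
      /* Miss: the line would be fetched into the LRU way (s->ways[s->lru]),
         its tag set to 'tag', and s->lru flipped to the other way. */
      return false;
  }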


Associative Cache Organization (figure)


Cache Behavior

• Hits in the top-level cache can usually be serviced in one (or a few) clock cycles
• Misses stall the processor
• Writes can be handled using:
– Write-through (with write allocate or write no-allocate)
  • When cache data is changed, the lower-level memory is updated immediately
  • Use a write buffer so the processor doesn't have to wait for the slow write to complete
– Write-back
  • When cache data is changed, the lower-level memory isn't updated until the cache line containing the changes is replaced
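A rough C sketch (my own, building on the hypothetical cache line structure above) contrasting the two write policies on a store that hits in the cache; memory_write_word is a placeholder for the slow lower-level memory, not a real API:

  #include <stdint.h>
  #include <stdbool.h>

  struct cline {
      bool     valid;
      bool     dirty;          /* only meaningful for write-back */
      uint32_t tag;
      uint32_t data[8];
  };

  /* Placeholder for the slow lower-level memory (assumed, not from the slides). */
  void memory_write_word(uint32_t addr, uint32_t value);

  /* Write-through: update the cache and the lower level immediately
     (in hardware the write would go through a write buffer). */
  void store_write_through(struct cline *l, uint32_t addr, uint32_t offset, uint32_t value) {
      l->data[offset] = value;
      memory_write_word(addr, value);
  }

  /* Write-back: update only the cache and mark the line dirty; the lower
     level is updated later, when the dirty line is evicted. */
  void store_write_back(struct cline *l, uint32_t offset, uint32_t value) {
      l->data[offset] = value;
      l->dirty = true;
  }

  /* On eviction of a dirty write-back line, the whole block must be
     written back to memory before it is replaced (not shown here). */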


Memory Systems

• Main memory is DRAM, designed for density (not access time)

• How to reduce miss penalty?


Average Memory Access Time

• AMAT = hit_time + miss_rate * miss_penalty
• Reduce miss rate:
– Larger cache (fewer capacity misses)
– Higher associativity (fewer conflict misses)
– Better replacement policy
– Each of these may increase hit time and miss penalty
• Reduce miss penalty:
– Wider or banked (interleaved) memory bus
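For instance (hypothetical numbers, not from the slides): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 * 100 = 6 cycles; halving the miss rate to 2.5% gives AMAT = 1 + 0.025 * 100 = 3.5 cycles.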


Virtual Memory

• Main memory acts as a cache for secondary storage (disk)
– Allows memory to be shared among programs
– Makes memory appear larger than it physically is
• Each program has its own address space
• Enforces protection between programs
• A virtual memory block is called a page; a miss is called a page fault
• Virtual addresses are translated into physical addresses
– Address mapping / address translation
– Done by a combination of hardware and software
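A simplified C sketch (my own; the field sizes are assumptions, e.g. 4 KB pages and a flat one-million-entry page table) of how a virtual address is split into a virtual page number and a page offset and translated through a page table:

  #include <stdint.h>
  #include <stdbool.h>

  #define PAGE_OFFSET_BITS 12              /* 4 KB pages (assumed)      */
  #define NUM_VPAGES (1u << 20)            /* 32-bit VA -> 2^20 pages   */

  struct pte {
      bool     valid;                      /* page is resident in memory */
      uint32_t ppn;                        /* physical page number       */
  };

  static struct pte page_table[NUM_VPAGES];

  /* Translate a virtual address; returns false on a page fault
     (the OS would then bring the page in from swap space). */
  bool translate(uint32_t vaddr, uint32_t *paddr) {
      uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
      uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);

      if (!page_table[vpn].valid)
          return false;                    /* page fault */

      *paddr = (page_table[vpn].ppn << PAGE_OFFSET_BITS) | offset;
      return true;
  }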


Page Faults

• Main memory is roughly 100,000 times faster than disk
– Page faults are extremely expensive
• Reduce the page fault rate
– Fully associative placement of pages in memory
• Each process has a page table that maps virtual addresses to physical addresses
• The OS creates space on disk for all of a process's pages
– Swap space
• The OS maintains another table that keeps track of each page in main memory
– During a page fault, the OS must decide which page to replace
– Least recently used (LRU)
– Write-back is used for writes (writing every word through to disk would be far too slow)


Page Table (figure)


TLB

• Page table lookups must be fast, so they are performed in hardware
– Recently used page table entries are cached on-chip
– This cache is the translation-lookaside buffer (TLB)
– Typically either small and fully associative, or larger with limited associativity


Integrating Cache and VM

• Data cannot be in the cache unless it is also present in main memory
• The cache can be:
– physically addressed (the TLB is in the critical path of every access)
– virtually addressed (the TLB is out of the critical path)
• A cache miss requires a TLB access to obtain the physical address
• A TLB miss means either:
– the page is in memory and we only need to load its TLB entry, or
– the page is not in memory (page fault)
– (both cases are handled by OS software)
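To tie the pieces together, here is a rough C sketch (my own) of a load through a physically addressed cache: TLB first, then the cache, then main memory. The helpers tlb_lookup and memory_read_word are assumed stubs; translate and cache_lookup echo the hypothetical sketches above.

  #include <stdint.h>
  #include <stdbool.h>

  /* Assumed helpers, not real APIs:                                       */
  bool tlb_lookup(uint32_t vaddr, uint32_t *paddr);   /* fast, on-chip     */
  bool translate(uint32_t vaddr, uint32_t *paddr);    /* walks page table  */
  bool cache_lookup(uint32_t paddr, uint32_t *word);  /* physically addressed cache */
  uint32_t memory_read_word(uint32_t paddr);          /* slow main memory  */

  uint32_t load_word(uint32_t vaddr) {
      uint32_t paddr, word;

      /* 1. Translate: a TLB hit is the fast path; a TLB miss falls back to
            the page table, and the refilled entry would also be installed
            in the TLB.                                                     */
      if (!tlb_lookup(vaddr, &paddr) && !translate(vaddr, &paddr)) {
          /* Page fault: the OS would bring the page in from swap space and
             retry the access; this sketch simply gives up here.            */
          return 0;
      }

      /* 2. Physically addressed cache: a hit returns the word directly.    */
      if (cache_lookup(paddr, &word))
          return word;

      /* 3. Cache miss: fetch from main memory (simplified to one word).    */
      return memory_read_word(paddr);
  }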


TLB Misses and Page Faults

• When a virtual address causes a page fault…
1. Look up the page table entry and find the page's location on disk
2. Choose a physical page to replace; write it back if it is dirty
3. Read the page from disk into the chosen physical page (allow another process to run in the meantime)
• TLB miss in MIPS
– BadVAddr is set, a special exception vector (8000 0000) is taken, and the TLB miss handler runs
– Context register:
  • bits 31:20: base address of the page table
  • bits 19:2: virtual page number of the missing page
– The Context register can be used directly as the address from which to load the missing page table entry
  • If that page table entry is invalid, a page fault exception is taken at the normal handler (8000 0180)
– Move the missing entry into the EntryLo register
– Execute tlbwr to write EntryLo into the TLB entry selected by the Random register (a free-running counter)
– Execute eret to return
• A TLB miss exception doesn't save full process state (fast), while a page fault does (slow)