60
Caching Registers Cache Main Memory Disk Cache is fast (1–5 cycle access time) memory sitting between fast registers and slow RAM (10–100 cycles access time) cse/UNSW COMP9242 2002/S2 W3 P1

Caching Main Memory - UNSW Engineering

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Caching Main Memory - UNSW Engineering

Caching

Registers CacheMain

Memory Disk

• Cache is fast (1–5 cycle access time) memory sitting betweenfast registers and slow RAM (10–100 cycles access time)

cse/UNSW COMP9242 2002/S2 W3 P1

Page 2: Caching Main Memory - UNSW Engineering

Caching

Registers CacheMain

Memory Disk

• Cache is fast (1–5 cycle access time) memory sitting betweenfast registers and slow RAM (10–100 cycles access time)

• Holds recently used data or instructions to save memory accesses.

• Matches slow RAM access time to CPU speed if high hit rate(≥ 90 %)

• Is hardware maintained and (mostly) transparent to software

• Sizes range from few kB to several MB.

• Usually a hierarchy of caches (2–5 levels), on- and off-chip.

Good overview in [Sch94].

cse/UNSW COMP9242 2002/S2 W3 P1

Page 3: Caching Main Memory - UNSW Engineering

Cache organisation

• Data transfer unit between registers and cache ≤ 1 word (1–8B)

• Date transfer unit between cache and RAM is cache line,typically 16–32 bytes, sometimes 128 bytes and more.

• Line is unit of storage allocation in cache.

cse/UNSW COMP9242 2002/S2 W3 P2

Page 4: Caching Main Memory - UNSW Engineering

Cache organisation

• Data transfer unit between registers and cache ≤ 1 word (1–8B)

• Date transfer unit between cache and RAM is cache line,typically 16–32 bytes, sometimes 128 bytes and more.

• Line is unit of storage allocation in cache.

• Each line has associated control info:➜ valid bit,➜ modified bit,➜ tag.

cse/UNSW COMP9242 2002/S2 W3 P2

Page 5: Caching Main Memory - UNSW Engineering

Cache organisation

• Data transfer unit between registers and cache ≤ 1 word (1–8B)

• Date transfer unit between cache and RAM is cache line,typically 16–32 bytes, sometimes 128 bytes and more.

• Line is unit of storage allocation in cache.

• Each line has associated control info:➜ valid bit,➜ modified bit,➜ tag.

• Cache improves memory access by:➜ absorbing most reads (increases bandwidth, reduces latency),➜ making writes asynchronous (hides latency),➜ clustering reads and writes (hides latency).

cse/UNSW COMP9242 2002/S2 W3 P2

Page 6: Caching Main Memory - UNSW Engineering

Cache access

Virtually indexed: looked up by virtual address,operates concurrently with address translation.

Physically indexed: looked up by physical address

MMU

PhysicalCache

MainMemory

PhysicalAddress

PhysicalAddress

VirtualCache

CPU

VirtualAddress

DataDataData

cse/UNSW COMP9242 2002/S2 W3 P3

Page 7: Caching Main Memory - UNSW Engineering

CACHE ACCESS MECHANICS

• Address is hashed to produce index of line set.

• Associative lookup of line within setn lines per set: n-way set associative cache.➜ Typically n = 1 . . . 5.➜ n = 1 is called direct mapped.➜ n =∞ is called fully associative, unusual for CPU caches.

• Hashing must be simple (complex hardware is slow)⇒ use least-significant bits of address.

cse/UNSW COMP9242 2002/S2 W3 P4

Page 8: Caching Main Memory - UNSW Engineering

CACHE INDEXING

set

line

t 0

t 1

t 2

tag data

Address

bt s

set#

byte#

tag

➜ The tag is used to distinguish lines of set...

➜ ... consists of the address bits not used for indexing.

cse/UNSW COMP9242 2002/S2 W3 P5

Page 9: Caching Main Memory - UNSW Engineering

Cache mapping

• Different memory locations map to same cache line:

RAM

Cache

. . .n-10 1 ...n-10 n-10 1 ...1 ... ...

n-10 1 ...

10n-10 1 ... n-1

cse/UNSW COMP9242 2002/S2 W3 P6

Page 10: Caching Main Memory - UNSW Engineering

Cache mapping

• Different memory locations map to same cache line:

RAM

Cache

. . .n-10 1 ...n-10 n-10 1 ...1 ... ...

n-10 1 ...

10n-10 1 ... n-1

? All memory locations mapping to cache line # i

are said to be of colour i.? n-way associative cache can hold n lines of the same colour.⇒ different types of cache misses:➜ capacity misses: cache is full.➜ conflict misses: set corresponding to address is full, even

though there may be unused lines in the cache.

cse/UNSW COMP9242 2002/S2 W3 P6

Page 11: Caching Main Memory - UNSW Engineering

Cache replacement policy

• Indexing (via virtual or physical address) points to line set.

• If all lines of set are valid must replace an existing one.

• Replacement strategy must be simple as it’s all done in hardware.

• Dirty bit is used to determine whether a line must be flushed backto memory before invalidation.

• Typical policies:➜ LRU (or approximation)➜ FIFO➜ random

cse/UNSW COMP9242 2002/S2 W3 P7

Page 12: Caching Main Memory - UNSW Engineering

Cache write (update) policy

• Treatment of store operations:? write back: Stores update cache only,

memory updated when dirty line is replaced (flushed).➜ Memory is inconsistent with cache.

? write through: Stores update cache and memory immediately.➜ Memory is always consistent with cache.

cse/UNSW COMP9242 2002/S2 W3 P8

Page 13: Caching Main Memory - UNSW Engineering

Cache write (update) policy

• Treatment of store operations:? write back: Stores update cache only,

memory updated when dirty line is replaced (flushed).➜ Memory is inconsistent with cache.

? write through: Stores update cache and memory immediately.➜ Memory is always consistent with cache.

• On store to a line not presently in cache, use:? write allocate: allocate a cache line to the data and store,? no allocate: store to memory and bypass cache.

cse/UNSW COMP9242 2002/S2 W3 P8

Page 14: Caching Main Memory - UNSW Engineering

Cache write (update) policy

• Treatment of store operations:? write back: Stores update cache only,

memory updated when dirty line is replaced (flushed).➜ Memory is inconsistent with cache.

? write through: Stores update cache and memory immediately.➜ Memory is always consistent with cache.

• On store to a line not presently in cache, use:? write allocate: allocate a cache line to the data and store,? no allocate: store to memory and bypass cache.

• Usual combinations: write-back & write-allocate,write-through & no-allocate.

cse/UNSW COMP9242 2002/S2 W3 P8

Page 15: Caching Main Memory - UNSW Engineering

Cache types

• Virtually indexed, virtually tagged (also called virtual cache):➜ only uses virtual addresses,➜ can operate concurrently with MMU.

cse/UNSW COMP9242 2002/S2 W3 P9

Page 16: Caching Main Memory - UNSW Engineering

Cache types

• Virtually indexed, virtually tagged (also called virtual cache):➜ only uses virtual addresses,➜ can operate concurrently with MMU.

• Virtually indexed, physically tagged:➜ virtual address for accessing line, physical address for tagging,➜ needs address translation completed for retrieving data,➜ index concurrently with MMU, use MMU output for tag check.

cse/UNSW COMP9242 2002/S2 W3 P9

Page 17: Caching Main Memory - UNSW Engineering

Cache types

• Virtually indexed, virtually tagged (also called virtual cache):➜ only uses virtual addresses,➜ can operate concurrently with MMU.

• Virtually indexed, physically tagged:➜ virtual address for accessing line, physical address for tagging,➜ needs address translation completed for retrieving data,➜ index concurrently with MMU, use MMU output for tag check.

• Physically indexed:➜ only uses physical addresses➜ needs address translation completed before begin of access

cse/UNSW COMP9242 2002/S2 W3 P9

Page 18: Caching Main Memory - UNSW Engineering

Cache types

• Virtually indexed, virtually tagged (also called virtual cache):➜ only uses virtual addresses,➜ can operate concurrently with MMU.

• Virtually indexed, physically tagged:➜ virtual address for accessing line, physical address for tagging,➜ needs address translation completed for retrieving data,➜ index concurrently with MMU, use MMU output for tag check.

• Physically indexed:➜ only uses physical addresses➜ needs address translation completed before begin of access

• Typically, virtually indexed caches are on-chip, physically indexedcaches are off-chip.

cse/UNSW COMP9242 2002/S2 W3 P9

Page 19: Caching Main Memory - UNSW Engineering

Virtual Cache Issues

HOMONYMS:

• Same VA corresponds to several PAs➜ standard situation in multitasking systems (not SASOS!)

cse/UNSW COMP9242 2002/S2 W3 P10

Page 20: Caching Main Memory - UNSW Engineering

Virtual Cache Issues

HOMONYMS:

• Same VA corresponds to several PAs➜ standard situation in multitasking systems (not SASOS!)

• Problem: tag may not uniquely identify cache data!• Homonyms lead to cache accessing the wrong data!

cse/UNSW COMP9242 2002/S2 W3 P10

Page 21: Caching Main Memory - UNSW Engineering

Virtual Cache Issues

HOMONYMS:

• Same VA corresponds to several PAs➜ standard situation in multitasking systems (not SASOS!)

• Problem: tag may not uniquely identify cache data!• Homonyms lead to cache accessing the wrong data!• Homonym prevention:

➜ tag virtual address with address-space ID (ASID)➜ disambiguates virtual addresses (makes them globally valid)

➜ use physical tags➜ flush cache on context switch➜ use a SASOS

cse/UNSW COMP9242 2002/S2 W3 P10

Page 22: Caching Main Memory - UNSW Engineering

SYNONYMS (ALIASES ):

• Several VAs map to the same PA➜ frames shared between processes➜ multiple mappings of a frame within an address space

cse/UNSW COMP9242 2002/S2 W3 P11

Page 23: Caching Main Memory - UNSW Engineering

SYNONYMS (ALIASES ):

• Several VAs map to the same PA➜ frames shared between processes➜ multiple mappings of a frame within an address space

• Synonyms in cache may lead to accessing stale data:➜ same data may be cached in several lines➜ on write, one synonym is update➜ a subsequent read on the other synonym returns the old value

cse/UNSW COMP9242 2002/S2 W3 P11

Page 24: Caching Main Memory - UNSW Engineering

SYNONYMS (ALIASES ):

• Several VAs map to the same PA➜ frames shared between processes➜ multiple mappings of a frame within an address space

• Synonyms in cache may lead to accessing stale data:➜ same data may be cached in several lines➜ on write, one synonym is update➜ a subsequent read on the other synonym returns the old value

• Physical tagging doesn’t help!• ASIDs don’t help either!• Whether synonyms are a problem depends on cache

size and page size!

cse/UNSW COMP9242 2002/S2 W3 P11

Page 25: Caching Main Memory - UNSW Engineering

EXAMPLE : MIPS R4X00 SYNONYMS

• Virtually-indexed, physically tagged on-chip L1 cache➜ 16kB cache with 32B lines➜ 4kB (base) page size

013 539

s b

index (8 bits)

035 12

PFN offset

tag (28 bits)

VA

PA

• Remember, location of data in cache determined by index

• Tag only confirms whether it’s a hit!

• Synonym problem iff VA12 6= VA′12

cse/UNSW COMP9242 2002/S2 W3 P12

Page 26: Caching Main Memory - UNSW Engineering

SYNONYMS...

• Avoiding synonym problems:? hardware synonym detection

cse/UNSW COMP9242 2002/S2 W3 P13

Page 27: Caching Main Memory - UNSW Engineering

SYNONYMS...

• Avoiding synonym problems:? hardware synonym detection? flush cache on context switch

➜ doesn’t help for aliasing within address space

cse/UNSW COMP9242 2002/S2 W3 P13

Page 28: Caching Main Memory - UNSW Engineering

SYNONYMS...

• Avoiding synonym problems:? hardware synonym detection? flush cache on context switch

➜ doesn’t help for aliasing within address space? detect synonyms and ensure

➜ all read-only, OR➜ only one synonym mapped

? restrict VM mapping so synonyms map to same cache set

cse/UNSW COMP9242 2002/S2 W3 P13

Page 29: Caching Main Memory - UNSW Engineering

SYNONYMS...

• Avoiding synonym problems:? hardware synonym detection? flush cache on context switch

➜ doesn’t help for aliasing within address space? detect synonyms and ensure

➜ all read-only, OR➜ only one synonym mapped

? restrict VM mapping so synonyms map to same cache set➜ e.g., R4x00: ensure that VA12 = PA12

cse/UNSW COMP9242 2002/S2 W3 P13

Page 30: Caching Main Memory - UNSW Engineering

Fully Virtual Caches

• Fastest (don’t rely on TLB for retrieving data)➜ however, needs TLB for protection➜ or other mechanism to provide protection

cse/UNSW COMP9242 2002/S2 W3 P14

Page 31: Caching Main Memory - UNSW Engineering

Fully Virtual Caches

• Fastest (don’t rely on TLB for retrieving data)➜ however, needs TLB for protection➜ or other mechanism to provide protection

• Suffer from synonyms and homonyms➜ requires flushing on context switch

➜ makes context switches expensive➜ may even be required on kernel→user switch

➜ ... or guarantee of no synonyms and homonyms

cse/UNSW COMP9242 2002/S2 W3 P14

Page 32: Caching Main Memory - UNSW Engineering

Fully Virtual Caches

• Fastest (don’t rely on TLB for retrieving data)➜ however, needs TLB for protection➜ or other mechanism to provide protection

• Suffer from synonyms and homonyms➜ requires flushing on context switch

➜ makes context switches expensive➜ may even be required on kernel→user switch

➜ ... or guarantee of no synonyms and homonyms• Require TLB lookup for write-back!

cse/UNSW COMP9242 2002/S2 W3 P14

Page 33: Caching Main Memory - UNSW Engineering

Fully Virtual Caches

• Fastest (don’t rely on TLB for retrieving data)➜ however, needs TLB for protection➜ or other mechanism to provide protection

• Suffer from synonyms and homonyms➜ requires flushing on context switch

➜ makes context switches expensive➜ may even be required on kernel→user switch

➜ ... or guarantee of no synonyms and homonyms• Require TLB lookup for write-back!

• Used on MC68040, i860

cse/UNSW COMP9242 2002/S2 W3 P14

Page 34: Caching Main Memory - UNSW Engineering

Fully virtual caches with keys

• Add address-space identifier (ASID) as part of tag.

• On access compare with CPU’s ASID register.

cse/UNSW COMP9242 2002/S2 W3 P15

Page 35: Caching Main Memory - UNSW Engineering

Fully virtual caches with keys

• Add address-space identifier (ASID) as part of tag.

• On access compare with CPU’s ASID register.

• Turns homonyms into synonyms➜ Removes homonym problem➜ Potentially better context switching performance➜ ASID recycling still requires cache flush

cse/UNSW COMP9242 2002/S2 W3 P15

Page 36: Caching Main Memory - UNSW Engineering

Fully virtual caches with keys

• Add address-space identifier (ASID) as part of tag.

• On access compare with CPU’s ASID register.

• Turns homonyms into synonyms➜ Removes homonym problem➜ Potentially better context switching performance➜ ASID recycling still requires cache flush

• Doesn’t solve synonym problem

• Doesn’t solve write-back problem

cse/UNSW COMP9242 2002/S2 W3 P15

Page 37: Caching Main Memory - UNSW Engineering

Virtually indexed, physically tagged caches

• Medium speed:➜ lookup in parallel with address translation➜ tag comparison after address translation

cse/UNSW COMP9242 2002/S2 W3 P16

Page 38: Caching Main Memory - UNSW Engineering

Virtually indexed, physically tagged caches

• Medium speed:➜ lookup in parallel with address translation➜ tag comparison after address translation

• No homonym problem

• Potential synonym problem

cse/UNSW COMP9242 2002/S2 W3 P16

Page 39: Caching Main Memory - UNSW Engineering

Virtually indexed, physically tagged caches

• Medium speed:➜ lookup in parallel with address translation➜ tag comparison after address translation

• No homonym problem

• Potential synonym problem

• More bits per tag (cannot leave off set-number bits)➜ increases area, latency, power consumption

cse/UNSW COMP9242 2002/S2 W3 P16

Page 40: Caching Main Memory - UNSW Engineering

Virtually indexed, physically tagged caches

• Medium speed:➜ lookup in parallel with address translation➜ tag comparison after address translation

• No homonym problem

• Potential synonym problem

• More bits per tag (cannot leave off set-number bits)➜ increases area, latency, power consumption

• Used on most modern architectures for L1 cache

cse/UNSW COMP9242 2002/S2 W3 P16

Page 41: Caching Main Memory - UNSW Engineering

Physical Caches

• No synonym problem

• No homonym problem

⇒ Easy to manage

cse/UNSW COMP9242 2002/S2 W3 P17

Page 42: Caching Main Memory - UNSW Engineering

Physical Caches

• No synonym problem

• No homonym problem

⇒ Easy to manage

• If small or highly associative (all index bits come from page offset)indexing can be in parallel with address translation.➜ Attractive feature for intermediate level cache.

cse/UNSW COMP9242 2002/S2 W3 P17

Page 43: Caching Main Memory - UNSW Engineering

Physical Caches

• No synonym problem

• No homonym problem

⇒ Easy to manage

• If small or highly associative (all index bits come from page offset)indexing can be in parallel with address translation.➜ Attractive feature for intermediate level cache.

• Cache can use bus snooping to receive/supply DMA data

• Usable as off-chip cache with any architecture.

cse/UNSW COMP9242 2002/S2 W3 P17

Page 44: Caching Main Memory - UNSW Engineering

Cache Hierarchy

• Use a hierarchy of caches to balance memory accesses:➜ Small, fast, virtually indexed cache on top (L1 cache).➜ Large, slow, physically indexed cache at bottom (L2–L5).

• Each level reduces and clusters traffic.

• High levels tend to be separated into instruction and data caches.

• Low levels tend to be unified.

cse/UNSW COMP9242 2002/S2 W3 P18

Page 45: Caching Main Memory - UNSW Engineering

Translation Lookaside Buffers

• TLB is a cache for page table entries.

• Can be:➜ hardware loaded, transparent to OS, or➜ software loaded, maintained by OS.

cse/UNSW COMP9242 2002/S2 W3 P19

Page 46: Caching Main Memory - UNSW Engineering

Translation Lookaside Buffers

• TLB is a cache for page table entries.

• Can be:➜ hardware loaded, transparent to OS, or➜ software loaded, maintained by OS.

• Can be:➜ split, instruction and data TLBs, or➜ unified.

cse/UNSW COMP9242 2002/S2 W3 P19

Page 47: Caching Main Memory - UNSW Engineering

Translation Lookaside Buffers

• TLB is a cache for page table entries.

• Can be:➜ hardware loaded, transparent to OS, or➜ software loaded, maintained by OS.

• Can be:➜ split, instruction and data TLBs, or➜ unified.

• Some architectures (MIPS, IA-64) use a hierarchy of TLBs:➜ Top-level TLB is hardware-loaded from lower levels.➜ Transparent to OS.

cse/UNSW COMP9242 2002/S2 W3 P19

Page 48: Caching Main Memory - UNSW Engineering

TLB Issues

ASSOCIATIVITY

• First TLB (VAX-11/780, [CE85]) was 2-way associative.

• Most modern architectures have fully associative TLBs.

cse/UNSW COMP9242 2002/S2 W3 P20

Page 49: Caching Main Memory - UNSW Engineering

TLB Issues

ASSOCIATIVITY

• First TLB (VAX-11/780, [CE85]) was 2-way associative.

• Most modern architectures have fully associative TLBs.

• Exceptions:➜ i486 (4-way),➜ Pentium, Pentium-6 (4-way),➜ IBM RS/6000 (2-way).

cse/UNSW COMP9242 2002/S2 W3 P20

Page 50: Caching Main Memory - UNSW Engineering

TLB Issues

ASSOCIATIVITY

• First TLB (VAX-11/780, [CE85]) was 2-way associative.

• Most modern architectures have fully associative TLBs.

• Exceptions:➜ i486 (4-way),➜ Pentium, Pentium-6 (4-way),➜ IBM RS/6000 (2-way).

• Superpages⇒ fully associative TLB

cse/UNSW COMP9242 2002/S2 W3 P20

Page 51: Caching Main Memory - UNSW Engineering

SIZE (I-TLB + D-TLB)

Architecture TLB SizeVAX 64–256ix86 32–32+64MIPS 96–128SPARC 64Alpha 32–128+128RS/6000 32+128PA-8000 96+96Itanium 64+96

Note : not much growth in 20 years!

cse/UNSW COMP9242 2002/S2 W3 P21

Page 52: Caching Main Memory - UNSW Engineering

TLB COVERAGE

• Memory sizes are increasing.

• Number of TLB entries are more-or-less constant.

• Page sizes are growing very slowly.

cse/UNSW COMP9242 2002/S2 W3 P22

Page 53: Caching Main Memory - UNSW Engineering

TLB COVERAGE

• Memory sizes are increasing.

• Number of TLB entries are more-or-less constant.

• Page sizes are growing very slowly.

⇒ Total amount of RAM mapped by TLB is not changing much.

⇒ Fraction of RAM mapped by TLB is shrinking dramatically.

cse/UNSW COMP9242 2002/S2 W3 P22

Page 54: Caching Main Memory - UNSW Engineering

TLB COVERAGE

• Memory sizes are increasing.

• Number of TLB entries are more-or-less constant.

• Page sizes are growing very slowly.

⇒ Total amount of RAM mapped by TLB is not changing much.

⇒ Fraction of RAM mapped by TLB is shrinking dramatically.

• Modern architectures have very low TLB coverage.

• Also, many modern architectures have software-loaded TLBs.

⇒ General increase in TLB miss handling cost.

cse/UNSW COMP9242 2002/S2 W3 P22

Page 55: Caching Main Memory - UNSW Engineering

Address Space Usage vs. TLB Coverage

• Each TLB entry maps one virtual page.

• On TLB miss, reloaded from page table (PT), which is in memory.

⇒ Some TLB entries need to map page table.➜ E.g. 32-bit page table entries, 4kb pages.➜ One PT page maps 4Mb.

cse/UNSW COMP9242 2002/S2 W3 P23

Page 56: Caching Main Memory - UNSW Engineering

Address Space Usage vs. TLB Coverage

• Each TLB entry maps one virtual page.

• On TLB miss, reloaded from page table (PT), which is in memory.

⇒ Some TLB entries need to map page table.➜ E.g. 32-bit page table entries, 4kb pages.➜ One PT page maps 4Mb.

• Traditional UNIX process has 2 regions of allocated virtual addressspace:➜ low end: text, data, heap,➜ high end: stack.⇒ 2–3 PT pages are sufficient to map most address spaces.

cse/UNSW COMP9242 2002/S2 W3 P23

Page 57: Caching Main Memory - UNSW Engineering

SPARSE ADDRESS SPACE USE TIES UP PT ENTRIES

msg msgmsgmsgtext stackdata bss

page table

TLB

text bss stackdata

UNIX virtual address space

Sparse virtual address space

cse/UNSW COMP9242 2002/S2 W3 P24

Page 58: Caching Main Memory - UNSW Engineering

Origins of sparse address-space use

• Modern OS features:➜ memory-mapped files,➜ dynamically-linked libraries,➜ mapping IPC (server-based systems)...

cse/UNSW COMP9242 2002/S2 W3 P25

Page 59: Caching Main Memory - UNSW Engineering

Origins of sparse address-space use

• Modern OS features:➜ memory-mapped files,➜ dynamically-linked libraries,➜ mapping IPC (server-based systems)...

• This problem gets worse 64-bit address spaces:➜ bigger page tables.

• An in-depth study of such effects can be foundin [UNS+94].

cse/UNSW COMP9242 2002/S2 W3 P25

Page 60: Caching Main Memory - UNSW Engineering

References

[CE85] Douglas W. Clark and Joel S. Emer. Performance of theVAX-11/780 translation buffer: Simulation andmeasurement. Trans. Comp. Syst., 3:31–62, 1985.

[Sch94] Curt Schimmel. UNIX Systems for Modern Architectures.Addison Wesley, 1994.

[UNS+94] Richard Uhlig, David Nagle, Tim Stanley, Trevor Mudge,Stuart Sechrest, and Richard Brown. Design tradeoffs forsoftware-managed TLBs. Trans. Comp. Syst., pages175–205, 1994.

cse/UNSW COMP9242 2002/S2 W3 P26