CS 161 Ch 7: Memory Hierarchy, LECTURE 20
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan
1999 ©UCB



Cache Organization

(1) How do you know if something is in the cache?

(2) If it is in the cache, how do you find it?

° The answers to (1) and (2) depend on the type, or organization, of the cache.

° In a direct-mapped cache, each memory address is associated with one possible block within the cache.
• Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache.


Simplest Cache: Direct Mapped

[Figure: a 16-block main memory (block addresses 0-15) mapping into a 4-block direct-mapped cache (cache indices 0-3).]

• Cache Block 0 can be occupied by data from memory blocks 0, 4, 8, 12 (0000two, 0100two, 1000two, 1100two)

• Cache Block 1 can be occupied by data from memory blocks 1, 5, 9, 13

• Block Size = 32/64 Bytes


Simplest Cache: Direct Mapped

[Figure: main memory blocks 0-15 mapping into a 4-block direct-mapped cache; memory blocks 0010two, 0110two, 1010two, 1110two all map to cache index 10two.]

° index determines block in cache

° index = (address) mod (# blocks)

° If the number of cache blocks is a power of 2, then the cache index is just the lower n bits of the memory address [ n = log2(# blocks) ]

Memory block address = [ tag | index ]
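The index/tag split described above can be sketched in a few lines of Python (an illustration, not taken from the slides; the 4-block geometry follows the running example):

```python
# Split a memory block address into tag and index for a direct-mapped cache.
# Assumes the number of cache blocks is a power of 2, as the slide notes.

NUM_BLOCKS = 4                               # 4-block cache from the example
INDEX_BITS = NUM_BLOCKS.bit_length() - 1     # n = log2(# blocks) = 2

def split_block_address(block_addr):
    index = block_addr % NUM_BLOCKS          # same as block_addr & (NUM_BLOCKS - 1)
    tag = block_addr >> INDEX_BITS           # the remaining upper bits
    return tag, index

# Memory blocks 0, 4, 8, 12 all land on cache index 0:
for b in (0, 4, 8, 12):
    tag, index = split_block_address(b)
    print(b, tag, index)
```

Because the block count is a power of two, the mod reduces to keeping the low n bits, which is why hardware gets the index for free.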


Simplest Cache: Direct Mapped w/Tag

[Figure: direct-mapped cache with per-block tag and data fields; memory blocks 0010two, 0110two, 1010two, 1110two all compete for cache index 10two and are distinguished by their tag bits.]

° tag determines which memory block occupies the cache block

° tag bits = lefthand bits of the address

° hit: cache tag field = tag bits of address

° miss: cache tag field ≠ tag bits of address


Accessing data in a direct mapped cache

Three types of events:

° cache miss: the appropriate block holds nothing valid, so fetch from memory

° cache hit: the cache block is valid and contains the proper address, so read the desired word

° cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory

Cache Access Procedure:
(1) Use the index bits to select the cache block.
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits.
(3) If they match, use the offset to read out the word/byte.
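The three-step access procedure, plus the miss/replacement cases, can be modeled by a small simulator (a hypothetical sketch; the cache and block sizes below are made up for illustration):

```python
# Minimal direct-mapped cache lookup following the three steps on the slide:
# (1) index selects the block, (2) valid bit + tag compare, (3) offset selects data.

class DirectMappedCache:
    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size            # bytes per block
        self.valid = [False] * num_blocks
        self.tags = [None] * num_blocks
        self.data = [None] * num_blocks         # each entry holds one block of bytes

    def _fields(self, addr):
        offset = addr % self.block_size
        block_addr = addr // self.block_size
        return block_addr // self.num_blocks, block_addr % self.num_blocks, offset

    def read(self, addr, memory):
        tag, index, offset = self._fields(addr)
        if self.valid[index] and self.tags[index] == tag:
            return "hit", self.data[index][offset]
        # Miss (possibly with block replacement): fetch the whole block.
        start = (addr // self.block_size) * self.block_size
        self.valid[index] = True
        self.tags[index] = tag
        self.data[index] = memory[start:start + self.block_size]
        return "miss", self.data[index][offset]

memory = bytes(range(256))                      # toy 256-byte main memory
cache = DirectMappedCache(num_blocks=4, block_size=16)
print(cache.read(20, memory))                   # cold miss: block is fetched
print(cache.read(21, memory))                   # hit: same block, different offset
```

The second access hits because the whole 16-byte block was loaded on the first miss, which is exactly the spatial-locality benefit of multi-word blocks.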


[Figure: worked example on a 1024-entry cache with Valid, Tag, and four data words (0x0-3, 0x4-7, 0x8-b, 0xc-f) per entry. The address 000000000000000000 0000000001 1100 selects Index 1, whose entry holds (Valid = 1, Tag = 0, data a b c d). The data is valid and the tag matches, so the byte offset 1100 (0xc) reads out and returns word d.]


An Example Cache: DecStation 3100

° Commercial Workstation: ~1985

° MIPS R2000 Processor (similar to the pipelined machine of Chapter 6)

° Separate instruction and data caches:
• direct mapped
• 64K Bytes (16K words) each
• Block Size: 1 Word (low spatial locality)

° Solution: increase the block size (2nd example)


DecStation 3100 Cache

[Figure: 32-bit address (bit positions 31...0) split into a 16-bit tag (bits 31-16), 14-bit index (bits 15-2), and 2-bit byte offset (bits 1-0); a 16K-entry array of Valid, 16-bit Tag, and 32-bit Data fields; the stored tag is compared with the address tag to produce Hit, and the 32-bit Data is read out.]

° If miss, the cache controller stalls the processor and loads the data from main memory.
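The field widths in the figure follow directly from the cache geometry (16K one-word entries, 32-bit addresses, 4-byte words); Python is used here only to check the arithmetic:

```python
import math

# Derive the DecStation 3100 field widths from its geometry.
entries = 16 * 1024                            # 16K one-word blocks
byte_offset_bits = int(math.log2(4))           # 2 bits select a byte in a 4-byte word
index_bits = int(math.log2(entries))           # 14 bits select one of 16K entries
tag_bits = 32 - index_bits - byte_offset_bits  # 16 bits remain for the tag
print(byte_offset_bits, index_bits, tag_bits)  # 2 14 16
```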


64KB Cache with 4-word (16-byte) blocks

[Figure: 32-bit address (bit positions 31...0) split into a 16-bit tag (bits 31-16), 12-bit index (bits 15-4), 2-bit block offset (bits 3-2), and 2-bit byte offset (bits 1-0); a 4K-entry array of Valid (V), 16-bit Tag, and 128-bit Data fields; the tag compare produces Hit, and a mux uses the block offset to select one of the four 32-bit words.]


Miss rates: 1-word vs. 4-word block (cache similar to DecStation 3100)

Program   Block size   I-cache miss rate   D-cache miss rate   Combined miss rate
gcc       1-word       6.1%                2.1%                5.4%
spice     1-word       1.2%                1.3%                1.2%
gcc       4-word       2.0%                1.7%                1.9%
spice     4-word       0.3%                0.6%                0.4%


Miss Rate Versus Block Size

[Figure 7.12: miss rate (0% to 40%) versus block size in bytes (16 to 256) for direct-mapped caches of total size 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB.]


Extreme Example: 1-block cache

° Suppose we choose block size = cache size? Then there is only one block in the cache.

° Temporal locality says that if an item is accessed, it is likely to be accessed again soon.
• But it is unlikely that it will be accessed again immediately!!!
• The next access is likely to be a miss.
- We continually load data into the cache but are forced to discard it before it is used again.
- Worst nightmare of a cache designer: the Ping Pong Effect.


Block Size and Miss Penalty

° With an increase in block size, the cost of a miss also increases.

° Miss penalty: the time to fetch the block from the next lower level of the hierarchy and load it into the cache.

° With very large blocks, the increase in miss penalty overwhelms the decrease in miss rate.

° Average access time can be minimized if the memory system is designed right.
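This tradeoff is often summarized by the average memory access time formula, AMAT = hit time + miss rate × miss penalty. A sketch with illustrative numbers (the miss rates echo the gcc figures earlier; the hit time and penalties are hypothetical):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty (cycles)
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: larger blocks cut the miss rate but raise the penalty.
small_blocks = amat(hit_time=1, miss_rate=0.054, miss_penalty=20)  # 1-word blocks
large_blocks = amat(hit_time=1, miss_rate=0.019, miss_penalty=65)  # 4-word blocks
print(small_blocks, large_blocks)
```

With these particular (made-up) penalties the larger block actually loses: its lower miss rate does not compensate for the much longer fetch, which is the effect the slide describes.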


Block Size Tradeoff

[Figure: three sketches versus block size. Miss rate first falls (exploits spatial locality), then rises (fewer blocks compromises temporal locality); miss penalty grows with block size; average access time reaches a minimum at an intermediate block size, beyond which the increased miss penalty and miss rate dominate.]


Direct-mapped Cache Contd.

° The direct-mapped cache is simple to design and its access time is fast (Why?)

° Good for an L1 (on-chip) cache

° Problem: conflict misses, hence a low hit ratio

° Conflict misses are misses caused by accessing different memory locations that are mapped to the same cache index.

° In a direct-mapped cache there is no flexibility in where a memory block can be placed, which contributes to conflict misses.
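A conflict miss follows directly from the mapping rule seen earlier: two memory blocks whose numbers differ by a multiple of the block count keep evicting each other. A minimal sketch, assuming a hypothetical 4-block cache:

```python
NUM_BLOCKS = 4

def index_of(block_addr):
    # Direct-mapped placement rule: index = block address mod # blocks
    return block_addr % NUM_BLOCKS

tags = [None] * NUM_BLOCKS   # one tag per cache block; data is not needed here
misses = 0
# Alternate between memory blocks 1 and 5: both map to index 1, so every
# access evicts the other block -- the "Ping Pong Effect".
for block in [1, 5] * 4:
    idx = index_of(block)
    if tags[idx] != block:
        misses += 1
        tags[idx] = block
print(misses)   # all 8 accesses miss
```

The other three cache blocks sit idle the whole time, which is why extra placement flexibility (associativity) helps.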


Another Extreme: Fully Associative

° Fully Associative Cache (8-word block)
• Omit the cache index; place an item in any block!
• Compare all cache tags in parallel.

° By definition: Conflict Misses = 0 for a fully associative cache.

[Figure: fully associative cache with a Valid bit, a 27-bit Cache Tag, and data bytes B 0 through B 31 per entry; the tag portion of the address (bits 31-5) is compared against every stored tag in parallel, and the byte offset selects within the block.]


Fully Associative Cache

° Must search all tags in the cache, as an item can be in any cache block.

° The search for the tag must be done by hardware in parallel (other searches are too slow).

° But the necessary parallel comparator hardware is very expensive.

° Therefore, fully associative placement is practical only for a very small cache.
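The parallel tag search can only be approximated sequentially in software, but a functional sketch shows the placement flexibility; the FIFO eviction policy and block contents below are assumptions for illustration:

```python
# Fully associative lookup: an item may sit in any block, so every stored tag
# must be checked. Hardware does this with one comparator per block, in parallel.

class FullyAssociativeCache:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.entries = {}            # tag -> data
        self.order = []              # insertion order, used for FIFO eviction

    def read(self, tag, fetch):
        if tag in self.entries:      # models the parallel compare of all tags
            return "hit", self.entries[tag]
        if len(self.order) == self.num_blocks:    # cache full: evict the oldest
            victim = self.order.pop(0)
            del self.entries[victim]
        self.entries[tag] = fetch(tag)            # fetch block from memory
        self.order.append(tag)
        return "miss", self.entries[tag]

cache = FullyAssociativeCache(num_blocks=2)
fetch = lambda tag: f"block{tag}"
print(cache.read(1, fetch))   # miss
print(cache.read(5, fetch))   # miss, but no conflict: both blocks stay resident
print(cache.read(1, fetch))   # hit, unlike the direct-mapped ping-pong case
```

Blocks 1 and 5 would collide in a 4-block direct-mapped cache; here they coexist, which is exactly why conflict misses are zero by definition.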