
1

CMPE 421 Advanced Computer Architecture

Caching with Associativity - PART 2

2

Other Cache organizations

Direct Mapped
  [Figure: one V | Tag | Data entry per index, indexes 0: through 15:]
  Address = Tag | Index | Block offset
  Each address has only one possible location

Fully Associative
  [Figure: V | Tag | Data entries with no index]
  No Index
  Address = Tag | Block offset
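As a concrete illustration of the two address splits, here is a minimal Python sketch (not from the slides; the function names and the 16-block, 4-byte-block cache size are illustrative assumptions matching the figure) that decomposes a byte address for a direct-mapped cache and for a fully associative cache:

def split_direct_mapped(addr, num_blocks=16, block_bytes=4):
    """Split a byte address into tag | index | block offset for a
    direct-mapped cache (assumed: 16 one-word, 4-byte blocks)."""
    offset_bits = block_bytes.bit_length() - 1      # 4-byte blocks -> 2 offset bits
    index_bits = num_blocks.bit_length() - 1        # 16 blocks -> 4 index bits
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_blocks - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def split_fully_associative(addr, block_bytes=4):
    """Fully associative: no index, the whole block address is the tag."""
    offset = addr & (block_bytes - 1)
    tag = addr >> (block_bytes.bit_length() - 1)
    return tag, offset

print(split_direct_mapped(0b1101_0110_00))       # (tag, index, offset)
print(split_fully_associative(0b1101_0110_00))   # (tag, offset) - no index field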

3

Fully Associative Cache

4

A Compromise

2-Way set associative
  [Figure: 8 indexes (0: through 7:), each set holding two V | Tag | Data entries]
  Address = Tag | Index | Block offset
  Each address has two possible locations with the same index
  One fewer index bit: 1/2 the indexes

4-Way set associative
  [Figure: 4 indexes (0: through 3:), each set holding four V | Tag | Data entries]
  Address = Tag | Index | Block offset
  Each address has four possible locations with the same index
  Two fewer index bits: 1/4 the indexes

5

Range of Set Associative Caches

Address fields: Tag | Index | Block offset | Byte offset
  Tag - used for tag compare
  Index - selects the set
  Block offset - selects the word in the block

Decreasing associativity -> Direct mapped (only one way): smaller tags
Increasing associativity -> Fully associative (only one set): tag is all the bits except block and byte offset
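To see how the tag/index boundary moves, here is a small sketch of the arithmetic (the 32-bit address, 256-block cache size, and one-word blocks are assumptions, not from the slide): each doubling of associativity halves the number of sets, removes one index bit, and adds it to the tag.

import math

ADDR_BITS = 32          # assumed 32-bit byte addresses
NUM_BLOCKS = 256        # assumed cache size: 256 blocks
BYTE_OFFSET_BITS = 2    # 32-bit (4-byte) words
BLOCK_OFFSET_BITS = 0   # one word per block in this sketch

ways = 1
while ways <= NUM_BLOCKS:
    sets = NUM_BLOCKS // ways
    index_bits = int(math.log2(sets))
    tag_bits = ADDR_BITS - index_bits - BLOCK_OFFSET_BITS - BYTE_OFFSET_BITS
    label = "fully assoc" if ways == NUM_BLOCKS else f"{ways}-way"
    print(f"{label:>11}: {sets:3d} sets, index = {index_bits:2d} bits, tag = {tag_bits:2d} bits")
    ways *= 2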

6

Set Associative Cache

[Figure: a two-way set-associative cache with two sets (Set 0 and Set 1) and two ways (Way 0 and Way 1), each entry holding V | Tag | Data, next to a main memory of one-word blocks at addresses 0000xx through 1111xx]

One-word blocks; the two low-order address bits define the byte in the word (32-bit words)

Q1: How do we find it?
  Use the next low-order memory address bit to determine which cache set,
  i.e., (block address) modulo (# of sets in the cache)

Q2: Is it there?
  Compare all the cache tags in the set to the high-order 3 memory address bits
  to tell if the memory block is in the cache
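The two questions on this slide can be phrased as a short lookup routine. The sketch below is illustrative only: the lookup() helper and the example cache contents are hypothetical, assuming the slide's tiny two-set cache with one-word blocks.

def lookup(cache, addr, num_sets=2, word_bytes=4):
    """cache[set_index] is a list of ways; each way is (valid, tag).
    Q1 (find the set): block address modulo the number of sets.
    Q2 (is it there?): compare every stored tag in the set to the address tag."""
    block_addr = addr // word_bytes            # drop the 2 byte-offset bits
    set_index = block_addr % num_sets          # Q1: low-order block-address bit
    tag = block_addr // num_sets               # Q2: remaining high-order bits
    for valid, stored_tag in cache[set_index]:
        if valid and stored_tag == tag:
            return True                        # hit
    return False                               # miss

# Example: set 1 already holds the block whose tag is 011
cache = {0: [(0, 0), (0, 0)], 1: [(1, 0b011), (0, 0)]}
print(lookup(cache, 0b0111_01))   # block address 0111 -> set 1, tag 011 -> hit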

7

Set Associative Cache Organization

FIGURE 7.17 The implementation of a four-way set-associative cache requires four comparators and a 4-to-1 multiplexor. The comparators determine which element of the selected set (if any) matches the tag. The output of the comparators is used to select the data from one of the four blocks of the indexed set, using a multiplexor with a decoded select signal. In some implementations, the Output enable signals on the data portions of the cache RAMs can be used to select the entry in the set that drives the output. The Output enable signal comes from the comparators, causing the element that matches to drive the data outputs.

8

Remember the Example for Direct Mapping (ping pong effect)

Consider the main memory word reference string: 0 4 0 4 0 4 0 4

Start with an empty cache - all blocks initially marked as not valid

  0  miss - load Mem(0) (tag 00)         4  miss - Mem(0) replaced by Mem(4) (tag 01)
  0  miss - Mem(4) replaced by Mem(0)    4  miss - Mem(0) replaced by Mem(4)
  ...the last four references repeat the same pattern: miss, miss, miss, miss

Ping pong effect due to conflict misses - two memory locations that map into the same cache block keep evicting each other

8 requests, 8 misses

9

Solution: Use set associative cache

Consider the main memory word reference string: 0 4 0 4 0 4 0 4

Start with an empty cache - all blocks initially marked as not valid

  0  miss - load Mem(0) into one way of the set (tag 000)    4  miss - load Mem(4) into the other way (tag 010)
  0  hit                                                      4  hit
  ...the remaining four references all hit as well

Solves the ping pong effect in a direct mapped cache due to conflict misses since now two memory locations that map into the same cache set can co-exist!

8 requests, 2 misses
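A quick way to check these two slides is to simulate both caches on the reference string. The sketch below uses a hypothetical misses() helper and assumes 4-block caches of one-word blocks, word addresses, and LRU replacement; it reproduces the 8 misses of the direct-mapped cache and the 2 misses of the 2-way cache.

from collections import OrderedDict

def misses(refs, num_blocks=4, ways=1):
    """Count misses for a small cache of num_blocks one-word blocks,
    ways-way set associative, LRU replacement. refs are word addresses."""
    sets = num_blocks // ways
    cache = [OrderedDict() for _ in range(sets)]   # per-set LRU order: tag -> None
    miss_count = 0
    for word_addr in refs:
        s, tag = word_addr % sets, word_addr // sets
        if tag in cache[s]:
            cache[s].move_to_end(tag)              # hit: mark most recently used
        else:
            miss_count += 1
            if len(cache[s]) == ways:
                cache[s].popitem(last=False)       # evict the LRU block
            cache[s][tag] = None
    return miss_count

refs = [0, 4, 0, 4, 0, 4, 0, 4]
print(misses(refs, ways=1))   # direct mapped: 8 misses (ping pong)
print(misses(refs, ways=2))   # 2-way set associative: 2 misses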

10

Set Associative Example

Reference string (10-bit addresses):
  0100111000, 1100110100, 0100111100, 0110110000, 1100111000

Address fields: Tag (3-5 bits) | Index (1-3 bits) | Block offset (2 bits) | Byte offset (2 bits)

Direct-Mapped (8 indexes, 000: through 111:; 3 index bits, 3 tag bits)
  All five references map to index 011
  Results: Miss, Miss, Miss, Miss, Miss
  Final contents: index 011 holds V=1, tag 110

2-Way Set Assoc. (4 sets, 00: through 11:; 2 index bits, 4 tag bits)
  All five references map to set 11
  Results: Miss, Miss, Hit, Miss, Miss   (assuming LRU replacement)
  Final contents: set 11 holds V=1 tag 0110 and V=1 tag 1100

4-Way Set Assoc. (2 sets, 0: and 1:; 1 index bit, 5 tag bits)
  All five references map to set 1
  Results: Miss, Miss, Hit, Miss, Hit
  Final contents: set 1 holds V=1 tags 01001, 11001, 01101 (one way still empty)
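The same style of simulation as the earlier sketch verifies this slide's results. replay() is a hypothetical helper; LRU replacement is assumed, since that is what the 2-way results imply.

from collections import OrderedDict

refs = ["0100111000", "1100110100", "0100111100", "0110110000", "1100111000"]

def replay(refs, index_bits, ways, offset_bits=4):
    """offset_bits = 2 block-offset + 2 byte-offset bits; LRU replacement."""
    cache = [OrderedDict() for _ in range(1 << index_bits)]
    results = []
    for r in refs:
        block = int(r, 2) >> offset_bits
        s, tag = block & ((1 << index_bits) - 1), block >> index_bits
        if tag in cache[s]:
            cache[s].move_to_end(tag)
            results.append("Hit")
        else:
            results.append("Miss")
            if len(cache[s]) == ways:
                cache[s].popitem(last=False)
            cache[s][tag] = None
    return results

print(replay(refs, index_bits=3, ways=1))  # Miss Miss Miss Miss Miss
print(replay(refs, index_bits=2, ways=2))  # Miss Miss Hit  Miss Miss
print(replay(refs, index_bits=1, ways=4))  # Miss Miss Hit  Miss Hit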

11

New Performance Numbers

Miss rates for the DEC 3100 (MIPS machine), with separate 64KB instruction and data caches:

Benchmark   Associativity   Instruction miss rate   Data miss rate   Combined miss rate
gcc         Direct          2.0%                    1.7%             1.9%
gcc         2-way           1.6%                    1.4%             1.5%
gcc         4-way           1.6%                    1.4%             1.5%
spice       Direct          0.3%                    0.6%             0.4%
spice       2-way           0.3%                    0.6%             0.4%
spice       4-way           0.3%                    0.6%             0.4%

12

Benefits of Set Associative Caches

The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation

[Figure: miss rate (%) vs. associativity (1-way, 2-way, 4-way, 8-way) for cache sizes from 4KB to 512KB]

Data from Hennessy & Patterson, Computer Architecture, 2003

Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)

Virtual Memory (32-bit system): 8KB page size, 16MB main memory

[Figure: page table with 512K entries (indexed 0, 1, 2, ...), each entry holding V | Phys. Page # | Disk Address, indexed by the virtual page number]

Virtual Address:  Virtual Page # (bits 31-13) | Page offset (bits 12-0)
Physical Address: Physical Page # (bits 23-13) | Page offset (bits 12-0)

The 19-bit virtual page number indexes the page table: 4GB / 8KB = 512K entries (2^19 = 512K)
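The sizing on this slide follows directly from the page size and the address widths; a minimal sketch of the arithmetic, with the values taken from the slide:

import math

VA_BITS = 32                     # 32-bit virtual addresses (4GB virtual space)
PA_BITS = 24                     # 16MB physical memory
PAGE_SIZE = 8 * 1024             # 8KB pages

offset_bits = int(math.log2(PAGE_SIZE))     # 13 bits (bits 0-12)
vpn_bits = VA_BITS - offset_bits            # 19 bits (bits 13-31)
ppn_bits = PA_BITS - offset_bits            # 11 bits (bits 13-23)
page_table_entries = 2 ** vpn_bits          # 2^19 = 512K entries

print(offset_bits, vpn_bits, ppn_bits, page_table_entries)   # 13 19 11 524288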

14

Virtual memory example

System with 20-bit V.A., 16KB pages, 256KB of physical memory

Page Table:

Virtual Page #   Valid Bit   Physical Page # / Disk address
000000           1           1001
000001           0           sector 5000...
000010           1           0010
000011           0           sector 4323...
000100           1           1011
000101           1           1010
000110           0           sector 1239...
000111           1           0001

Page offset takes 14 bits, 6 bits for V.P.N. and 4 bits for P.P.N.

Access to: 0000 1000 1100 1010 1010   (V.P.N. = 000010, page offset = 00 1100 1010 1010)

PPN = 0010

Physical Address: 00 1000 1100 1010 1010

Access to: 0001 1001 0011 1100 0000   (V.P.N. = 000110)

PPN = ? Page fault - the page is not in memory; its data is on disk at sector 1239...

Pick a page to “kick out” of memory (use LRU).

Assume LRU is VPN 000101 for this example.

Update the evicted page's entry (VPN 000101, previously PPN 1010): valid bit = 0, disk address = sector xxxx...

Read data from sector 1239 into PPN 1010 (the frame freed by VPN 000101) and set VPN 000110's entry to valid = 1, PPN = 1010
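A short sketch of the translation steps on this slide. The translate() helper is hypothetical; the page table contents are copied from the slide, and the fault path simply reports which disk sector would be read.

# Page table from the slide: VPN -> (valid bit, PPN or disk address)
page_table = {
    0b000000: (1, 0b1001), 0b000001: (0, "sector 5000..."),
    0b000010: (1, 0b0010), 0b000011: (0, "sector 4323..."),
    0b000100: (1, 0b1011), 0b000101: (1, 0b1010),
    0b000110: (0, "sector 1239..."), 0b000111: (1, 0b0001),
}

OFFSET_BITS = 14   # 16KB pages -> 14-bit page offset

def translate(va):
    """Translate a 20-bit virtual address to an 18-bit physical address."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    valid, entry = page_table[vpn]
    if valid:
        return (entry << OFFSET_BITS) | offset       # PPN ++ page offset
    raise RuntimeError(f"page fault: VPN {vpn:06b} is on disk at {entry}")

pa = translate(0b0000_1000_1100_1010_1010)
print(f"{pa:018b}")            # 001000110010101010  (= 00 1000 1100 1010 1010 on the slide)

try:
    translate(0b0001_1001_0011_1100_0000)
except RuntimeError as fault:
    print(fault)               # page fault: VPN 000110 is on disk at sector 1239...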