Understanding and Implementing Cache Coherency Policies. CSE 8380: Parallel and Distributed Processing, Dr. Hesham El-Rewini. Presented by Fazela Vohra, CSE Graduate Student, Southern Methodist University.


Understanding and Implementing Cache Coherency Policies

CSE 8380: Parallel and Distributed Processing

Dr. Hesham El-Rewini

Presented by Fazela Vohra

CSE Graduate Student, Southern Methodist University

Goals

• Create a pure software cache system as a test bed.
• Implement five cache write policies for maintaining coherency on the test bed.
• Perform experiments and test different scenarios.
• Gather statistics, measure, and draw conclusions.

Cache Basics

• A cache is a small store placed between a processor and its main memory in a shared memory system.
• It is a faster, volatile store.
• It exploits locality of reference.
• Spatial locality: Locations neighboring a recently accessed location have a higher chance of being accessed.
• Temporal locality: Once accessed, a location is likely to be accessed repeatedly over time.
• Hit: An event when the data to be read is already in the cache.
• A large number of hits gives better throughput.

Issues

• Multiple copies of a datum exist.
• Copies of cached items must be kept in sync.
• Syncing should not affect the performance or throughput of the system.

Project Details

• Implement various cache policies.
• Tinker with tunables to understand their effects on the system.
• Measure the performance/effectiveness of the policies, NOT of the algorithms or implementation.
• Software written in C on the Windows operating system.

Model of the System

[Figure: system model with blocks for Processing Units, Caches, Policies, and Main Memory. Input: I/O load and policy parameter. Diagnostic output: cache and main memory dumps.]

Inputs and Outputs

• The input is given through a file which contains:
  – I/O type (0=Read, 1=Write).
  – I/O address.
  – Processor to perform the I/O on.
  – The data to be written (for the basic system, where no computations are performed).
• A parser converts the input to actual I/O.
• Policies can be specified by the user.
• Observe dumps of cache/main memory to verify functionality.
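The parsing step could look roughly like the sketch below. The slides list the four fields but not the file layout, so the one-request-per-line format with whitespace-separated integers, the field order, and the names `IoRequest` and `parse_request` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One I/O request. The fields come from the slide; the text layout
   ("type addr proc data" on one line) is an assumption. */
typedef struct {
    int type;   /* 0 = read, 1 = write */
    int addr;   /* I/O address */
    int proc;   /* processor that performs the I/O */
    int data;   /* byte to write; left at 0 for reads */
} IoRequest;

/* Parse one line such as "1 12 0 255".
   Returns 1 on success and 0 on a malformed line. */
int parse_request(const char *line, IoRequest *out)
{
    assert(line != NULL && out != NULL);

    int type, addr, proc, data = 0;
    int n = sscanf(line, "%d %d %d %d", &type, &addr, &proc, &data);

    if (n < 3) return 0;                    /* need type, addr, proc */
    if (type != 0 && type != 1) return 0;   /* only read or write */
    if (type == 1 && n < 4) return 0;       /* a write needs data */
    if (addr < 0 || proc < 0) return 0;
    if (data < 0 || data > 255) return 0;   /* locations are byte wide */

    out->type = type;
    out->addr = addr;
    out->proc = proc;
    out->data = data;
    return 1;
}
```

A driver would read the file line by line (for example with `fgets`) and hand each accepted request to the selected policy.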

Assumptions and Simplifications

• Inputs are small sequences of reads and writes.
• Use small caches to create maximum activity.
• Memory and cache locations are byte wide.
• All caches have the same write policy configured at any point in time.
• Each cache entry has the following structure: DATA | ADDR | STATUS
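The entry layout might be declared in C roughly as follows. This is a minimal sketch: only the three fields DATA, ADDR and STATUS come from the slide, while the type names, the two-state `Status` enum, the 16-bit address width and `CACHE_SIZE` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; a two-state policy is assumed here. */
typedef enum { STATUS_INVALID = 0, STATUS_VALID = 1 } Status;

typedef struct {
    uint8_t  data;    /* DATA: one byte, since locations are byte wide */
    uint16_t addr;    /* ADDR: main-memory address this entry mirrors  */
    Status   status;  /* STATUS: state under the active write policy   */
} CacheEntry;

#define CACHE_SIZE 4  /* assumed; kept small to force replacements */

typedef struct {
    CacheEntry entries[CACHE_SIZE];
} Cache;

/* Return the index of the valid entry holding addr, or -1 on a miss. */
int cache_lookup(const Cache *c, uint16_t addr)
{
    assert(c != NULL);
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (c->entries[i].status == STATUS_VALID && c->entries[i].addr == addr)
            return i;
    }
    return -1;
}
```

Policies with more states would extend the `Status` enum; the lookup would then treat every state other than invalid as a hit.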

Policies Implemented

• Write Through – Write Invalidate
• Write Back – Write Invalidate
• Write Once
• Write Update – Partial Write Through
• Write Back – Write Update

Policy 1: WRITE THROUGH WRITE INVALIDATE

STATES

• VALID: Copy consistent with main memory.
• INVALID: Copy inconsistent with main memory.

Policy 1: WRITE THROUGH WRITE INVALIDATE

READ
• HIT: Read the copy found in cache. Done!
• MISS:
  – If any other cache has a valid copy, get it from that cache.
  – If no other cache has one, go to global memory.
  – Replacement is required if there is no space to accommodate the incoming copy. Since the cache is always consistent with main memory, no write back is required.
  – STATUS=VALID

WRITE
• HIT: Write over the copy found in cache. Update global memory and invalidate other caches. STATUS=VALID
• MISS:
  – If any other cache has a valid copy, get it from that cache.
  – If no other cache has one, go to global memory.
  – Write new data over this copy. Update global memory. Invalidate others. Replacement may be needed if there is no space. No write back.
  – STATUS=VALID
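The read and write paths of this policy can be sketched as below. This is an illustrative sketch, not the project's code: the sizes, the global arrays, the round-robin victim choice, the `mem_accesses` counter and all function names are assumptions. Because locations are byte wide, the write-miss path skips fetching the old copy, since it would be overwritten at once.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CACHES 3   /* assumed sizes, chosen only for illustration */
#define CACHE_SIZE 2
#define MEM_SIZE   16

typedef enum { INVALID = 0, VALID = 1 } Status;
typedef struct { uint8_t data; int addr; Status status; } Entry;

uint8_t memory[MEM_SIZE];
Entry   cache[NUM_CACHES][CACHE_SIZE];
int     next_victim[NUM_CACHES];  /* round-robin pointer (assumed rule) */
int     mem_accesses;             /* counts main-memory reads and writes */

/* Index of the VALID entry for addr in cache p, or -1 on a miss. */
int find(int p, int addr)
{
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[p][i].status == VALID && cache[p][i].addr == addr)
            return i;
    return -1;
}

/* Slot for an incoming copy: a free slot if any, else a round-robin
   victim. Under write through the victim never needs a write back. */
int pick_slot(int p)
{
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[p][i].status == INVALID) return i;
    int v = next_victim[p];
    next_victim[p] = (v + 1) % CACHE_SIZE;
    return v;
}

uint8_t wtwi_read(int p, int addr)
{
    assert(p >= 0 && p < NUM_CACHES && addr >= 0 && addr < MEM_SIZE);
    int i = find(p, addr);
    if (i >= 0) return cache[p][i].data;          /* read hit */

    uint8_t value = 0;
    int found = 0;
    for (int q = 0; q < NUM_CACHES && !found; q++) {
        int j = (q != p) ? find(q, addr) : -1;    /* ask the other caches */
        if (j >= 0) { value = cache[q][j].data; found = 1; }
    }
    if (!found) { value = memory[addr]; mem_accesses++; }

    i = pick_slot(p);
    cache[p][i] = (Entry){ value, addr, VALID };
    return value;
}

void wtwi_write(int p, int addr, uint8_t value)
{
    assert(p >= 0 && p < NUM_CACHES && addr >= 0 && addr < MEM_SIZE);
    int i = find(p, addr);
    if (i < 0) i = pick_slot(p);                  /* write miss: allocate */
    cache[p][i] = (Entry){ value, addr, VALID };

    memory[addr] = value;                         /* write through */
    mem_accesses++;

    for (int q = 0; q < NUM_CACHES; q++) {        /* invalidate other copies */
        int j = (q != p) ? find(q, addr) : -1;
        if (j >= 0) cache[q][j].status = INVALID;
    }
}
```

Every write costs one main-memory access in this sketch, which is the price the policy pays for never needing a write back on replacement.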

Results

Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

[Figure: Write Through - Write Invalidate, cache size vs. number of memory accesses. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: No. of Memory Accesses. Data labels: 67, 60, 55, 44, 27, 27.]

[Figure: Write Through - Write Invalidate, hit rate vs. size of cache. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: Hits. Data labels: 8, 12, 14, 17, 20, 20.]

Policy 2: WRITE BACK WRITE INVALIDATE

STATES

• RO-SHARED: Multiple copies consistent with main memory.
• RW-EXCLUSIVE: Only one copy, inconsistent with main memory (Ownership).
• INVALID: Copy inconsistent with main memory.

Policy 2: WRITE BACK WRITE INVALIDATE

READ
• HIT: Read the copy found in cache. Done!
• MISS:
  – RW copy in no other cache: Get a copy from global memory.
  – RW copy in another cache: Get it. Update global memory.
  – SPACE? If no: if entry to be replaced=RW, write back to global; if entry to be replaced=I/RO, no write back. If yes: write over it.
  – STATUS=RO (in both caches if got from another cache).

WRITE
• HIT:
  – If Status=RW: Write over it. STATUS=RW
  – If Status=RO: Write over it. Invalidate others. STATUS=RW
• MISS:
  – Other cache has RW: Copy into own. Invalidate others. Write new data.
  – No other cache has RW: Go to global memory. Write new data. Invalidate others.
  – SPACE? If no space, replace. If copy to be replaced=RW, write back to global. Otherwise simply write over it. No write back.
  – STATUS=RW
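The transitions in this flowchart can be condensed into a small state function for a single cached copy. The sketch below is one reading of the slide, not the project's code; the enum and function names are hypothetical, and the four events are an assumed way of splitting local accesses from accesses by other processors.

```c
#include <assert.h>

typedef enum { WB_INVALID, WB_RO_SHARED, WB_RW_EXCLUSIVE } WbState;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } BusEvent;

/* Next state of one cached copy under Write Back - Write Invalidate. */
WbState wbwi_next_state(WbState s, BusEvent e)
{
    assert((int)s >= 0 && (int)s <= (int)WB_RW_EXCLUSIVE);
    switch (e) {
    case LOCAL_READ:   /* a read miss loads the copy as RO; hits keep their state */
        return (s == WB_INVALID) ? WB_RO_SHARED : s;
    case LOCAL_WRITE:  /* hit or miss, the writer ends up owning the copy */
        return WB_RW_EXCLUSIVE;
    case REMOTE_READ:  /* an owner that supplies the copy drops to RO */
        return (s == WB_RW_EXCLUSIVE) ? WB_RO_SHARED : s;
    case REMOTE_WRITE: /* another processor wrote: invalidate */
        return WB_INVALID;
    }
    return s;
}

/* Global memory is updated when an RW copy is supplied to a reader. */
int wbwi_memory_updated_on_remote_read(WbState s)
{
    return s == WB_RW_EXCLUSIVE;
}

/* Only an RW victim is written back to global memory on replacement. */
int wbwi_write_back_on_replace(WbState s)
{
    return s == WB_RW_EXCLUSIVE;
}
```

Compared with write through, main memory is touched only when an owned copy is supplied to a reader or evicted.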

Results

Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

[Figure: Write Back - Write Invalidate, cache size vs. number of memory accesses. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: No. of Memory Accesses. Data labels: 62, 55, 46, 35, 18, 18.]

[Figure: Write Back - Write Invalidate, hit rate vs. size of cache. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: Hits. Data labels: 8, 12, 14, 17, 20, 20.]

Policy 3: WRITE ONCE

STATES

• INVALID: Copy inconsistent with main memory.
• VALID: Copy consistent with main memory.
• RESERVED: Written once, consistent with main memory.
• DIRTY: Written more than once, inconsistent with main memory.

Policy 3: WRITE ONCE

READ
• HIT: Read the copy found in cache. Done!
• MISS:
  – DIRTY copy in no other cache: Get a copy from global memory.
  – DIRTY copy in another cache: Get it. Update global memory.
  – SPACE? If no: if entry to be replaced=DIRTY, write back to global; if entry to be replaced=V/RES, no write back. If yes: write over it.
  – STATUS=VALID (in both caches if got from another cache).

WRITE
• HIT:
  – If Status=D/RES: Write over it. STATUS=D
  – If Status=VALID: Write over it. Invalidate others. Update global. STATUS=RES
• MISS:
  – Other cache has DIRTY: Copy into own. Invalidate others. Write new data.
  – No other cache has DIRTY: Go to global memory. Write new data. Invalidate others.
  – SPACE? If no space, replace. If copy to be replaced=DIRTY, write back to global. Otherwise simply write over it. No write back.
  – STATUS=DIRTY
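The distinguishing feature of Write Once, that only the first write to a VALID copy goes through to main memory, can be sketched as a pair of small functions. The names are hypothetical and the code is one reading of the flowchart, not the project's implementation.

```c
#include <assert.h>

typedef enum { WO_INVALID, WO_VALID, WO_RESERVED, WO_DIRTY } WoState;

/* State of the writer's copy after a local write. *write_through is set
   to 1 when this write must also update main memory. */
WoState wo_local_write(WoState s, int *write_through)
{
    assert(write_through != 0);
    assert((int)s >= 0 && (int)s <= (int)WO_DIRTY);

    *write_through = (s == WO_VALID);  /* only the first write goes through */
    switch (s) {
    case WO_VALID:    return WO_RESERVED;  /* written once */
    case WO_RESERVED: return WO_DIRTY;     /* written more than once */
    case WO_DIRTY:    return WO_DIRTY;
    case WO_INVALID:  return WO_DIRTY;     /* write miss ends in DIRTY */
    }
    return s;
}

/* A copy that is read by another cache ends up VALID in both caches. */
WoState wo_remote_read(WoState s)
{
    return (s == WO_INVALID) ? WO_INVALID : WO_VALID;
}

/* Only a DIRTY victim is written back to global memory on replacement. */
int wo_write_back_on_replace(WoState s)
{
    return s == WO_DIRTY;
}
```

Starting from VALID, two successive calls to `wo_local_write` walk the copy through RESERVED to DIRTY, and only the first call reports a write through.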

Results

Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

[Figure: Write Once, cache size vs. number of memory accesses. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: No. of Memory Accesses. Data labels: 69, 65, 60, 58, 55, 55.]

[Figure: Write Once, hit rate vs. size of cache. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: Hits. Data labels: 8, 12, 14, 17, 20, 20.]

Policy 4: WRITE UPDATE PARTIAL WRITE THROUGH

STATES

• VALID-EXCLUSIVE: Only one copy, consistent with main memory.
• SHARED: Multiple copies consistent with main memory.
• DIRTY: Only one copy, inconsistent with main memory (Ownership).

Policy 4: WRITE UPDATE ‘PARTIAL’ WRITE THROUGH

READ
• HIT: Read the copy found in cache. Done!
• MISS, no other cache has a copy:
  – Get a copy from global memory.
  – SPACE? If no: if entry to be replaced=DIRTY, write back to global; if entry to be replaced=V/SHARE, no write back. If yes: write over it.
  – STATUS=VALX
• MISS, DIRTY copy in another cache: Get it. Update global.
• MISS, VALX/SHARE copy in another cache: Get it.
• After getting the copy from another cache:
  – SPACE? If no: if entry to be replaced=DIRTY, write back to global; if entry to be replaced=V/SHARE, no write back. If yes: write over it.
  – STATUS=SHARE in both caches.

Policy 4: Contd…

WRITE
• HIT:
  – Copy=D/VALX: Write locally.
  – Copy=SHARE: Write over. Update all sharing caches. Update global. STATUS=SHARE
• MISS, another cache has a copy:
  – Get it. Write over. Update all caches. Update global.
  – SPACE? If no: if entry to be replaced=DIRTY, write back to global; if entry to be replaced=V/SHARE, no write back. If yes: write over it.
  – STATUS=SHARE
• MISS, no other cache has a copy:
  – Get it from global memory. Write over it.
  – SPACE? If no: if entry to be replaced=DIRTY, write back to global; if entry to be replaced=V/SHARE, no write back. If yes: write over it.
  – STATUS=DIRTY

Results

Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

[Figure: Write Update Partial Write Through, cache size vs. number of memory accesses. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: No. of Memory Accesses. Data labels: 63, 56, 52, 43, 26, 26.]

[Figure: Write Update Partial Write Through, hit rate vs. size of cache. x-axis: Cache Size (% of Main Memory), 0 to 250; y-axis: Hits. Data labels: 9, 13, 16, 20, 23, 23.]

Policy 5: WRITE UPDATE WRITE BACK

STATES

• VALID-EX: Only one copy, consistent with main memory.
• SHARED-CLEAN: Multiple shared copies, could be consistent with main memory (No ownership).
• SHARED-DIRTY: Multiple shared copies, last one to be modified (Ownership).
• DIRTY: Unshared and updated, inconsistent with main memory.

Policy 5: WRITE UPDATE WRITE BACK

READ
• HIT: Read the copy found in cache. Done!
• MISS, no other cache has a copy:
  – Get a copy from global memory.
  – SPACE? If no: if entry to be replaced=D/SD, write back to global; if entry to be replaced=VALX/SC, no write back. If yes: write over it.
  – STATUS=VALX
• MISS, DIRTY/SD copy in another cache:
  – Get it.
  – SPACE? If no: if entry to be replaced=D/SD, write back to global; if entry to be replaced=VALX/SC, no write back. If yes: write over it.
  – Supplying cache STATUS=SD. Taking cache STATUS=SC.
• MISS, VALX/SC copy in another cache:
  – Get it.
  – SPACE? If no: if entry to be replaced=D/SD, write back to global; if entry to be replaced=VALX/SC, no write back. If yes: write over it.
  – STATUS=SC in both caches.

Policy 5: contd…

WRITE
• HIT:
  – Copy=D/VALX: Write locally. STATUS=DIRTY
  – Copy=SC/SD: Write over. Update all sharing caches. STATUS (own)=SD, STATUS (others)=SC
• MISS, another cache has a copy:
  – Get it. Write over. Update all caches.
  – SPACE? If no: if entry to be replaced=D/SD, write back to global; if entry to be replaced=VALX/SC, no write back. If yes: write over it.
  – Supplying cache STATUS=SC. Taking cache STATUS=SD.
• MISS, no other cache has a copy:
  – Get it from global memory. Write over it.
  – SPACE? If no: if entry to be replaced=D/SD, write back to global; if entry to be replaced=VALX/SC, no write back. If yes: write over it.
  – STATUS=DIRTY
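How ownership moves between the four states can be sketched with a few helper functions. The names are hypothetical and the code follows one reading of the two flowcharts for this policy; it tracks states only, not data.

```c
#include <assert.h>

typedef enum { P5_VALID_EX, P5_SHARED_CLEAN, P5_SHARED_DIRTY, P5_DIRTY } P5State;

/* State of the writer's own copy after a write hit. */
P5State p5_write_hit(P5State s)
{
    assert((int)s >= 0 && (int)s <= (int)P5_DIRTY);
    switch (s) {
    case P5_VALID_EX:
    case P5_DIRTY:
        return P5_DIRTY;          /* unshared copy: write locally */
    case P5_SHARED_CLEAN:
    case P5_SHARED_DIRTY:
        return P5_SHARED_DIRTY;   /* shared copy: update sharers, take ownership */
    }
    return s;
}

/* State of the supplying cache after it serves another cache's read miss. */
P5State p5_supply_on_read(P5State s)
{
    /* An owner (D/SD) keeps ownership as SD; a clean copy becomes SC. */
    return (s == P5_DIRTY || s == P5_SHARED_DIRTY) ? P5_SHARED_DIRTY
                                                   : P5_SHARED_CLEAN;
}

/* Owned copies (D/SD) are written back on replacement; VALX/SC are not. */
int p5_write_back_on_replace(P5State s)
{
    return s == P5_DIRTY || s == P5_SHARED_DIRTY;
}
```

Because sharers are updated rather than invalidated, a write hit never removes a copy from another cache; it only moves ownership to the writer.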

Results

Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

[Figure: Write Update Write Back, cache size vs. number of memory accesses. x-axis: Cache Size (% Main Memory), 0 to 150; y-axis: # of memory accesses. Data labels: 55, 49, 43, 33, 16.]

[Figure: Write Update Write Back, hits vs. cache size. x-axis: Cache Size (% Main Memory), 0 to 150; y-axis: Hits. Data labels: 9, 13, 16, 20, 23.]

A Practical Experiment: Matrix Multiplication

• 3 x 3 matrix data is loaded from the input file into main memory.
• Start with empty caches.
• Matrices are multiplied by reading values from main memory.
• Results are written to main memory.
• Policy used is Write Through - Write Invalidate.
• Three processor/cache sets.
• Each processor computes the three elements of one row of the result.
• Each cache has only 7 locations: 6 inputs and 1 result.
• Lots of inter-cache exchange.
• Replacements abound due to the small cache.
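The work split can be sketched as below, assuming processor p computes row p of the result. The function name and the use of `int` elements are assumptions (the simulator's locations are byte wide), and the sketch performs plain array arithmetic without going through the simulated caches. Each result element needs one row of A (3 values), one column of B (3 values) and the result itself, which matches the 7 cache locations.

```c
#include <assert.h>

#define N 3

/* Processor p computes row p of X = A * B (assumed work split). */
void compute_row(int p, int a[N][N], int b[N][N], int x[N][N])
{
    assert(p >= 0 && p < N);
    for (int j = 0; j < N; j++) {
        int sum = 0;
        /* Working set: a[p][0..2], b[0..2][j] and x[p][j], 7 locations. */
        for (int k = 0; k < N; k++)
            sum += a[p][k] * b[k][j];
        x[p][j] = sum;
    }
}
```

The row of A stays the same for all three elements a processor computes, while the column of B and the result change, which is why the B and X entries are the ones that get replaced.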

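Logic

\[
\begin{bmatrix} x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ x_{20} & x_{21} & x_{22} \end{bmatrix}
=
\begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{00} & b_{01} & b_{02} \\ b_{10} & b_{11} & b_{12} \\ b_{20} & b_{21} & b_{22} \end{bmatrix}
\]

[Figure: contents of Processor 0's seven-location cache over three steps. Step 1: a00, a01, a02, b00, b10, b20, x00. Step 2: a00, a01, a02, b01, b11, b21, x01. Step 3: a00, a01, a02, b02, b12, b22, x02. The b and x entries are replaced between steps.]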

[Figure: cache contents of all three processors over three steps. Processor 0 holds a00, a01, a02 with the columns of b in turn and produces x00, x01, x02. Processor 1 holds a10, a11, a12 and produces x10, x11, x12. Processor 2 holds a20, a21, a22 and produces x20, x21, x22.]

As can be seen, most of the time each processor can find what it wants in another cache!

Replacement Logic

• Each entry also carries a Use tag and a Replaced bit.
• When the entry is accessed, the Use tag is incremented.
• When the entry is replaced, the Replaced bit is set.
• Entries with smaller Use tags are therefore always replaced first.
• The Replaced bit ensures that an entry that has just been replaced is not immediately replaced again in the next cycle, which would otherwise happen because it always has a smaller Use tag!
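The victim selection described here might be sketched as follows. The slide does not say when the Replaced bit is cleared, so this sketch assumes it protects an entry for exactly one subsequent replacement; the `Slot` type and the function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int      addr;
    unsigned use_tag;   /* incremented on every access */
    int      replaced;  /* set when the slot was filled by the latest replacement */
} Slot;

/* Record an access to an entry. */
void touch(Slot *s)
{
    assert(s != 0);
    s->use_tag++;
}

/* Pick the unprotected slot with the smallest use tag. */
int choose_victim(const Slot slots[], int n)
{
    assert(slots != 0 && n > 0);
    int victim = -1;
    for (int i = 0; i < n; i++) {
        if (slots[i].replaced) continue;   /* just replaced: skip it */
        if (victim < 0 || slots[i].use_tag < slots[victim].use_tag)
            victim = i;
    }
    return victim >= 0 ? victim : 0;       /* every slot protected (n == 1) */
}

/* Replace the chosen victim with new_addr and return its index. */
int replace_entry(Slot slots[], int n, int new_addr)
{
    int v = choose_victim(slots, n);
    for (int i = 0; i < n; i++)
        slots[i].replaced = 0;   /* assumption: protection lasts one replacement */
    slots[v].addr = new_addr;
    slots[v].use_tag = 0;        /* the new entry starts with the smallest tag */
    slots[v].replaced = 1;
    return v;
}
```

Without the Replaced bit, the freshly loaded entry would have the smallest use tag and would be evicted again on the very next miss.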

The Broadcast Issue!

• Shared memory systems are interconnected using a bus. I implemented the broadcast as a loop in which I invalidate the other caches.
• It could also be done with an event-based system.
• A processor posts an ‘event’ to all caches when it updates an entry.
• Other caches invalidate their entries on demand based on the events posted.
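The event-based alternative could be sketched as a per-cache queue of pending invalidations that each cache drains before its next access. This is an illustration of the idea only; the slides do not describe such an implementation, and the types, sizes and function names below are assumptions.

```c
#include <assert.h>

#define NUM_CACHES 3   /* assumed sizes, for illustration only */
#define CACHE_SIZE 4
#define MAX_EVENTS 8

typedef struct { int addr; int valid; } Line;

typedef struct {
    Line lines[CACHE_SIZE];
    int  pending[MAX_EVENTS];  /* addresses waiting to be invalidated */
    int  npending;
} Cache;

/* Apply all posted events to one cache; returns lines invalidated. */
int drain_events(Cache *c)
{
    assert(c != 0);
    int invalidated = 0;
    for (int e = 0; e < c->npending; e++) {
        for (int i = 0; i < CACHE_SIZE; i++) {
            if (c->lines[i].valid && c->lines[i].addr == c->pending[e]) {
                c->lines[i].valid = 0;
                invalidated++;
            }
        }
    }
    c->npending = 0;
    return invalidated;
}

/* The writer posts an invalidate event for addr to every other cache. */
void post_invalidate(Cache caches[], int writer, int addr)
{
    assert(caches != 0 && writer >= 0 && writer < NUM_CACHES);
    for (int q = 0; q < NUM_CACHES; q++) {
        if (q == writer) continue;
        if (caches[q].npending == MAX_EVENTS)
            drain_events(&caches[q]);  /* queue full: apply now, lose nothing */
        caches[q].pending[caches[q].npending++] = addr;
    }
}
```

A cache would call `drain_events` on itself before each lookup, so a stale entry is never read even though the invalidation was deferred.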


Future Work

• Implement matrix multiplication for all policies

References

• Hesham El-Rewini, Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing.
• https://www.cs.tcd.ie/Jeremy.Jones/vivio/vivio.htm


Questions / Answers

Thank You!