Caching II Andreas Klappenecker CPSC321 Computer Architecture


Verilog Questions & Answers


Verilog Q&A

How is the xor instruction encoded?

As an R-format instruction with function field 0x26; see [PH], page A-59.

What is the purpose of Idealmem.v?

It models the memory. dmeminit.v initializes the data memory, and imeminit.v initializes the instruction memory.


Verilog Q&A

How do I specify delays?

`define DEL 10

begin
  a <= #(`DEL) b;
  c <= #(`DEL) d;
end

A delay can precede an assignment statement (#d a = b;) or appear inside it, between the assignment operator and the right-hand side (a <= #d b;).


Delays

module iab;
  integer i, j;
  initial begin
    i = 3;
    j = 4;
    begin
      #1 i = #1 j;
      #1 j = #1 i;
    end
  end
endmodule


Simulation starts:

@time 0: i = 3, j = 4. Execution reaches the first #1 and waits until time 1.
@time 1: j is sampled (4).
@time 2: 4 is assigned to i; execution continues with the next statement.
@time 3: i is sampled (now 4).
@time 4: 4 is assigned to j.

Result: i = j = 4, so the values are not swapped.


Delays

module ianb;
  integer i, j;
  initial begin
    i = 3;
    j = 4;
    begin
      i <= #1 j;
      j <= #1 i;
    end
  end
endmodule


@time 0: i = 3, j = 4. Both non-blocking assignments finish at time 0
[an intra-assignment delay does not delay the execution of a non-blocking statement]:
  sample j (4) and schedule its assignment to i at time 1
  sample i (3) and schedule its assignment to j at time 1

@time 1: i = 4, j = 3, so the values are swapped.
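To check both timelines in a simulator, the statements of iab and ianb can be run side by side and their final values printed. This is a minimal sketch; the module name delay_compare and the variables p and q (standing in for i and j of ianb) are illustrative, not part of the course files.

```verilog
// Sketch: runs the iab and ianb statements side by side.
// delay_compare, p and q are illustrative names.
module delay_compare;
  integer i, j;   // as in iab: blocking assignments, intra-assignment delays
  integer p, q;   // as in ianb: non-blocking assignments, intra-assignment delays

  initial begin
    i = 3; j = 4;
    #1 i = #1 j;  // wait until t=1, sample j, write i at t=2
    #1 j = #1 i;  // wait until t=3, sample i, write j at t=4
  end

  initial begin
    p = 3; q = 4;
    p <= #1 q;    // sample q at t=0, write p at t=1
    q <= #1 p;    // sample p at t=0, write q at t=1
  end

  initial begin
    #5;           // all updates are finished by t=4
    $display("blocking:     i=%0d j=%0d", i, j);
    $display("non-blocking: p=%0d q=%0d", p, q);
  end
endmodule
```

Run in a simulator such as Icarus Verilog, this should print i=4 j=4 for the blocking version and p=4 q=3 for the non-blocking one.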


Delays

Hint: Using unit delays simplifies debugging, because it lets you find out which signal depends on which.

Do not hard-code delays in the form #1; rather, use a macro:

`define foo_del 1  // change later
a <= #(`foo_del) b;


Clock

module m555 (CLK);
  parameter STime = 0, Ton = 50, Toff = 50, Tcc = Ton + Toff;
  output CLK;
  reg CLK;

  initial begin
    #STime CLK = 0;
  end

  always begin
    #Toff CLK = ~CLK;
    #Ton CLK = ~CLK;
  end
endmodule
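The parameters of m555 can be overridden at instantiation. With the defaults the clock is low for Toff = 50, then high for Ton = 50, giving a period Tcc = 100. The sketch below uses a hypothetical testbench (tb_m555) that shortens the period to 10 time units and counts rising edges; m555 is repeated so the block compiles on its own.

```verilog
// m555 as on the slide, repeated so this sketch compiles on its own.
module m555 (CLK);
  parameter STime = 0, Ton = 50, Toff = 50, Tcc = Ton + Toff;
  output CLK;
  reg CLK;
  initial #STime CLK = 0;
  always begin
    #Toff CLK = ~CLK;  // low phase first
    #Ton  CLK = ~CLK;  // then high phase
  end
endmodule

// Hypothetical testbench: period of 10 time units, counts rising edges.
module tb_m555;
  wire clk;
  integer rises;

  m555 #(.Ton(5), .Toff(5)) clkgen (clk);  // 5 low, then 5 high

  initial rises = 0;
  always @(posedge clk) rises = rises + 1;

  initial begin
    #100 $display("rising edges by t=100: %0d", rises);
    $finish;  // the always block in m555 would otherwise run forever
  end
endmodule
```

Rising edges occur at t = 5, 15, ..., 95, so the testbench should report 10.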


Project

For jal and jr, the datapath of the book is not enough. You need more control signals for ALUop, so there is no point in sticking to the way it is done in the book.


Report

Include a table explaining your control signals.


Caching


Memory

Users want large and fast memories.

SRAM is too expensive for main memory, and DRAM is too slow for many purposes. Compromise: build a memory hierarchy.

[Figure: levels in the memory hierarchy, from the CPU through Level 1, Level 2, and so on down to Level n. Access time and the size of the memory at each level increase with the distance from the CPU.]


Locality

Temporal locality: a referenced item will be referenced again soon.
Spatial locality: data near a referenced item will be referenced soon.


Direct Mapped Cache

Mapping: address modulo the number of blocks in the cache, x -> x mod B

[Figure: an 8-block cache (indices 000 to 111) and memory. Memory addresses 00001, 01001, 10001, 11001 map to cache block 001; addresses 00101, 01101, 10101, 11101 map to cache block 101.]
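When B is a power of two, x mod B is simply the low log2(B) bits of the address, so no divider is needed. A minimal sketch for B = 8 (the module name dm_index_demo is hypothetical) that prints the cache block for each memory address listed on the slide:

```verilog
// Sketch: direct-mapped placement x -> x mod B with B = 8 blocks.
// For a power-of-two B, x mod B equals the low log2(B) address bits.
module dm_index_demo;
  reg [4:0] addr;   // 5-bit memory (block) address
  reg [2:0] index;  // cache block number, 0..7
  integer k;

  initial begin
    for (k = 1; k < 32; k = k + 4) begin  // 00001, 00101, ..., 11101
      addr  = k;
      index = addr[2:0];                  // same value as addr % 8
      $display("memory %b -> cache block %b", addr, index);
    end
  end
endmodule
```

The output alternates between cache blocks 001 and 101, because the eight addresses end in either 001 or 101.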


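Direct Mapped Cache

Cache with 1024 = 2^10 words. The tag from the cache is compared against the upper portion of the address. If the tag equals the upper 20 bits of the address and the valid bit is set, then we have a cache hit; otherwise it is a cache miss.

What kind of locality are we taking advantage of?

[Figure: direct-mapped cache with 1024 entries (index 0 to 1023), each holding a valid bit, a 20-bit tag and 32 bits of data. Address fields: tag = bits 31-12 (20 bits), index = bits 11-2 (10 bits), byte offset = bits 1-0. The stored tag is compared with the address tag to produce Hit; Data is 32 bits wide.]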
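The lookup path of this 1024-word cache can be written directly in Verilog. The sketch below covers only the combinational read path (no refill on a miss); the module name dm_cache_lookup, its ports, and the array-based storage are assumptions for illustration.

```verilog
// Sketch: read path of a direct-mapped cache with 1024 one-word entries.
// Address split: tag = addr[31:12], index = addr[11:2], byte offset = addr[1:0].
module dm_cache_lookup (
  input  [31:0] addr,
  output        hit,
  output [31:0] data
);
  reg        valid [0:1023];
  reg [19:0] tag   [0:1023];
  reg [31:0] word  [0:1023];

  wire [19:0] addr_tag   = addr[31:12];
  wire [9:0]  addr_index = addr[11:2];

  // Hit: the entry is valid and its stored tag equals the upper 20 address bits.
  assign hit  = valid[addr_index] && (tag[addr_index] == addr_tag);
  assign data = word[addr_index];

  // All entries start invalid, so every first access misses.
  integer n;
  initial for (n = 0; n < 1024; n = n + 1) valid[n] = 1'b0;
endmodule
```

For example, once entry 5 is valid with tag 0xABCDE, the address {20'hABCDE, 10'd5, 2'b00} hits, while an address with the same index but a different tag misses.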


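Direct Mapped Cache

Taking advantage of spatial locality:

[Figure: direct-mapped cache with 4K entries, each holding a valid bit (V), a 16-bit tag and 128 bits of data (four 32-bit words). Address fields: tag = bits 31-16 (16 bits), index = bits 15-4 (12 bits), block offset = bits 3-2 (2 bits), byte offset = bits 1-0. A multiplexer driven by the block offset selects one 32-bit word as Data; the tag comparison produces Hit.]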
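The same read path extends to four-word blocks: a 16-bit tag, a 12-bit index selecting one of 4K 128-bit blocks, and a 2-bit block offset driving a 4-to-1 multiplexer. Again a hypothetical sketch; the module name and the placement of word 0 in bits 31:0 of a block are assumptions.

```verilog
// Sketch: read path of a direct-mapped cache with 4K entries of 4 words.
// tag = addr[31:16], index = addr[15:4], block offset = addr[3:2],
// byte offset = addr[1:0]. Word 0 of a block is assumed to sit in bits 31:0.
module dm_cache4_lookup (
  input  [31:0] addr,
  output        hit,
  output [31:0] data
);
  reg         valid [0:4095];
  reg [15:0]  tag   [0:4095];
  reg [127:0] block [0:4095];  // four 32-bit words per entry

  wire [15:0]  addr_tag   = addr[31:16];
  wire [11:0]  addr_index = addr[15:4];
  wire [1:0]   word_sel   = addr[3:2];
  wire [127:0] blk        = block[addr_index];

  assign hit  = valid[addr_index] && (tag[addr_index] == addr_tag);
  // 4-to-1 multiplexer: the block offset picks one 32-bit word.
  assign data = blk[word_sel*32 +: 32];

  integer n;
  initial for (n = 0; n < 4096; n = n + 1) valid[n] = 1'b0;
endmodule
```

A single miss now brings four neighboring words into the cache, which is how this organization exploits spatial locality.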


Cache Hits and Misses

Read hits: this is what we want!

Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart.

Write hits:
  replace the data in both cache and memory (write-through), or
  write the data only into the cache (write-back the cache later).

Write misses: read the entire block into the cache, then write the word.
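The difference between the two write policies shows up in how often main memory is written. The sketch below models a tiny direct-mapped cache (8 one-word blocks over a 32-word memory) behaviorally and counts memory writes; the sizes, the names write_policy_demo and write_policy_tb, and the dirty bit used for write-back are assumptions for illustration. Write misses follow the slide: read the block into the cache, then write the word.

```verilog
// Sketch: counts main-memory writes under write-through and write-back.
// Tiny direct-mapped cache: 8 one-word blocks over a 32-word memory.
module write_policy_demo;
  parameter WRITE_BACK = 0;     // 0 = write-through, 1 = write-back

  reg [31:0] mem   [0:31];      // main memory
  reg [31:0] data  [0:7];       // cache data
  reg [1:0]  tag   [0:7];       // upper 2 address bits
  reg        valid [0:7];
  reg        dirty [0:7];       // only meaningful for write-back
  integer    mem_writes;
  integer    n;

  initial begin
    mem_writes = 0;
    for (n = 0; n < 8; n = n + 1) begin
      valid[n] = 1'b0;
      dirty[n] = 1'b0;
    end
    for (n = 0; n < 32; n = n + 1) mem[n] = 0;
  end

  task write_word(input [4:0] addr, input [31:0] value);
    reg [2:0] idx;
    begin
      idx = addr[2:0];                          // x mod 8
      if (!(valid[idx] && tag[idx] == addr[4:3])) begin
        // Write miss: evict (copying back a dirty block), then read the block.
        if (WRITE_BACK && valid[idx] && dirty[idx]) begin
          mem[{tag[idx], idx}] = data[idx];
          mem_writes = mem_writes + 1;
        end
        data[idx]  = mem[addr];
        tag[idx]   = addr[4:3];
        valid[idx] = 1'b1;
        dirty[idx] = 1'b0;
      end
      data[idx] = value;                        // write the word into the cache
      if (WRITE_BACK)
        dirty[idx] = 1'b1;                      // memory is updated on eviction
      else begin
        mem[addr]  = value;                     // write-through: update memory now
        mem_writes = mem_writes + 1;
      end
    end
  endtask
endmodule

// Hypothetical driver: ten writes to address 5, then one write to
// address 13, which maps to the same block (13 mod 8 = 5) and evicts it.
module write_policy_tb;
  write_policy_demo #(.WRITE_BACK(0)) wt ();
  write_policy_demo #(.WRITE_BACK(1)) wb ();
  integer k;

  initial begin
    #1;                                         // let both caches initialize
    for (k = 0; k < 10; k = k + 1) begin
      wt.write_word(5'd5, k);
      wb.write_word(5'd5, k);
    end
    wt.write_word(5'd13, 32'd99);
    wb.write_word(5'd13, 32'd99);
    $display("write-through: %0d memory writes", wt.mem_writes);
    $display("write-back:    %0d memory writes", wb.mem_writes);
  end
endmodule
```

The driver should report 11 memory writes for write-through and 1 for write-back: the write-back cache touches memory only when the dirty block for address 5 is evicted.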


What Block Size?

A larger block size reduces cache misses, since it exploits spatial locality, but the cache miss penalty increases. We need to balance these two constraints.

Next time:
How can we measure cache performance?
How can we improve cache performance?
