CMPE 421 Parallel Computer Architecture
PART 2: CACHING
2
Caching Principles
• The idea is to use a small amount of fast memory near the processor (in a cache)
• The cache holds frequently needed memory locations
– When an instruction references a memory location, we want that value to be in the cache
• For the time being, we will focus on a 2-level hierarchy
– Cache (small, fast memory directly connected to the processor; upper level)
– Main memory (large, slow memory at level 2 in the hierarchy)
3
Caching Principles
[Figure: levels in the memory hierarchy — the CPU sits at the top (Level 1), with Level 2 down to Level n below; distance from the CPU increases access time, the size of the memory grows at each level, and a block of data is the unit of data copy between levels]
- Transfer of data is done between adjacent levels in the hierarchy only
- All access by the processor is to the topmost level
- Performance depends on hit rates
4
Caching Examples
• Principle: Results of operations that are expensive should be kept around for reuse
• Examples:
– CPU caching
– Forwarding table caching
– File caching
– Web caching
– Query caching
– Computation caching
5
Cache Levels
• Register: a cache on variables
• First-level cache: a cache on the second-level cache
• Second-level cache: a cache on memory
• Memory: a cache on disk (virtual memory)
• TLB: a cache on the page table
• Branch prediction: a cache on prediction information?
6
Terminology
• Block: The minimum unit of information transferred between the cache and main memory. Typically measured in bytes or words
– Block addressing varies by technology at each level
– Blocks are moved one level at a time
• HIT: Data appears in a block in the upper level. When a program needs a particular data object d from the lower level, it first looks for d in one of the blocks currently stored at the upper level. If d happens to be cached at the upper level, we have what is called a cache hit
– For example, a program with good temporal locality might read a data object from block d, resulting in a cache hit from the upper level
• Ex: remote HTML files stored on WEB servers
Ex: Finding the information in one of the books on your desk
7
Terminology
• MISS: Data was not in the upper level and had to be fetched from a lower level. When there is a miss, the cache at the upper level fetches the block containing d, possibly overwriting an existing block if the upper level is already full
• HIT RATE: The ratio of hits to memory accesses found in the upper level
– Used as a measure of the performance of the memory hierarchy
8
MISS EXAMPLE
Miss Example: Reading the data object from block 12 in the upper-level cache would result in a cache miss, because block 12 is not currently stored in the upper-level cache. Once it has been copied from the lower level to the upper level, block 12 will remain there in expectation of later accesses
9
Terminology
• MISS RATE: The ratio of misses to memory accesses found in the upper level
Miss rate = 1 – Hit rate
• HIT TIME: Time to access the upper level (cache)
Hit time = tc = Access time + Time to determine hit/miss (i.e., to find out if it is in the cache, plus cache-to-processor delivery)
Ex: The time needed to look through the books on the desk
10
Terminology
• MISS PENALTY: The time to replace a block in the cache with a block from main memory and to deliver the element to the processor
Miss penalty = tc + tm = Lower-level access time + Replacement time + Time to deliver to upper level
Ex: The time to get another book from the shelves and place it on the desk
• Miss penalty is usually much larger than the hit time
– Because the upper level is smaller and built using faster memory parts
– The time to examine the books on the desk is much smaller than the time to get up and get a new book from the shelves
Ex: HIT_RATIO = 0.9, MISS_RATIO = 1.0 – 0.9 = 0.1
Ideally hit_ratio = 1.0 and miss_ratio = 0.0; in practice, hit_ratio < 1.0, typically 0.95 or better
11
Handling a Cache Miss
• A cache hit, if it happens in 1 cycle, has no effect on our pipeline, but a cache miss does
• The action required depends on whether we have an instruction miss or a data miss
• For an instruction miss:
1. Send the original PC value to the memory
2. Instruct main memory to perform a read and wait for the memory to complete the access
3. Write the result into the appropriate cache entry
4. Restart the instruction
• For a data miss:
1. Stall the pipeline
2. Instruct main memory to perform a read and wait for the memory to complete the access
3. Return the result from the memory unit and allow the pipeline to continue
12
Exploiting Locality
• Need to update the contents of the cache with useful stuff
– Leverage locality
• Spatial locality
– Rather than fetching just the word that missed, fetch a block of data around the word that missed
• If you need these words (and you often do) they will now hit
• This is also good since you can build memory systems that deliver large blocks of data once they access it (disk/DRAM)
• Temporal locality
– Keep more recently accessed data items closer to the processor
– So when we need space in the cache, evict the old ones
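The spatial-locality payoff can be sketched with a toy model (the `miss_count` helper, block sizes, and access trace below are illustrative assumptions, not from the slides): fetching a whole block on each miss turns later accesses to neighboring words into hits.

```python
# Toy model: count misses for a sequential scan, assuming every
# fetched block stays cached (no capacity limit, for illustration).
def miss_count(addresses, block_size):
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // block_size  # fetching one word brings in its whole block
        if block not in cached_blocks:
            misses += 1
            cached_blocks.add(block)
    return misses

scan = list(range(64))            # sequential word addresses
print(miss_count(scan, 1))        # 64 misses: every word misses
print(miss_count(scan, 4))        # 16 misses: one miss per 4-word block
```

With 4-word blocks, only the first access in each block misses; the next three hit, cutting the miss rate to 1/4 for this sequential trace.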
13
Access Times
• Average access time
Access time = (hit time)(hit rate) + (miss penalty)(miss rate)
– The hope is that the hit time will be low and the hit rate high, since the miss penalty is so much larger than the hit time
• Average Memory Access Time (AMAT)
– The formula can be applied to any level of the hierarchy
– It can be generalized for the entire hierarchy
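The slide's formula can be evaluated directly; as a sketch, using the tc/tm notation from the miss-penalty slide (the numeric values below are assumed for illustration):

```python
def amat(hit_time, miss_penalty, hit_rate):
    """Average access time per the slide's formula:
    (hit time)(hit rate) + (miss penalty)(miss rate),
    where the miss penalty already includes the upper-level access (tc + tm)."""
    miss_rate = 1.0 - hit_rate
    return hit_time * hit_rate + miss_penalty * miss_rate

tc, tm = 1, 100                  # assumed cache and main-memory times (cycles)
print(amat(tc, tc + tm, 0.95))   # ~6 cycles: 0.95*1 + 0.05*101
```

Even a 5% miss rate multiplies the effective access time by about six here, which is why the hit rate matters so much.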
14
Simple Cache Model
• Assume that the processor accesses memory one word at a time
• A block consists of one word
• When a word is referenced and is not in the cache, it is put in the cache (copied from main memory)
15
Cache Usage
• At some point in time the cache holds memory items X1, X2, …, Xn-1
• The processor next accesses memory item Xn, which is not in the cache
• How do we know if an item is in the cache?
• If it is in the cache, how do we know where it is?
16
17
Cache Arrangement
• How should the data in the cache be arranged?
• Several different approaches
– Direct mapped: memory addresses map to a particular location in the cache
– Fully associative: data can be placed anywhere in the cache
– N-way set associative: data can be placed in a limited number of places in the cache, depending upon the memory address
18
Direct Mapped Cache Organization
• Each memory location is mapped to a single location in the cache
– there is only one place it can be!
• Remember that the cache is smaller than memory, so many memory locations will be mapped to the same location in the cache
19
Mapping Function
• The simplest mapping is based on the LS bits of the address
• For example, all memory locations whose address ends in 001 will be mapped to the same location in the cache
• This requires a cache size of 2^n locations (a power of 2)
20
A Direct Mapped Cache
• Memory addresses are mapped to a cache index
– The index is given by (block address) modulo (number of blocks in cache)
• If the cache size is a power of 2, the modulo operation simply throws away the high-order bits of the address
Ex: Direct-mapped cache of 8 words, memory size 32 words
Use log2(8) = 3 bits for the cache address XXX
As the cache size is a power of 2, throw away the higher bits
Ex: 00 001 => cache addr = 001
    11 101 => cache addr = 101
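The index/tag split can be sketched in a few lines of bit arithmetic (the `split_address` helper is a hypothetical name; it just reproduces the modulo mapping above for a power-of-two slot count):

```python
def split_address(block_addr, num_slots):
    """Split a block address into (index, tag) for a direct-mapped
    cache with a power-of-two number of slots."""
    index_bits = num_slots.bit_length() - 1  # log2(num_slots)
    index = block_addr & (num_slots - 1)     # same as block_addr % num_slots
    tag = block_addr >> index_bits           # remaining high-order bits
    return index, tag

# The slide's examples: 8-slot cache, 5-bit block addresses
print(split_address(0b00001, 8))  # (0b001, 0b00) -> index 1, tag 0
print(split_address(0b11101, 8))  # (0b101, 0b11) -> index 5, tag 3
```

Masking with `num_slots - 1` is exactly the "throw away the higher bits" step: for 8 slots the mask is 0b111, keeping the low 3 bits as the index.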
21
Problem With Direct Mapped Cache
• We still need a way to find out which of the many possible memory elements is currently in a cache slot
– slot: a location in the cache that can hold a block
• We need to store the address of the item currently using cache slot 001
• We therefore add a tag to each cache entry that identifies which address it currently contains, by storing the MSBs that uniquely identify that memory address (the LSBs already select a particular cache entry)
• The tag associated with a cache slot tells who is currently using the slot
• We don't need to store the entire memory location address, just those bits that are not used to determine the slot number (the mapping)
22
Solution: TAG
A field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to the required word
23
Initialization Problem
• Initially the cache is empty
– all the bits in the cache (including the tags) will have random values
• After some number of accesses, some of the tags are real and some are still just random junk
• How do we know which cache slots are junk and which really mean something?
24
Answer: Introduce Valid Bits
• Include one more bit with each cache slot that indicates whether the tag is valid or not
• Provide hardware to initialize these bits to 0 (one bit per cache slot)
• When checking a cache slot for a specific memory location, ignore the tag if the valid bit is 0
• Change a slot's valid bit to 1 when putting something in the slot (from main memory)
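Putting the tag, index, and valid bit together, a minimal sketch of the one-word-block direct-mapped cache from these slides might look like this (the class name, slot count, and access trace are assumed for illustration):

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one-word blocks, a tag and a
    valid bit per slot, as described in the slides."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.valid = [False] * num_slots  # hardware initializes valid bits to 0
        self.tags = [0] * num_slots

    def access(self, addr):
        """Return True on a hit, False on a miss (and fill the slot)."""
        index = addr % self.num_slots     # LS bits pick the slot
        tag = addr // self.num_slots      # MS bits identify the address
        # Hit only if the slot is valid AND the stored tag matches
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: fetch the word, overwriting whoever was using this slot
        self.valid[index] = True
        self.tags[index] = tag
        return False

cache = DirectMappedCache(8)
hits = [cache.access(a) for a in [1, 9, 1, 1]]
print(hits)  # [False, False, False, True]
```

Addresses 1 and 9 both map to slot 1 (same LS bits), so they keep evicting each other: three misses before the repeated access to 1 finally hits. The valid bit makes the very first access a miss even though the tag array happens to contain 0.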
25
Direct Mapped Cache with Valid Bit