Cache Memory
Chapter 17
S. Dandamudi
Outline
- Introduction
- How cache memory works
- Why cache memory works
- Cache design basics
- Mapping function
  - Direct mapping
  - Associative mapping
  - Set-associative mapping
- Replacement policies
- Write policies
- Space overhead
- Types of cache misses
- Types of caches
- Example implementations
  - Pentium
  - PowerPC
  - MIPS
- Cache operation summary
- Design issues
  - Cache capacity
  - Cache line size
  - Degree of associativity
Introduction
- Memory hierarchy: registers, memory, disk
- Cache memory is a small amount of fast memory
  - Placed between two levels of the memory hierarchy to bridge the gap in access times
    - Between processor and main memory (our focus)
    - Between main memory and disk (disk cache)
  - Expected to behave like a large amount of fast memory
Introduction (contd)
How Cache Memory Works
- Prefetch data into cache before the processor needs it
  - Need to predict the processor's future access requirements
  - Not difficult owing to locality of reference
- Important terms
  - Miss penalty
  - Hit ratio
  - Miss ratio = (1 - hit ratio)
  - Hit time
- These terms combine into an effective access time (see the sketch below)
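A minimal sketch of that combination; the function name, variable names, and example numbers are illustrative, not from the text:

```c
/* Effective access time from hit time, hit ratio, and miss penalty. */
double effective_access_time(double hit_time, double hit_ratio,
                             double miss_penalty)
{
    double miss_ratio = 1.0 - hit_ratio;   /* miss ratio = (1 - hit ratio) */
    return hit_time + miss_ratio * miss_penalty;
}

/* Example: hit time 1 ns, hit ratio 0.95, miss penalty 60 ns
   => 1 + 0.05 * 60 = 4 ns on average. */
```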
How Cache Memory Works (contd)
Cache read operation

How Cache Memory Works (contd)
Cache write operation
How Cache Memory Works (contd)
Matrix traversal example: execution time vs. matrix size, column-order vs. row-order

Matrix size   Column-order (ms)   Row-order (ms)
500           60                  40
600           90                  60
700           110                 70
800           150                 100
900           190                 120
1000          240                 150
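The two columns of the table come from the order in which the matrix is traversed. A minimal C sketch (C stores arrays in row-major order, so row-order traversal walks consecutive addresses and benefits from spatial locality; the matrix size and element type here are illustrative):

```c
#include <stdio.h>

#define N 500
static double a[N][N];

int main(void)
{
    double sum = 0.0;

    /* Row-order: the inner loop visits consecutive addresses, so one
       cache line fetched on a miss serves the next several elements. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Column-order: consecutive accesses are N * sizeof(double) bytes
       apart, so almost every access touches a different cache line. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);   /* keep the loops from being optimized away */
    return 0;
}
```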
Cache Design Basics
- On every read miss, a fixed number of bytes are transferred
  - More than what the processor needs
  - Effective due to spatial locality
- Cache is divided into blocks of B bytes
  - b bits are needed as the offset into a block: b = log2(B)
  - Blocks are called cache lines
- Main memory is also divided into blocks of the same size
- Address is divided into two parts: block number and byte offset
Cache Design Basics (contd)
B = 4 bytes, b = 2 bits (address split sketched below)
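For these values the split is two bit operations. A sketch using the slide's B = 4, b = 2 example:

```c
#include <stdint.h>

#define B           4u   /* block (line) size in bytes, from the slide */
#define OFFSET_BITS 2u   /* b = log2(B) */

static inline uint32_t block_number(uint32_t addr) { return addr >> OFFSET_BITS; }
static inline uint32_t block_offset(uint32_t addr) { return addr & (B - 1u); }

/* Example: addr = 13 (binary 1101) -> block 3, offset 1. */
```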
Cache Design Basics (contd)
- Transfer between main memory and cache
  - In units of blocks
  - Implements spatial locality
- Transfer between cache and processor
  - In units of words
- Need policies for
  - Block placement: mapping function
  - Block replacement
  - Write policies
Cache Design Basics (contd)
Read cycle operations
Mapping Function
- Determines how memory blocks are mapped to cache lines
- Three types
  - Direct mapping
    - Specifies a single cache line for each memory block
  - Set-associative mapping
    - Specifies a set of cache lines for each memory block
  - Associative mapping
    - No restrictions: any cache line can be used for any memory block
Mapping Function (contd)
Direct mapping example
Mapping Function (contd)
- Implementing direct mapping
  - Easier than the other two
  - Maintains three pieces of information
    - Cache data: the actual data
    - Cache tag: stores the address of the memory block held in the cache line
      - Needed because there are more memory blocks than cache lines, so several memory blocks map to each cache line
    - Valid bit: indicates whether the cache line contains a valid block
- A minimal lookup combining these is sketched below
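A direct-mapped lookup is an index, a tag compare, and a valid-bit check. A minimal sketch; the 32-bit address and the cache geometry (1024 lines of 32 bytes) are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINES       1024u   /* assumed number of cache lines */
#define OFFSET_BITS 5u      /* assumed 32-byte lines         */
#define INDEX_BITS  10u     /* log2(LINES)                   */

struct cache_line {
    bool     valid;                     /* valid bit                      */
    uint32_t tag;                       /* which memory block is resident */
    uint8_t  data[1u << OFFSET_BITS];   /* cache data (actual data)       */
};

static struct cache_line cache[LINES];

static bool is_hit(uint32_t addr)
{
    uint32_t index = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}
```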
Mapping Function (contd)
Mapping Function (contd)
- Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
- Direct mapping: hit ratio = 0%

Mapping Function (contd)
- Reference pattern: 0, 7, 9, 10, 0, 7, 9, 10, 0, 7, 9, 10
- Direct mapping: hit ratio = 67%
- Both ratios can be reproduced with the short simulation below
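A sketch assuming the four-line direct-mapped cache of these examples (references are block numbers; line = block mod 4):

```c
#include <stdio.h>

#define LINES 4   /* assumed from the example figures */

/* Count hits for a stream of block-number references. */
static double hit_ratio(const int *blocks, int n)
{
    int tag[LINES], valid[LINES] = {0}, hits = 0;

    for (int i = 0; i < n; i++) {
        int line = blocks[i] % LINES;              /* direct mapping */
        if (valid[line] && tag[line] == blocks[i]) {
            hits++;                                /* hit            */
        } else {
            valid[line] = 1;                       /* miss: replace  */
            tag[line]   = blocks[i];
        }
    }
    return (double)hits / n;
}

int main(void)
{
    int p1[] = {0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4};     /* -> 0%  */
    int p2[] = {0, 7, 9, 10, 0, 7, 9, 10, 0, 7, 9, 10};  /* -> 67% */
    printf("%.0f%% %.0f%%\n",
           100 * hit_ratio(p1, 12), 100 * hit_ratio(p2, 12));
    return 0;
}
```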
Mapping Function (contd)
Associative mapping

Mapping Function (contd)
- Associative mapping
- Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
- Hit ratio = 75%

Mapping Function (contd)
Address match logic for associative mapping

Mapping Function (contd)
Associative cache with address match logic

Mapping Function (contd)
Set-associative mapping

Mapping Function (contd)
Address partition in set-associative mapping

Mapping Function (contd)
- Set-associative mapping
- Reference pattern: 0, 4, 0, 8, 0, 8, 0, 4, 0, 4, 0, 4
- Hit ratio = 67% (see the set-index sketch below)
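In set-associative mapping the index selects a set rather than a single line. A brief sketch, assuming the two-way, two-set organization of this example:

```c
#define SETS 2   /* assumed: four lines, two-way => two sets */

static int set_of(int block) { return block % SETS; }

/* Blocks 0, 4, and 8 all map to set 0, so they compete for its two
   lines; with LRU replacement the pattern above yields 67% hits. */
```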
Replacement Policies
- The replacement policy is invoked when there is no room in the cache for a new memory block
- Depends on the placement policy in effect
  - Direct mapping does not need a special replacement policy: replace the mapped cache line
- Several policies exist for the other two mapping functions
  - Popular: LRU (least recently used)
  - Random replacement
  - Of less interest: FIFO, LFU
Replacement Policies (contd)
- LRU
  - Expensive to implement, particularly for set sizes larger than four
- Implementations resort to an approximation: pseudo-LRU
  - Partitions each set into two groups and tracks which group was accessed most recently
  - Each such decision requires only one bit
  - Requires only (W - 1) bits in total (W = degree of associativity)
  - PowerPC is an example (details later)
Replacement Policies (contd)
Pseudo-LRU implementation
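One common pseudo-LRU form is a binary tree: for a four-way set, W - 1 = 3 bits record, at each level, which half was used most recently, and the victim is found by walking toward the other half. A sketch of the idea, not any particular processor's circuit:

```c
#include <stdint.h>

/* Three PLRU bits for one 4-way set: b0 splits the set into halves,
   b1 and b2 split each half.  A set bit means "left side used last". */
struct plru4 { uint8_t b0, b1, b2; };

/* On an access to way w, record which half (and quarter) was used. */
static void plru_touch(struct plru4 *s, int w)
{
    s->b0 = (w < 2);                 /* 1: left half (ways 0-1) used */
    if (w < 2) s->b1 = (w == 0);     /* 1: way 0 used                */
    else       s->b2 = (w == 2);     /* 1: way 2 used                */
}

/* Victim selection: follow the bits away from the recent half. */
static int plru_victim(const struct plru4 *s)
{
    if (s->b0)                       /* left used last: evict right  */
        return s->b2 ? 3 : 2;
    else                             /* right used last: evict left  */
        return s->b1 ? 1 : 0;
}
```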
Write Policies
- Memory writes require special attention because we have two copies
  - A memory copy
  - A cached copy
- The write policy determines how a memory write operation is handled
- Two policies
  - Write-through: update both copies
  - Write-back: update only the cached copy; the memory copy must be updated later
Write Policies (contd)
Cache hit in a write-through cache (Figure 17.3a)

Write Policies (contd)
Cache hit in a write-back cache
Write Policies (contd)
- Write-back policy
  - Updates the memory copy when the cache copy is being replaced: the cache copy is first written back to update the memory copy
  - The number of write-backs can be reduced by writing back only when the cache copy differs from the memory copy
    - Done by associating a dirty bit (or update bit) with each line
    - Write back only when the dirty bit is 1
- Write-back caches thus require two bits per line
  - A valid bit
  - A dirty (update) bit
Write Policies (contd)
Dirty bit: needed only in write-back caches
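A write-back line therefore carries both bits, and the dirty bit gates the write-back at replacement time. A minimal sketch with the memory transfer abstracted away (line size and helper names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

struct wb_line {
    bool     valid;      /* valid bit                           */
    bool     dirty;      /* cache copy differs from memory copy */
    uint32_t tag;
    uint8_t  data[32];   /* assumed 32-byte line                */
};

/* Provided elsewhere; writes one block back to main memory. */
extern void memory_write_block(uint32_t tag, const uint8_t *data);

/* Write hit: update only the cached copy and mark it dirty. */
static void write_hit(struct wb_line *l, unsigned off, uint8_t byte)
{
    l->data[off] = byte;
    l->dirty = true;
}

/* Replacement: write back only when the dirty bit is set. */
static void evict(struct wb_line *l)
{
    if (l->valid && l->dirty)
        memory_write_block(l->tag, l->data);
    l->valid = l->dirty = false;
}
```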
Write Policies (contd)
- Another way to reduce write traffic: buffered writes
  - Especially useful for write-through policies
  - Writes to memory are buffered and written at a later time
  - Allows write combining: multiple writes are caught in the buffer itself
- Example: Pentium
  - Uses a 32-byte write buffer
  - The buffer is written out at several trigger points, e.g. the buffer-full condition
Write Policies (contd)
- Write-through versus write-back
- Write-through
  - Advantage: cache and memory copies are always consistent
    - Important in multiprocessor systems
  - Disadvantage: tends to waste bus and memory bandwidth
- Write-back
  - Advantage: reduces write traffic to memory
  - Disadvantages: takes longer to load new cache lines; requires an additional dirty bit
Space Overhead
- The three mapping functions introduce different space overheads (tag and valid bits)
- Overhead increases with increasing degree of associativity (larger tags)
- Several examples in the text
  - 4 GB address space (32-bit addresses), 32 KB cache (worked out below)
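For the 4 GB / 32 KB case the per-line overhead can be computed directly. A sketch assuming 32-byte lines (the line size is my assumption, not the text's):

```c
#include <stdio.h>

int main(void)
{
    int addr_bits   = 32;   /* 4 GB address space        */
    int offset_bits = 5;    /* assumed 32-byte lines     */
    int index_bits  = 10;   /* 32 KB / 32 B = 1024 lines */

    int direct_tag = addr_bits - index_bits - offset_bits;   /* 17 bits */
    int assoc_tag  = addr_bits - offset_bits;                /* 27 bits */

    /* +1 for the valid bit in each line */
    printf("direct mapped:     %d overhead bits per line\n", direct_tag + 1);
    printf("fully associative: %d overhead bits per line\n", assoc_tag + 1);

    /* Set-associative lies in between: each doubling of associativity
       removes one index bit and adds one tag bit. */
    return 0;
}
```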
Types of Cache Misses
- Three types
  - Compulsory misses
    - Due to first-time access to a block
    - Also called cold-start misses or compulsory line fills
  - Capacity misses
    - Induced by the cache's limited capacity
    - Can be avoided by increasing cache size
  - Conflict misses
    - Due to conflicts caused by direct and set-associative mappings
    - Can be completely eliminated by fully associative mapping
    - Also called collision misses
Types of Cache Misses (contd)
- Compulsory misses
  - Reduced by increasing block size (we prefetch more)
  - Block size cannot grow beyond a limit: cache misses increase again
- Capacity misses
  - Reduced by increasing cache size
  - Law of diminishing returns
- Conflict misses
  - Reduced by increasing the degree of associativity
  - Fully associative mapping: no conflict misses
Types of Caches
- Separate instruction and data caches
  - Initial cache designs used unified caches
  - The current trend is to use separate caches (for level 1)
Types of Caches (contd)
- Several reasons for preferring separate caches
  - Locality tends to be stronger
  - Different designs can be used for data and instruction caches
    - Instruction caches: read-only, dominantly sequential access
      - No need for write policies
      - Can use a simple direct-mapped implementation
    - Data caches
      - Can use a set-associative cache
      - An appropriate write policy can be implemented
- Disadvantage: rigid boundary between data and instruction caches
Types of Caches (contd)
- Number of cache levels: most use two
  - Primary (level 1 or L1): on-chip
  - Secondary (level 2 or L2): off-chip
- Examples
  - Pentium: L1 32 KB, L2 up to 2 MB
  - PowerPC: L1 64 KB, L2 up to 1 MB
Types of Caches (contd)
- Two-level caches work as follows
  - First attempt to get the data from the L1 cache
    - If present, get them from L1 (L1 cache hit)
    - If not, the data must come from the L2 cache or main memory (L1 cache miss)
  - On an L1 miss, try the L2 cache
    - If the data are in L2, get them from L2 (L2 cache hit); the block is also written to L1
    - If not, the data come from main memory (L2 cache miss); the memory block is written into both L1 and L2
- Variations on this basic scheme are possible
- The average access time follows from the hit ratios at each level (see the sketch below)
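A minimal sketch, with parameter names of my own choosing:

```c
/* Average access time with two cache levels.  An L1 miss pays the L2
   access; an L2 miss additionally pays the main-memory access. */
double two_level_access_time(double t_l1, double h1,   /* L1 time, hit ratio */
                             double t_l2, double h2,   /* L2 time, hit ratio */
                             double t_mem)             /* memory access time */
{
    return t_l1 + (1.0 - h1) * (t_l2 + (1.0 - h2) * t_mem);
}
```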
Types of Caches (contd)
Virtual and physical caches
Example Implementations
- We look at three processors: Pentium, PowerPC, and MIPS
- Pentium implementation
  - Two levels
    - L1 cache: split cache design (separate data and instruction caches)
    - L2 cache: unified cache design
Example Implementations (contd)
- The Pentium allows each page/memory region to have its own caching attributes
  - Uncacheable
    - All reads and writes go directly to main memory
    - Useful for memory-mapped I/O devices, large data structures that are read once, and write-only data structures
  - Write combining
    - Not cached; writes are buffered to reduce accesses to main memory
    - Useful for video frame buffers
Example Implementations (contd)
- Write-through
  - Uses the write-through policy
  - Writes are delayed as they go through a write buffer, as in write-combining mode
- Write-back
  - Uses the write-back policy
  - Writes are delayed as in the write-through mode
- Write protected
  - Inhibits cache writes
  - Writes go directly to memory
Example Implementations (contd)
- Two bits in control register CR0 determine the mode
  - Cache disable (CD) bit
  - Not write-through (NW) bit
  - Their combination selects the caching mode (the normal mode is write-back caching)
Example Implementations (contd)
- PowerPC cache implementation
  - Two levels
    - L1 cache: split cache
      - Each: 32 KB, eight-way set-associative
      - Uses pseudo-LRU replacement
      - Instruction cache: read-only
      - Data cache: read/write; choice of write-through or write-back
    - L2 cache: unified cache, as in the Pentium
      - Two-way set-associative
Example Implementations (contd)
- Write policy type and caching attributes can be set by the OS at the block or page level
- The L2 cache requires only a single bit to implement LRU
  - Because it is two-way associative
- The L1 cache implements pseudo-LRU
  - Each set maintains seven PLRU bits (B0-B6)
Example Implementations (contd)
PowerPC placement policy (including PLRU)
Example Implementations (contd)
- MIPS implementation: two-level cache
  - L1 cache: split organization
    - Instruction cache: virtual cache, direct mapped, read-only
    - Data cache: virtual cache, direct mapped, uses write-back policy
  - L1 line size: 16 or 32 bytes
Example Implementations (contd)
- L2 cache
  - Physical cache
  - Either unified or split, configured at boot time
  - Direct mapped
  - Uses write-back policy
  - Cache block size: 16, 32, 64, or 128 bytes, set at boot time
    - L1 cache line size <= L2 cache block size
  - Direct mapping simplifies replacement: no need for a complex LRU-type implementation
Cache Operation Summary
- Various policies used by a cache
  - Placement of a block: direct, fully associative, or set-associative mapping
  - Location of a block: depends on the placement policy
  - Replacement policy: LRU is the most popular; pseudo-LRU is often implemented
  - Write policy: write-through or write-back
Design Issues
- Several design issues
  - Cache capacity
    - Law of diminishing returns
  - Cache line size/block size
  - Degree of associativity
  - Unified/split
  - Single/two-level
  - Write-through/write-back
  - Logical/physical
Design Issues (contd)