Computer Architecture, Memory Hierarchy & Virtual Memory. Some diagrams from Computer Organization and Architecture, 5th edition, by William Stallings.


Page 1

Computer Architecture, Memory Hierarchy & Virtual Memory

Some diagrams from Computer Organization and Architecture 5th edition by William Stallings

Page 2

CMPE12c, Cyrus Bazeghi

“Memory Hierarchy Pyramid” (figure; fastest and smallest at the top, slowest and largest at the bottom; the units transferred between adjacent levels are words, lines, blocks, and pages):

CPU Registers           3-10 accesses/cycle                32-64 words
On-Chip Cache           1-2 accesses/cycle, 5-10 ns        1 KB - 2 MB
Off-Chip Cache (SRAM)   5-20 cycles/access, 10-40 ns       1 MB - 16 MB
Main Memory (DRAM)      20-200 cycles/access, 60-120 ns    64 MB - many GB (~$0.137/MB)
Disk or Network         1M-2M cycles/access                4 GB - many TB (~$1.10/GB)

Page 3

Movement of Memory Technology

Machine        CPI    Clock (ns)   Main Memory (ns)   Miss Penalty (cycles)   Cycles / Instr.
VAX 11/780     10     200          1200               6                       0.6
Alpha 21064    0.5    5            70                 14                      28
Alpha 21164    0.25   2            60                 30                      120
Pentium IV     ??     0.5          ~50                ??                      ??

CPI: Cycles per instruction
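The table's arithmetic can be sanity-checked: the miss penalty in cycles is the main-memory access time divided by the clock period, and the worst-case stall cycles per instruction are the penalty divided by CPI. A minimal sketch (column assignments as read from the table):

```python
# (cpi, clock_ns, main_memory_ns) for the three machines with full data.
machines = {
    "VAX 11/780":  (10,   200, 1200),
    "Alpha 21064": (0.5,    5,   70),
    "Alpha 21164": (0.25,   2,   60),
}

results = {}
for name, (cpi, clock_ns, mem_ns) in machines.items():
    miss_penalty = mem_ns / clock_ns            # cycles lost per miss
    results[name] = (miss_penalty, miss_penalty / cpi)

print(results)   # matches the Miss Penalty and Cycles/Instr. columns
```

Note how the penalty in cycles grows as clocks get faster even though the memory itself got quicker in nanoseconds.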

Page 4

Cache and Main Memory

Problem: Main Memory is slow compared to CPU.

Solution: store the most commonly used data in a smaller, faster memory. This is a good trade-off between cost ($$) and performance.
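One way to quantify the trade-off is average memory access time (AMAT): hit time plus miss rate times miss penalty. The numbers below are illustrative, not taken from the slides:

```python
# AMAT = hit_time + miss_rate * miss_penalty (all times in cycles).
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

no_cache = 100                    # every access pays the full DRAM cost
with_cache = amat(1, 0.05, 100)   # 1-cycle cache, 95% hit rate
print(with_cache)                 # 6.0 cycles on average
```

A small cache that hits most of the time hides almost all of the slow memory's latency.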

Page 5

At any time some subset of the Main Memory resides in the Cache. If a word in a block of memory is read, that block is transferred to one of the lines of the cache.

Generalized Caches

“Cache/Main-Memory Structure”

Page 6

Generalized Caches

“Cache Read Operation”

The CPU generates the address, RA, of a word it wants to read. If the word is in the cache, it is sent to the CPU. Otherwise, the block that contains the word is loaded into the cache, and then the word is sent to the processor.

Page 7

Elements of Cache Design

Cache Size

Mapping Function
• Direct
• Associative
• Set Associative

Replacement Algorithm
• Least Recently Used (LRU)
• First In, First Out (FIFO)
• Least Frequently Used (LFU)
• Random

Write Policy
• Write through
• Write back

Line Size

Number of Caches
• Single or two level
• Unified or split

Page 8

Cache Size

“Bigger is better” is the motto.

The problem is that you can only fit so much onto the chip without making it too expensive to manufacture or sell for your intended market sector.

Page 9

Mapping Functions

Since the cache is not as big as main memory, how do we determine where data is read from or written to in the cache?

The mapping functions are how memory addresses are mapped into cache locations.

Page 10

“Direct Mapping”

Mapping Function

Map each block of main memory into only one possible cache line.
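The rule can be sketched in a few lines, with illustrative cache parameters (256 lines of 32 bytes): the block number modulo the number of lines picks the single possible cache line.

```python
# Illustrative direct-mapped cache geometry (not from the slides).
NUM_LINES = 256
BLOCK_SIZE = 32   # bytes

def direct_map(addr):
    block = addr // BLOCK_SIZE
    line = block % NUM_LINES    # the only line this block may occupy
    tag = block // NUM_LINES    # distinguishes blocks that share a line
    return line, tag

# Two addresses exactly NUM_LINES * BLOCK_SIZE apart collide on a line:
assert direct_map(0)[0] == direct_map(256 * 32)[0]
```

The collision in the last line is the classic weakness of direct mapping: alternating between two such addresses misses every time.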

Page 11

“Fully Associative”

Mapping Function

More flexible than direct mapping because it permits each main memory block to be loaded into any line of the cache. It makes the cache hardware much more complex, though.

Page 12

Mapping Function

“Two-Way Set Associative”

A compromise that keeps the advantages of both direct and associative mapping while reducing their disadvantages.
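A sketch of the two-way lookup, with illustrative sizes: the index selects a set, and the block may reside in either of that set's two ways.

```python
# Illustrative two-way set-associative geometry (not from the slides).
NUM_SETS = 128
BLOCK_SIZE = 32
WAYS = 2

sets = [[None] * WAYS for _ in range(NUM_SETS)]  # each entry holds a tag

def lookup(addr):
    block = addr // BLOCK_SIZE
    index = block % NUM_SETS
    tag = block // NUM_SETS
    return tag in sets[index]        # hit if either way holds the tag

def fill(addr, way):
    block = addr // BLOCK_SIZE
    sets[block % NUM_SETS][way] = block // NUM_SETS

fill(0, 0)
fill(128 * 32, 1)    # maps to the same set as address 0, other way
print(lookup(0), lookup(128 * 32))   # True True — both fit, no collision
```

The two conflicting addresses that would evict each other in a direct-mapped cache now coexist in the same set.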

Page 13

Replacement Algorithms

Since the cache is not as big as main memory, you have to replace things in it.

Think of the cache as your bedside table and main memory as the library. If you want more books from the library, you need to replace some books on your shelf.

Page 14

Replacement algorithm

Least Recently Used (LRU) – probably the most effective. Replace the line in the cache that has been in the cache the longest with no reference to it.

First-In-First-Out (FIFO) – replace the block that has been in the cache the longest. Easy to implement.

Least Frequently Used (LFU) – replace the block that has had the least references. Requires a counter for each cache line.

Random – just randomly replace a line in the cache. Studies show this gives only slightly worse performance than the above ones.
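LRU, the first policy above, can be sketched with an ordered dictionary standing in for a tiny fully associative cache (the class and capacity are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()    # block -> data, oldest first

    def access(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)    # mark most recently used
            return True                       # hit
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[block] = None
        return False                          # miss

c = LRUCache(2)
c.access("A"); c.access("B"); c.access("A")
c.access("C")    # evicts B, since A was referenced more recently
print("B" in c.lines, "A" in c.lines)   # False True
```

FIFO would skip the `move_to_end` call on a hit; LFU would keep a counter per line and evict the minimum; Random would call `random.choice` on the lines.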

Page 15

Write Policy

Before a block residing in the cache can be replaced, you need to determine whether it has been altered in the cache but not in main memory.

If so, you must write the cache line back to main memory before replacing it.

Page 16

Write Policy

Write Through – the simplest technique. All write operations are made to main memory as well as to the cache, ensuring that memory is always up-to-date. Cons: Generates a lot of memory traffic.

Write Back – minimizes memory writes. Updates are made only to the cache. Only when the block is replaced is it written back to main memory. Cons: I/O modules must go through the cache or risk reading stale data.
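The contrast can be sketched on a single cached block (the dictionaries below are a hypothetical stand-in for a controller, not real hardware):

```python
memory = {0: 0}       # main memory: block -> value
cache = {}            # cache: block -> (value, dirty bit)

def store(block, value, write_through):
    if write_through:
        cache[block] = (value, False)
        memory[block] = value          # memory is always up to date
    else:
        cache[block] = (value, True)   # dirty: memory is stale for now

def evict(block):
    value, dirty = cache.pop(block)
    if dirty:
        memory[block] = value          # write back only on replacement

store(0, 42, write_through=False)
print(memory[0])      # 0 — memory is stale until eviction
evict(0)
print(memory[0])      # 42
```

With `write_through=True`, memory would read 42 immediately, at the cost of one memory write per store.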

Page 17

Line size / Num. of caches

Line size
Cache capacity is the number of lines times the line size. A line contains many words, so the longer the line, the more time it takes to locate a word within it.

Number of caches
Either a separate data cache and instruction cache, or a single unified cache.

Page 18

Cache Examples

• Intel Pentium II
• IBM/Motorola PowerPC G3
• DEC/Compaq/HP Alpha 21064

Page 19

Example Cache Organizations

“Pentium II Block Diagram”

Cache Structure
There are two L1 caches, one for data and one for instructions. The instruction cache is four-way set associative; the data cache is two-way set associative. Sizes range from 8 KB to 16 KB.

The L2 cache is four-way set associative and ranges in size from 256 KB to 1 MB.

Processor Core

Fetch/decode unit: fetches program instructions in order from the L1 instruction cache, decodes them into micro-operations, and stores the results in the instruction pool.

Instruction pool: current set of instructions to execute.

Dispatch/execute unit: schedules execution of micro-operations subject to data dependencies and resource availability.

Retire unit: determines when to write values back to registers or to the L1 cache. Removes instructions from the pool after committing the results.

Page 20

Example Cache Organizations

“Power PC G3 Block Diagram”

Cache Structure
The L1 caches are eight-way set associative. The L2 cache is two-way set associative with 256 KB, 512 KB, or 1 MB of memory.

Processor Core
Two integer arithmetic and logic units, which may execute in parallel. A floating-point unit with its own registers. The data cache feeds both the integer and floating-point operations via a load/store unit.

Page 21

Cache Example: 21064

“Alpha 21064”

• 8 KB cache, with 34-bit addressing
• 256-bit lines (32 bytes)
• Block placement: direct mapped
  • One possible place for each address
  • Multiple addresses for each possible place

Address fields, bit 33 down to bit 0: Tag | Cache Index | Offset

• A cache line includes…
  • tag
  • data

Page 22

Cache Example: 21064

Page 23

How the cache works in the 21064

Cache operation

• Send the address to the cache
• Parse the address into offset, index, and tag
• Decode the index into a line of the cache; prepare the cache for reading (precharge)
• Read the line of the cache: valid, tag, data
• Compare the tag with the tag field of the address
• Miss if no match
• Select the word according to the byte offset and read or write
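The parse step above can be sketched for the 21064's parameters: 8 KB direct mapped with 32-byte lines gives a 5-bit offset, an 8-bit index (256 lines), and the remaining high bits of the 34-bit address as the tag.

```python
OFFSET_BITS = 5    # 32-byte line
INDEX_BITS = 8     # 8 KB / 32 B = 256 lines

def parse(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = parse(0x12345)
print(tag, index, offset)   # 9 26 5
```

The hardware then compares `tag` against the tag stored in line `index` and, on a match with the valid bit set, uses `offset` to pick the byte within the line.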

Page 24

How the cache works in the 21064

Cache operation continued…

If there is a miss…
- Stall the processor while reading the line in from the next level of the memory hierarchy
- Which in turn may miss and read from main memory
- Which in turn may miss and read from disk

Page 25

Virtual Memory

Cache is relatively expensive, main memory is much cheaper, and disk is cheaper still.

Virtual memory is the use of disk space as if it were additional RAM.

Page 26

Virtual Memory

Block size: 1 KB - 16 KB
Hit: 20-200 cycles (a DRAM access)
Miss: 700,000 - 6,000,000 cycles (a page fault)
Miss rate: 1 in 0.1 - 10 million accesses

Differences from cache
• Implement the miss strategy in software
• Hit/miss cost factor of 10,000+ (vs 10-20 for cache)
• Critical concerns:
  • Fast address translation
  • A miss ratio as low as possible without ideal knowledge

Page 27

Virtual Memory Characteristics

Fetch strategy
• Swap pages on a task switch
• May pre-fetch the next page if extra transfer time is the only issue
• May include a disk cache

Block Placement
• Anywhere – fully associative – random access is easily available, and the time to place a block well is tiny compared to the miss penalty.

Page 28

Virtual Memory Characteristics

Finding a block – look in the page table

• A list of VPNs (Virtual Page Numbers) and physical addresses (or disk locations)
• Consider a 32-bit VA, a 30-bit PA, and 2 KB pages.
• The page table has 2^32 / 2^11 = 2^21 entries, for perhaps 2^25 bytes or 2^14 pages.
• The page table itself must be in virtual memory.
• The system page table must always be in memory.

• Translation look-aside buffer (TLB)
  • A cache of address translations
  • Hit in 1 cycle (no stalls in the pipeline)
  • A miss results in a page table access (which could lead to a page fault): perhaps 10-100 OS instructions.
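The lookup path above can be sketched as follows (the page-table contents here are illustrative; a real walk reads hardware-defined entries):

```python
PAGE_BITS = 11                 # 2 KB pages, as in the example

page_table = {5: 42}           # VPN -> PPN (tiny stand-in)
tlb = {}                       # cached translations

def translate(va):
    vpn, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    if vpn not in tlb:                        # TLB miss: page table access
        if vpn not in page_table:
            raise RuntimeError("page fault")  # OS would fetch from disk
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_BITS) | offset   # PPN plus unchanged offset

# 32-bit VA with 2 KB pages => 2**(32-11) page-table entries:
print(2 ** (32 - 11))          # 2097152
va = (5 << PAGE_BITS) | 7
print(translate(va))           # (42 << 11) | 7 = 86023
```

The second call for the same page would hit in `tlb` without touching `page_table`, which is the whole point of the TLB.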

Page 29

Virtual Memory Characteristics

Page replacement
• LRU is used most often (really an approximation of LRU with a fixed time window).
• The TLB helps determine which translations have been used.

Write policy
Write through or write back?

Write Through – data is written both to the block in the cache and to the block in lower-level memory.

Write Back – data is written only to the block in the cache, and is written to the lower level only when the block is replaced.

Page 30

Virtual Memory Characteristics

Memory protection

• Must index page table entries by PID
• Flush the TLB on a task switch
• Verify access to a page before loading it into the TLB
• Give the OS access to all memory, physical and virtual
• Provide some untranslated addresses to the OS for I/O buffers
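One way to avoid the flush-on-task-switch cost mentioned above is to tag each TLB entry with the process ID, so entries from different processes can coexist (an alternative used in some real designs; the structures below are hypothetical):

```python
tlb = {}   # (pid, vpn) -> ppn: translations tagged by process ID

def tlb_fill(pid, vpn, ppn):
    tlb[(pid, vpn)] = ppn

def tlb_lookup(pid, vpn):
    # None means a miss: walk the page table for this PID's address space.
    return tlb.get((pid, vpn))

tlb_fill(1, 5, 42)
print(tlb_lookup(1, 5), tlb_lookup(2, 5))   # 42 None — no cross-PID hits
```

Process 2 can never hit on process 1's entry, so protection holds without discarding the whole TLB on every switch.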

Page 31

Address Translation

Since address translations happen all the time, let's cache them for faster access. We call this cache a translation look-aside buffer (TLB).

TLB Properties (typically)

• 8-32 entries
• Set-associative or fully associative
• Random or LRU replacement
• Two or more ports (instruction and data)

Page 32

Memory Hierarchies

Summary

• What is the deal with memory hierarchies? Why bother?
• Why are the caches so small? Why not make them larger?
• Do I have to worry about any of this when I am writing code?