Upload
others
View
6
Download
1
Embed Size (px)
Citation preview
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell
CS352H: Computer Systems Architecture
Topic 12: Memory Hierarchy -Virtual Memory and Virtual Machines
October 27, 2009
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 2
Memory Technology
Static RAM (SRAM)0.5ns – 2.5ns, $2000 – $5000 per GB
Dynamic RAM (DRAM)50ns – 70ns, $20 – $75 per GB
Magnetic disk5ms – 20ms, $0.20 – $2 per GB
Ideal memoryAccess time of SRAMCapacity and cost/GB of disk
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 3
Principle of Locality
Programs access a small proportion of their address spaceat any timeTemporal locality
Items accessed recently are likely to be accessed again soone.g., instructions in a loop, induction variables
Spatial localityItems near those accessed recently are likely to be accessed soonE.g., sequential instruction access, array data
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 4
Taking Advantage of Locality
Memory hierarchyStore everything on diskCopy recently accessed (and nearby) items from disk tosmaller DRAM memory
Main memory
Copy more recently accessed (and nearby) items fromDRAM to smaller SRAM memory
Cache memory attached to CPU
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 5
Memory Hierarchy Levels
Block (aka line): unit of copyingMay be multiple words
If accessed data is present in upper levelHit: access satisfied by upper level
Hit ratio: hits/accesses
If accessed data is absentMiss: block copied from lower level
Time taken: miss penaltyMiss ratio: misses/accesses= 1 – hit ratio
Then accessed data supplied from upper level
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 6
Virtual Memory
Use main memory as a “cache” for secondary (disk)storage
Managed jointly by CPU hardware and the operating system (OS)Programs share main memory
Each gets a private virtual address space holding its frequentlyused code and dataProtected from other programs
CPU and OS translate virtual addresses to physicaladdresses
VM “block” is called a pageVM translation “miss” is called a page fault
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 7
Address Translation
Fixed-size pages (e.g., 4K)
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 8
Page Fault Penalty
On page fault, the page must be fetched from diskTakes millions of clock cyclesHandled by OS code
Try to minimize page fault rateFully associative placementSmart replacement algorithms
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 9
Page Tables
Stores placement informationArray of page table entries, indexed by virtual page numberPage table register in CPU points to page table in physical memory
If page is present in memoryPTE stores the physical page numberPlus other status bits (referenced, dirty, …)
If page is not presentPTE can refer to location in swap space on disk
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 10
Translation Using a Page Table
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 11
Mapping Pages to Storage
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 12
Replacement and Writes
To reduce page fault rate, prefer least-recently used (LRU)replacement
Reference bit (aka use bit) in PTE set to 1 on access to pagePeriodically cleared to 0 by OSA page with reference bit = 0 has not been used recently
Disk writes take millions of cyclesBlock at once, not individual locationsWrite through is impracticalUse write-backDirty bit in PTE set when page is written
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 13
Fast Translation Using a TLB
Address translation would appear to require extra memory referencesOne to access the PTEThen the actual memory access
But access to page tables has good localitySo use a fast cache of PTEs within the CPUCalled a Translation Look-aside Buffer (TLB)Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles for miss,0.01%–1% miss rateMisses could be handled by hardware or software
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 14
Fast Translation Using a TLB
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 15
TLB Misses
If page is in memoryLoad the PTE from memory and retryCould be handled in hardware
Can get complex for more complicated page table structuresOr in software
Raise a special exception, with optimized handler
If page is not in memory (page fault)OS handles fetching the page and updating the page tableThen restart the faulting instruction
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 16
TLB Miss Handler
TLB miss indicatesPage present, but PTE not in TLBPage not preset
Must recognize TLB miss before destination registeroverwritten
Raise exception
Handler copies PTE from memory to TLBThen restarts instructionIf page not present, page fault will occur
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 17
Page Fault Handler
Use faulting virtual address to find PTELocate page on diskChoose page to replace
If dirty, write to disk first
Read page into memory and update page tableMake process runnable again
Restart from faulting instruction
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 18
TLB and Cache Interaction
If cache tag uses physicaladdress
Need to translate beforecache lookup
Alternative: use virtualaddress tag
Complications due to aliasingDifferent virtual addressesfor shared physical address
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 19
Memory Protection
Different tasks can share parts of their virtual addressspaces
But need to protect against errant accessRequires OS assistance
Hardware support for OS protectionPrivileged supervisor mode (aka kernel mode)Privileged instructionsPage tables and other state information only accessible insupervisor modeSystem call exception (e.g., syscall in MIPS)
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 20
Virtual Machines
Host computer emulates guest operating system and machine resourcesImproved isolation of multiple guestsAvoids security and reliability problemsAids sharing of resources
Virtualization has some performance impactFeasible with modern high-performance computers
ExamplesIBM VM/370 (1970s technology!)VMWareMicrosoft Virtual PC
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 21
Virtual Machine Monitor
Maps virtual resources to physical resourcesMemory, I/O devices, CPUs
Guest code runs on native machine in user modeTraps to VMM on privileged instructions and access to protectedresources
Guest OS may be different from host OSVMM handles real I/O devices
Emulates generic virtual I/O devices for guest
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 22
Example: Timer Virtualization
In native machine, on timer interruptOS suspends current process, handles interrupt, selects andresumes next process
With Virtual Machine MonitorVMM suspends current VM, handles interrupt, selects and resumesnext VM
If a VM requires timer interruptsVMM emulates a virtual timerEmulates interrupt for VM when physical timer interrupt occurs
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 23
Instruction Set Support
User and System modesPrivileged instructions only available in system mode
Trap to system if executed in user modeAll physical resources only accessible using privilegedinstructions
Including page tables, interrupt controls, I/O registersRenaissance of virtualization support
Current ISAs (e.g., x86) adapting
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 24
Pitfalls
Extending address range using segmentsE.g., Intel 80286But a segment is not always big enoughMakes address arithmetic complicated
Implementing a VMM on an ISA not designed forvirtualization
E.g., non-privileged instructions accessing hardware resourcesEither extend ISA, or require guest OS not to use problematicinstructions
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 25
Concluding Remarks
Fast memories are small, large memories are slowWe really want fast, large memories Caching gives this illusion
Principle of localityPrograms use a small part of their memory space frequently
Memory hierarchyL1 cache ↔ L2 cache ↔ … ↔ DRAM memory↔ disk
Memory system design is critical for multiprocessors