
CSCI 3431: OPERATING SYSTEMS

Chapter 9 – Virtual Memory (Pgs 357 - 409)

Overview

Instructions need to be in memory to be executed

Address space (bus size) is usually bigger than physical space (memory size)

Not all of a program is needed at any point in time

Loading only what is needed speeds up execution and permits more processes to be in memory

Virtual Memory

Separates logical memory (i.e., address space seen by compiler/user) from the physical memory (i.e., actual RAM on the motherboard)

Each process can be given its own independent address space

Sharing, forking, and code reuse can all be supported more effectively

Demand Paging

Load pages only as they are needed

Pager – loads and stores pages from secondary storage

Swapper – loads and stores processes from secondary storage

Pages in memory are called "memory resident" and are usually identified using the "valid bit" in the page table

Page Faults

1. Determine if the reference is valid (i.e., check PCB to see if reference is part of process)
2. If invalid, abort; else begin paging
3. Find a free frame in physical memory
4. Schedule load from disk to frame
5. On completion, update page table
6. Restart instruction
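As a rough sketch, the steps above might look like the following C-style code; every type and helper name here is hypothetical, standing in for real kernel code.

```c
/* Sketch of the page-fault path above. All types and helpers are
 * hypothetical stand-ins for real kernel code. */
typedef struct pcb pcb_t;         /* process control block */
typedef unsigned long vaddr_t;    /* virtual address */
typedef struct frame frame_t;     /* physical frame descriptor */

extern int      address_is_valid(pcb_t *p, vaddr_t a);   /* check PCB */
extern void     abort_process(pcb_t *p);
extern frame_t *find_free_frame(void);
extern void     schedule_disk_read(pcb_t *p, vaddr_t a, frame_t *f);
extern void     block_until_io_done(pcb_t *p);
extern void     set_pte(pcb_t *p, vaddr_t a, frame_t *f, int valid);
extern void     restart_instruction(pcb_t *p);

void handle_page_fault(pcb_t *proc, vaddr_t addr) {
    /* 1-2. Determine if the reference is valid; abort if not */
    if (!address_is_valid(proc, addr)) {
        abort_process(proc);
        return;
    }
    frame_t *frame = find_free_frame();      /* 3. free frame */
    schedule_disk_read(proc, addr, frame);   /* 4. load from disk */
    block_until_io_done(proc);               /*    wait in IO queue */
    set_pte(proc, addr, frame, 1);           /* 5. update page table */
    restart_instruction(proc);               /* 6. restart instruction */
}
```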

Handling a Page Fault (9.6)

Concepts in Paging

Pure Demand Paging – Pages are only brought in when a page fault occurs

Locality of Reference: the phenomenon whereby programs tend to use small sets of pages, and those sets change slowly

Instruction restarting – usually just re-executing the instruction at the current PC

Restarting is sometimes difficult on CISC hardware, where a single instruction may touch many memory locations

Performance

1. Trap to operating system
2. Jump to page fault handler
3. Save registers and state
4. Determine validity of address
5. Identify an available frame
6. Issue a disk read, block process in IO queue
7. Run scheduler, execute another process ...
8. After page load, disk interrupt traps to operating system
9. Jump to disk interrupt handler
10. Figure out that the disk was handling a page fault
11. Update page table for process, unblock process, reset process so it can resume at the failed instruction
12. Run scheduler, execute some process
13. ... Resume where we left off when we are scheduled to run ...

All this takes HOW LONG for ONE page fault?

What this means ...

Interrupt handlers are often hand-coded assembly for optimal performance

Page fault times are horrible if there are a lot of disk requests

Disk activity is the major wait factor

We can compute the Effective Access Time:

EAT = ((1 - p) * time_mem) + (p * time_fault)

where p is the probability of a page fault
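For example, using typical textbook values (illustrative, not measured): with time_mem = 200 ns, time_fault = 8 ms, and p = 0.001, EAT = 0.999 * 200 ns + 0.001 * 8,000,000 ns ≈ 8,200 ns – a fault on just one access in a thousand makes memory appear about 40 times slower.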

And if there is no empty frame ...

Copy on Write

Pages only ever change if written to

Pages that are not changed can be shared

Thus, assume everything is shared, and only copy if a write occurs to a page

Makes for very fast forks

Sometimes the OS reserves frames so they are available when needed for the copy
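A small C program suggests why this makes fork() fast (standard POSIX calls; the page sharing itself is OS behaviour we cannot observe directly):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    /* A large buffer: with COW, fork() does not copy it up front */
    size_t n = 64 * 1024 * 1024;
    char *buf = malloc(n);
    memset(buf, 'x', n);          /* touch every page */

    pid_t pid = fork();           /* fast: page tables copied, frames shared */
    if (pid == 0) {
        buf[0] = 'y';             /* write faults; only this page is copied */
        _exit(0);
    }
    wait(NULL);
    printf("parent still sees: %c\n", buf[0]);  /* 'x': copies are private */
    free(buf);
    return 0;
}
```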

Page Replacement

Eventually the number of active pages exceeds the number of available frames

1. Swap the entire process and all its pages out

2. Swap out pages as needed to make space for new pages

Most systems prefer to replace clean (unmodified) pages because no write-back to disk is needed (clean vs. dirty pages)

Requires updating of the process's page table (which is why it is held by the kernel)

Algorithms

While we can use random data, it is best to evaluate algorithms using a "reference string" of page references drawn from actual runtime data

Goal is to minimise number of page replacements

Key factor is ratio of pages:frames

Higher = more faults, lower = fewer faults

FIFO

Replace the first one loaded (which is also the oldest)

Mediocre performance

Belady's Anomaly: increasing the number of frames INCREASES the fault rate!

FIFO suffers from this anomaly for some reference strings (example, page 374)
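The anomaly is easy to reproduce with a short FIFO simulation in C over that reference string (a sketch for experimentation, assuming at most 16 frames):

```c
#include <stdio.h>

/* Count page faults for FIFO replacement with nframes frames */
int fifo_faults(const int *refs, int n, int nframes) {
    int frames[16];               /* assumes nframes <= 16 */
    int next = 0, used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (!hit) {
            faults++;
            if (used < nframes) {
                frames[used++] = refs[i];     /* fill an empty frame */
            } else {
                frames[next] = refs[i];       /* evict the oldest page */
                next = (next + 1) % nframes;
            }
        }
    }
    return faults;
}

int main(void) {
    /* Classic reference string that exhibits Belady's anomaly */
    int refs[] = {1,2,3,4,1,2,5,1,2,3,4,5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}
```

More frames, more faults: 9 with three frames but 10 with four.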

Optimal

A replacement strategy is optimal if it produces the fewest possible page faults

The best page to replace is the one that will not be needed for the longest period of time

Problem: Requires knowledge of when a page will be needed

Not really possible and mostly used as a theoretical bound when measuring other strategies

LRU

Past history is the best indicator of future use

Least likely to be used is the page that was Least Recently Used (LRU)

Some page tables have a reference bit that is set whenever a page is referenced

Good: enables us to delete pages not used very much

Bad: setting the reference bit on every memory access is time consuming

Reference bit allows us to approximate LRU

Reference Shifts

Use 8 bits (or more) for the reference bits

Use some regular clock interval

Zero the reference bits when a page is loaded

Set the high-order (leftmost) bit to 1 when the page is referenced

Every clock interval, shift the bits right by 1

Treat as an integer; higher means more recently used and more frequently used
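A minimal sketch of this "aging" scheme in C, assuming an 8-bit history per page (NPAGES and the function split are illustrative; on real hardware the MMU sets the bit):

```c
#include <stdint.h>

enum { NPAGES = 64 };             /* illustrative page count */
uint8_t age[NPAGES];              /* one 8-bit history per page */

void on_page_load(int p)  { age[p] = 0; }     /* zero bits at load */

void on_reference(int p)  { age[p] |= 0x80; } /* set high-order bit */

/* Every clock interval, shift each history right by one */
void on_clock_tick(void) {
    for (int p = 0; p < NPAGES; p++)
        age[p] >>= 1;
}

/* Treat the bits as an integer: the smallest value is the least
 * recently (and least frequently) used page */
int pick_victim(void) {
    int victim = 0;
    for (int p = 1; p < NPAGES; p++)
        if (age[p] < age[victim])
            victim = p;
    return victim;
}
```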

Second Chance

Algorithm is a variant of FIFO

When a frame is needed, take the first page in the queue

If its reference bit is set, clear it and move the page to the back of the queue

If not set, replace it

Sometimes called the "clock" algorithm

Can be easily implemented using a circular queue

Can be enhanced by avoiding (if possible) pages that are dirty
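A minimal C sketch of the clock sweep (NFRAMES, refbit, and hand are illustrative names):

```c
enum { NFRAMES = 64 };            /* illustrative frame count */
int refbit[NFRAMES];              /* reference bit per frame */
int hand = 0;                     /* position of the clock hand */

/* Second chance: sweep the circular queue; a set reference bit
 * buys the page one more pass, a clear bit means eviction. */
int pick_victim(void) {
    for (;;) {
        if (refbit[hand]) {
            refbit[hand] = 0;                 /* second chance */
            hand = (hand + 1) % NFRAMES;
        } else {
            int victim = hand;                /* evict this one */
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```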

Other Options

Reference counting – count references and replace the least frequently used

Issue – LFU can replace a new page that hasn't had a chance to be used much yet

Pools of empty pages can be kept so a free page is always available – permits the CPU to schedule the creation of more free pages for times when it's less busy or the disk isn't in use

*** Next Class: 9.5+, Frame Allocation ***

Raw Disk

It is possible to write directly to, or read directly from, a disk block

This technique is fast and avoids creating directory entries, file system data, etc.

Used by some databases and other applications

Bypasses paging mechanism – disk savings usually lost due to memory management problems

Frame Allocation

Page faults require a frame during servicing

Are all frames equal? Do we just apply some variant of LRU, or do we need to ensure each user has their "fair share"?

Performance requires that a process be given some minimum number of frames

Hardware is a factor – What is the maximum number of frames that an instruction could possibly need to access? (This value makes a good minimum)

Indirect addressing also adds to the page minimum (mostly an obsolete idea now)

Also – stack & heap of process, OS tables, scheduler code, interrupt handlers etc. must be considered

Strategies

Equal Allocation: f frames are divided equally among u users (f/u per user)

Proportional Allocation: Allocate memory in proportion to the size of the process

I.e., if p2 is twice the size of p1, p2 gets 2/3 of memory and p1 gets 1/3 of memory

Allocate si/S * f frames to process pi, where si = size (in pages) of pi, S = sum of all si (total pages for all pi), and f = total number of frames
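For example, with textbook-style numbers: f = 62 frames, a 10-page process, and a 127-page process give allocations of 10/137 * 62 ≈ 4 frames and 127/137 * 62 ≈ 57 frames.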

Allocation con't

OS needs some pages to operate efficiently

Process should always have its minimum

Pool of free frames is useful

Have not considered priority in any way

Useful to consider scheduling – if a process isn't going to run, does it need much memory?

IO bound processes do a lot of waiting, good candidates for less memory

Global vs. Local Allocation

Global Allocation: All frames are viable candidates and must be considered for replacement when a new page is loaded

Local Allocation: Only the frames allocated to the process are considered for replacement

NUMA just messes everything up ...

> Good: being able to add additional memory on new memory cards
> Bad: cards are slower than the memory on the motherboard and should not be paged as often

Thrashing

When a process spends more time being paged than it does being executed

Occurs when page fault rate gets too high

Local replacement can prevent processes from causing each other to thrash

Locality: A set of pages that are used together

A process can be viewed as a set of sequential localities that tend to overlap

Locality Example (Fig 9.19)

Working Set

The set of pages accessed in some time period

Can adjust the time period to produce smaller or (somewhat) larger working sets

Captures a process' current locality

Can be used to prevent thrashing

Do not start a new process (or swap one out) when some processes do not have sufficient frames for their working set

Working Set Size

= total number of pages the program uses, if the interval is greater than the life of the program

= 1 if the interval contains only one instruction with no memory reference

Generally try to use an interval that captures the locality of most processes
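For intuition, a C sketch that computes the working-set size over the last delta references of a reference string (bounds and names are illustrative):

```c
enum { MAX_PAGES = 1024 };        /* assumes page numbers < MAX_PAGES */

/* Working-set size: number of distinct pages among the last
 * 'delta' references ending at position 'pos' */
int wss(const int *refs, int pos, int delta) {
    int seen[MAX_PAGES] = {0};
    int count = 0;
    int start = (pos - delta + 1 > 0) ? pos - delta + 1 : 0;
    for (int i = start; i <= pos; i++)
        if (!seen[refs[i]]) {
            seen[refs[i]] = 1;    /* first time this page appears */
            count++;
        }
    return count;
}
```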

Fault Rate

Alternatively, we can ignore the working set

Simply use the Page Fault Rate to avoid thrashing

Goal is to keep the fault rate below some limit

Very simple to calculate and keep track of

Generally a good approach, but like all approaches, fails in some rare situations

Memory Mapping

Loading of a file into memory to minimise disk accesses

Uses the standard paging system to do the loading

Can cause synchronisation issues between in-memory version and on-disk version until the file is closed

Uses the mmap (or mmap2) system call

Often used as a technique for memory sharing

Supports concurrent file access by multiple processes
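A minimal POSIX example of mapping a file and reading it as memory ("data.txt" is just a placeholder path):

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("data.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Map the whole file; pages are demand-paged in as they are touched */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The file contents can now be read as ordinary memory */
    fwrite(p, 1, st.st_size, stdout);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```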

IO

Some IO devices have their buffers also mapped into main memory

Writes to this location are then duplicated to the actual IO device

Used for disks, monitors/video

Also used for hardware ports such as mouse, game controllers, USB

If the device uses a status/control register for communication, it is programmed IO (PIO)

If the device uses interrupts for communication, it is interrupt driven

Kernel Memory

The OS often works with non-paged data

Some data is very small (e.g., a semaphore) and doesn't need a page

Some data is very large (e.g., a disk block) and must be on contiguous frames so its address space is contiguous

"Buddy System"

Round all data sizes up to a power of 2

Easier to deal with memory blocks if they are 32 bytes, 64 bytes, 128 bytes etc.

Large blocks easily split to form two smaller ones

Smaller adjacent blocks combined to form one larger one

Works well and is a very commonly used technique in all kinds of system programming
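The two core operations sketched in C (illustrative, not a complete allocator):

```c
#include <stddef.h>
#include <stdint.h>

/* Round a request up to a power of two (32-byte minimum block) */
size_t round_up_pow2(size_t n) {
    size_t size = 32;
    while (size < n)
        size <<= 1;
    return size;
}

/* A block's buddy differs from it in exactly one address bit:
 * the bit corresponding to the block's size. */
uintptr_t buddy_of(uintptr_t offset, size_t size) {
    return offset ^ size;
}
```

For instance, round_up_pow2(100) returns 128; the 128-byte block at offset 0 has its buddy at offset 128, and when both are free they coalesce into one 256-byte block.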

Slab Allocation

Keep a "slab" of memory for each specific kernel data structure (e.g., semaphore, PCB, Page Table)

Manage the slab almost as if it were paged, with "pages" the same size as the data structure

Has some issues if more pages are needed, as they won't be contiguous (which is necessary)

Has been generalised and can be used for some user-mode requests

This is how most modern OSs now function

Prepaging

Tries to avoid page faults by anticipating future faults

Wastes time if unneeded page brought into memory

It is often faster to read two sequential disk blocks than to issue two independent disk reads (so we can save time)

Page Size

Strongly dictated by CPU architecture

Also influenced by:

> Bus width/size
> Disk block size
> System use characteristics (e.g., process and file sizes)
> Amount of physical memory available
> Time to service a page fault

TLB Reach

The amount of memory accessible from the TLB

Should be greater than the working set size

Can be influenced by page size

TLB size may be configurable, depending upon the hardware
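Concretely, TLB reach = (number of TLB entries) * (page size). For example, 64 entries with 4 KB pages give a reach of 256 KB, while the same 64 entries with 2 MB pages reach 128 MB – one reason larger pages can help.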

Page Locking

Sometimes a page should be locked into memory so that it's not paged out, e.g.,

> Disk buffer after an IO request is issued
> The OS itself (i.e., scheduler, MMU, interrupt handlers, frame table)

Solution is to put a "lock bit" into the frame table for each frame

Programming Concerns

Let the compiler optimise!

Most situations that result in poor performance can be detected and avoided by the compiler (e.g., memory access in loops)

Also note:

1. Pointers and malloc can be used very badly and can really harm performance

2. Objects almost always harm performance (as does method over-riding) – there is a reason no web browser, OS, compiler, VM, server ... is OO
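The classic loop case (a sketch; exact costs vary by machine): traversing a two-dimensional array in row order touches memory sequentially, while column order jumps a full row per access and can hit a different page on every reference.

```c
#include <stdio.h>

enum { N = 1024 };
static int a[N][N];               /* 4 MB: many pages */

int main(void) {
    /* Row-major order: consecutive accesses fall on the same page */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 0;

    /* Column-major order: each access jumps N*sizeof(int) bytes,
     * touching a different page almost every time */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = 0;

    printf("done\n");
    return 0;
}
```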

To Do:

Do Lab 5

Read Chapter 9 (pgs 357-409; this and next lecture)

Continue working on Assignment 2