
Xen Memory Management


Memory management in (x86) Xen

Tim Deegan

Vancouver, February 2009


© 2008 Citrix Systems, Inc. All rights reserved

Xen’s memory services

• Memory management
  • Allocating memory to guests, scrubbing free memory
  • Tracking memory usage with reference counts and types
  Implemented with heap allocators and the frametable.

• Virtual memory
  • Protecting guests from each other
  • Enforcing typing rules, e.g. read-only areas
  • Providing translation services between address spaces
  Implemented with MMU hypercalls, shadow pagetables, and hardware-assisted paging.


Terminology

• Virtual address/Physical address/Machine address

• Frame vs. Page

• PFN: physical frame number
  • Guest’s abstraction for tracking/allocating RAM
  • Usually fairly contiguous

• GFN: guest frame number
  • Guest’s idea of what hardware addresses are
  • Used in guest pagetables

• MFN: machine frame number
  • Actual hardware addresses
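A minimal C sketch of how frame numbers relate to addresses, assuming 4 KiB pages (a 12-bit page shift) and a flat array standing in for the PFN→MFN table. `addr_to_frame`, `frame_to_addr`, `guest_phys_to_machine`, and the `INVALID_*` sentinels are hypothetical names for illustration, not Xen's interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumed: 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)             /* offset-within-page bits */

#define INVALID_MFN   UINT64_MAX               /* hypothetical "no frame" marker */
#define INVALID_MADDR UINT64_MAX

/* A frame number is an address with the in-page offset shifted away. */
static uint64_t addr_to_frame(uint64_t addr)  { return addr >> PAGE_SHIFT; }
static uint64_t frame_to_addr(uint64_t frame) { return frame << PAGE_SHIFT; }

/* Translate a guest-physical address to a machine address by looking its
 * PFN up in a flat PFN->MFN array and keeping the page offset. */
static uint64_t guest_phys_to_machine(const uint64_t *p2m, size_t nr_pfns,
                                      uint64_t paddr)
{
    uint64_t pfn = addr_to_frame(paddr);
    if (pfn >= nr_pfns || p2m[pfn] == INVALID_MFN)
        return INVALID_MADDR;                  /* no machine frame behind it */
    return frame_to_addr(p2m[pfn]) | (paddr & PAGE_MASK);
}
```

Translating between address spaces only ever swaps the frame number; the low 12 bits of the address pass through unchanged.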


Basic memory management

• Buddy allocator hands out frames

• Each guest has a max number of frames

• Frame-table records for each frame:
  • Owner, if any
  • Linked list of other frames owned by this guest
  • Reference count (must be zero to free the frame)
  • Type, and a refcount for the type (must be zero to change type)
  • TLB-flush-avoidance timestamp
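The reference-count and type-count rules above can be sketched as follows. This is a simplified model, not Xen's own frame-table structure: the names, the four-value `page_type` enum, and the choice that every typed reference also holds a general reference are assumptions, and the owner list, the TLB timestamp, and the validation a pagetable type requires are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page types: a frame holds one type at a time. */
enum page_type { PGT_none, PGT_writable, PGT_l1_table, PGT_l2_table };

/* One frame-table entry (owner list and TLB timestamp omitted). */
struct frame {
    int owner;               /* owning domain ID, or -1 if free */
    unsigned refcount;       /* all references; must be 0 to free */
    enum page_type type;     /* current type */
    unsigned type_count;     /* references relying on the type */
};

static void get_frame(struct frame *f) { f->refcount++; }

static void put_frame(struct frame *f)
{
    assert(f->refcount > 0);
    f->refcount--;
}

/* Take a typed reference; fails if another type is still in use.
 * In this sketch a typed reference also holds a general reference. */
static bool get_frame_type(struct frame *f, enum page_type t)
{
    if (f->type_count > 0 && f->type != t)
        return false;            /* type may change only at type_count == 0 */
    f->type = t;
    f->type_count++;
    get_frame(f);
    return true;
}

static void put_frame_type(struct frame *f)
{
    assert(f->type_count > 0);
    f->type_count--;
    put_frame(f);
}

static bool can_free(const struct frame *f) { return f->refcount == 0; }
```

The central invariant sits in `get_frame_type`: a frame can switch type (say, from writable to pagetable) only once nothing depends on its current type.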


PV pagetables, a.k.a. direct paging

• PFN→MFN table managed by the guest

• Shared MFN→PFN table provided by Xen

• GFN == MFN, so pagetables can be used directly by the hardware

• Xen checks the contents of the guest pagetables before allowing the hardware to see them.
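To show why GFN == MFN lets the hardware walk PV pagetables directly, here is a sketch of a guest building a PTE from its own PFN→MFN table and reading it back through the shared MFN→PFN table. The table layouts, bit masks, and function names are assumptions for illustration; in a real PV guest the PTE write would still be checked by Xen before the hardware sees it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define PTE_PRESENT    0x1ULL
#define PTE_WRITABLE   0x2ULL
#define PTE_FLAGS_MASK 0xfffULL                 /* low flag bits only, in this sketch */
#define PTE_ADDR_MASK  0x000ffffffffff000ULL    /* frame-number bits 12..51 */

/* Guest side: the PTE must hold an MFN, found via the guest's PFN->MFN table. */
static uint64_t make_pv_pte(const uint64_t *p2m, uint64_t pfn, uint64_t flags)
{
    return (p2m[pfn] << PAGE_SHIFT) | (flags & PTE_FLAGS_MASK);
}

/* Going back: take the MFN out of the PTE and look it up in the
 * MFN->PFN table to recover the guest's PFN. */
static uint64_t pv_pte_to_pfn(const uint64_t *m2p, uint64_t pte)
{
    uint64_t mfn = (pte & PTE_ADDR_MASK) >> PAGE_SHIFT;
    return m2p[mfn];
}
```

The two tables are inverses of each other, so a guest can keep thinking in PFNs while every pagetable entry it writes already holds a real machine frame.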


Enforcing isolation

• Guest pagetables must have a pagetable type

• Xen checks that page contents obey the typing rules before allowing them to take on PT type

• Typing rules:
  • No mappings of other guests’ frames
  • No read-write mappings of frames with PT type

• Modifying an already-typed PT needs a call to Xen to check that the modification obeys the rules.

(Or trap-and-emulate assistance from Xen.)
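A sketch of the check needed before a frame may take on L1 pagetable type, covering only the two typing rules listed above. `struct frame_info`, `pte_ok`, and `validate_l1` are hypothetical; a real implementation also handles higher levels, grant mappings, I/O memory, and the references taken on each mapped frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_PRESENT   0x1ULL
#define PTE_WRITABLE  0x2ULL
#define PTE_ADDR_MASK 0x000ffffffffff000ULL
#define L1_ENTRIES    512

/* Minimal per-frame state for this sketch. */
struct frame_info {
    int  owner;          /* owning domain ID */
    bool is_pagetable;   /* frame has a pagetable type */
};

static bool pte_ok(const struct frame_info *ft, size_t nr_frames,
                   int dom, uint64_t pte)
{
    if (!(pte & PTE_PRESENT))
        return true;                     /* not-present entries map nothing */
    uint64_t mfn = (pte & PTE_ADDR_MASK) >> PAGE_SHIFT;
    if (mfn >= nr_frames || ft[mfn].owner != dom)
        return false;                    /* rule 1: no other guests' frames */
    if (ft[mfn].is_pagetable && (pte & PTE_WRITABLE))
        return false;                    /* rule 2: no RW mappings of PTs */
    return true;
}

/* Every entry must pass before the frame may become an L1 pagetable. */
static bool validate_l1(const struct frame_info *ft, size_t nr_frames,
                        int dom, const uint64_t l1[L1_ENTRIES])
{
    for (size_t i = 0; i < L1_ENTRIES; i++)
        if (!pte_ok(ft, nr_frames, dom, l1[i]))
            return false;
    return true;
}
```

Because the check runs once, when the frame takes on its type, later writes only need the same per-entry test, which is what the "call to Xen" for an already-typed PT amounts to.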


Grant Tables

• Guest-supplied ACLs allowing other guests to map their frames

• Mapper makes a hypercall with a domid, an opaque index, and the address of a PTE

• Xen checks that entry in the mappee’s grant table and, if it’s OK, modifies the PTE

• Needs explicit unmap hypercall when finished

• Also available: grant-copy, where Xen memcpy()s from/to a granted frame instead of mapping it.
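A sketch of the check behind a grant map and unmap. The `grant_entry` layout and the function names are hypothetical, not Xen's grant-table ABI: `table` stands for the grant table of the domain named by the domid in the hypercall, `ref` is the opaque index, and `pte` stands for the PTE whose address the mapper supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PTE_PRESENT  0x1ULL
#define PTE_WRITABLE 0x2ULL

/* Hypothetical grant entry, filled in by the granting guest. */
struct grant_entry {
    bool     permit_access;  /* entry is in use */
    bool     readonly;       /* grantee may only map it read-only */
    uint16_t grantee;        /* the one domain allowed to use it */
    uint64_t mfn;            /* frame being granted */
    unsigned maps;           /* outstanding mappings */
};

/* Check entry `ref` on behalf of domain `mapper`; on success write the PTE. */
static bool grant_map(struct grant_entry *table, unsigned nr_entries,
                      unsigned ref, uint16_t mapper, bool want_write,
                      uint64_t *pte)
{
    if (ref >= nr_entries)
        return false;
    struct grant_entry *g = &table[ref];
    if (!g->permit_access || g->grantee != mapper)
        return false;
    if (want_write && g->readonly)
        return false;
    g->maps++;
    *pte = (g->mfn << PAGE_SHIFT) | PTE_PRESENT | (want_write ? PTE_WRITABLE : 0);
    return true;
}

/* The explicit unmap: clear the PTE and drop the mapping count. */
static void grant_unmap(struct grant_entry *table, unsigned ref, uint64_t *pte)
{
    assert(table[ref].maps > 0);
    table[ref].maps--;
    *pte = 0;
}
```

Counting outstanding maps is what makes the explicit unmap necessary: the granting guest cannot safely reuse the frame until the count is back to zero.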


HVM pagetables

• PFN→MFN table managed by Xen

• GFN == PFN, so we need another layer of translation

• Guest won’t cooperate in enforcing access control

• Two options:
  • Xen builds shadow copies of guest pagetables with the extra translations and controls added; or
  • Hardware support for using a second set of pagetables containing extra translations and controls


Shadow pagetables

• Keep Xen-maintained copies of guest frames that we think are being used as pagetables

• Guest never sees the shadows so we can add any translations and restrictions we like

• 13 different kinds of shadows depending on what kind of pagetable we think it is: a single frame can have up to 10 shadows at once

• Also have three kinds of shadows for faking out superpages (2MB of contiguous PFNs does not mean 2MB of contiguous MFNs)
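Why superpages need their own shadow kinds: a guest 2 MiB mapping covers 512 consecutive PFNs, but the PFN→MFN table may scatter them, so the shadow has to be a full L1 at 4 KiB granularity. A sketch, assuming 4 KiB base pages and a flat p2m array; `shadow_superpage` is a hypothetical name, and `flags` is assumed to be in L1 format already (no superpage bit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define L1_ENTRIES  512            /* 2 MiB / 4 KiB */
#define PTE_PRESENT 0x1ULL
#define INVALID_MFN UINT64_MAX

/* Fill a 512-entry "fake" L1 shadow for a guest 2 MiB mapping starting at
 * first_gfn. Each 4 KiB piece is translated on its own because the
 * PFN->MFN table need not map the 2 MiB range contiguously. */
static void shadow_superpage(const uint64_t *p2m, size_t nr_pfns,
                             uint64_t first_gfn, uint64_t flags,
                             uint64_t fl1[L1_ENTRIES])
{
    assert(first_gfn % L1_ENTRIES == 0);   /* superpages are 2 MiB aligned */
    for (size_t i = 0; i < L1_ENTRIES; i++) {
        uint64_t gfn = first_gfn + i;
        uint64_t mfn = gfn < nr_pfns ? p2m[gfn] : INVALID_MFN;
        /* Pieces with no machine frame are left not-present. */
        fl1[i] = (mfn == INVALID_MFN) ? 0 : (mfn << PAGE_SHIFT) | flags;
    }
}
```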


Shadow pagetables: building

• Start with an empty top-level shadow of the PFN in CR3

• On pagefault, shadow the entries in the PT walk, making new shadows at each level if necessary.

• Each shadow entry is the guest entry with the GFN replaced by an MFN (of the next-level shadow or of guest memory) and extra access restrictions:
  • Pages that have shadows are mapped read-only.
  • Extra restrictions can be specified in the PFN→MFN table.
  • We can restrict write access to a guest’s frames to track page dirtying during live migration.
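Propagating one guest L1 entry into its shadow might look like the sketch below. `struct p2m_entry`, with its `read_only` and `shadowed` flags, is an assumed model of where the extra restrictions come from, and only the leaf level is shown; at higher levels the MFN written would be that of the next-level shadow rather than of guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_PRESENT   0x1ULL
#define PTE_WRITABLE  0x2ULL
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/* Hypothetical per-PFN state kept alongside the PFN->MFN table. */
struct p2m_entry {
    uint64_t mfn;
    bool     read_only;   /* extra restriction, e.g. log-dirty tracking */
    bool     shadowed;    /* frame is in use as a guest pagetable */
};

/* Build the shadow of one guest L1 entry. */
static uint64_t shadow_l1e(const struct p2m_entry *p2m, size_t nr_pfns,
                           uint64_t gl1e)
{
    if (!(gl1e & PTE_PRESENT))
        return 0;
    uint64_t gfn = (gl1e & PTE_ADDR_MASK) >> PAGE_SHIFT;
    if (gfn >= nr_pfns)
        return 0;                             /* nothing there: leave unmapped */
    const struct p2m_entry *e = &p2m[gfn];
    uint64_t flags = gl1e & ~PTE_ADDR_MASK;   /* keep the guest's access bits */
    if (e->read_only || e->shadowed)
        flags &= ~PTE_WRITABLE;               /* writes must trap to Xen */
    return (e->mfn << PAGE_SHIFT) | flags;
}
```

For log-dirty tracking, the first write to such a frame traps, Xen records the frame as dirty, and it can then lift the restriction for that frame until the next pass.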


Shadow pagetables: maintenance

• Shadowed pages are always kept read-only.

• When the guest writes to a shadowed frame, Xen’s pagefault handler must:
  • Emulate the current instruction to figure out what’s being written;
  • Write the new value into the guest pagetable; and
  • Update the equivalent parts of all shadows of the frame.
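The last two steps can be sketched as below, assuming the faulting instruction has already been emulated far enough to yield an entry index and a new value. The structures and the per-shadow `propagate` callback are hypothetical, and the sketch assumes every shadow has the same layout as the guest table, which does not hold for every shadow kind (a 32-bit guest table shadowed by PAE tables, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_ENTRIES   512
#define MAX_SHADOWS  4          /* arbitrary bound for the sketch */
#define PTE_WRITABLE 0x2ULL

/* How one kind of shadow derives its entry from a guest entry. */
typedef uint64_t (*propagate_fn)(uint64_t guest_entry);

struct shadow {
    uint64_t entries[PT_ENTRIES];
    propagate_fn propagate;
};

struct guest_pt {
    uint64_t entries[PT_ENTRIES];          /* the guest's own pagetable frame */
    struct shadow *shadows[MAX_SHADOWS];   /* every current shadow of it */
    size_t nr_shadows;
};

/* Two hypothetical propagation rules, for illustration only. */
static uint64_t propagate_identity(uint64_t e) { return e; }
static uint64_t propagate_readonly(uint64_t e) { return e & ~PTE_WRITABLE; }

/* Apply an already-decoded guest write of `val` to entry `idx`. */
static void emulated_pt_write(struct guest_pt *pt, size_t idx, uint64_t val)
{
    assert(idx < PT_ENTRIES && pt->nr_shadows <= MAX_SHADOWS);
    pt->entries[idx] = val;                      /* write the guest pagetable */
    for (size_t s = 0; s < pt->nr_shadows; s++)  /* then every shadow of it */
        pt->shadows[s]->entries[idx] = pt->shadows[s]->propagate(val);
}
```

The loop over all shadows is why a frame with several shadows at once makes each trapped write more expensive.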


Shadow pagetables: tearing back down

• Shadowing a frame is expensive
  • Thousands of cycles for trap and emulation of every write.

• Easy to tell when a page becomes a PT; harder to tell when it stops:
  • Reference count based on higher-level shadows and CR3 contents, but hard to know when a PFN has been used in CR3 for the last time
  • Guess based on odd-looking page contents
  • Guess based on memory access patterns
  • Get PV drivers to give us hints
  • Recycle under memory pressure by approximating LRU
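The slide says only that Xen approximates LRU; the clock (second-chance) algorithm below is one common way to do that and is offered as an illustration, not as Xen's actual policy. The slot layout and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 4

/* One slot per shadow page in the pool. */
struct shadow_slot {
    bool in_use;
    bool referenced;      /* set whenever the shadow is used */
    unsigned long gfn;    /* guest frame this slot shadows */
};

struct shadow_pool {
    struct shadow_slot slot[NR_SLOTS];
    size_t hand;          /* clock hand */
};

static void touch(struct shadow_pool *p, size_t i) { p->slot[i].referenced = true; }

/* Choose a slot to reuse: the first free or unreferenced slot at or after
 * the hand, clearing reference bits as the hand sweeps past. Terminates
 * within two sweeps because the first sweep clears every bit. */
static size_t pick_victim(struct shadow_pool *p)
{
    for (;;) {
        size_t i = p->hand;
        struct shadow_slot *s = &p->slot[i];
        p->hand = (p->hand + 1) % NR_SLOTS;
        if (!s->in_use || !s->referenced)
            return i;     /* caller tears down the old shadow if in_use */
        s->referenced = false;   /* second chance */
    }
}
```

A shadow survives only if it is used again before the hand comes back around, which is what makes this a cheap stand-in for true LRU.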


Optimizations

• Tagged TLBs (AMD’s ASID; Intel’s VPID) let us avoid a TLB flush on every VMEXIT/VMENTER
  • In theory we can do even better now that Win2k8 supports context switching without TLB flushing.

• Shadowing not-present entries with invalid entries lets us fast-track “real” pagefaults back to the guest

• Out-of-sync shadows: let the guest write directly to the lowest level of pagetables and sync up the shadows whenever a hardware TLB would re-read (TLB flush, page faults, higher-level writes)
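A sketch of the out-of-sync idea: while an L1 is unsynced the guest writes it directly, and at any point where a hardware TLB would re-read entries the shadow is brought back up to date. Diffing against a snapshot taken at unsync time is an assumption of this sketch, and `propagate_entry` is a placeholder for the real GFN→MFN translation and restrictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_ENTRIES 512

struct oos_l1 {
    uint64_t *guest;                /* guest L1; writable while out of sync */
    uint64_t snapshot[PT_ENTRIES];  /* guest contents when it was unsynced */
    uint64_t shadow[PT_ENTRIES];
    bool out_of_sync;
};

/* Placeholder: real propagation would translate the GFN and add restrictions. */
static uint64_t propagate_entry(uint64_t gl1e) { return gl1e; }

/* Stop trapping writes to this L1 (real code would also make the guest
 * frame writable) and remember what the shadow currently reflects. */
static void oos_unsync(struct oos_l1 *p)
{
    memcpy(p->snapshot, p->guest, sizeof p->snapshot);
    p->out_of_sync = true;
}

/* Call wherever a hardware TLB could re-read entries: TLB flush, page
 * fault, or a write to a higher-level table. Returns entries updated. */
static size_t oos_resync(struct oos_l1 *p)
{
    size_t changed = 0;
    if (!p->out_of_sync)
        return 0;
    for (size_t i = 0; i < PT_ENTRIES; i++) {
        if (p->guest[i] != p->snapshot[i]) {
            p->shadow[i] = propagate_entry(p->guest[i]);
            changed++;
        }
    }
    p->out_of_sync = false;   /* writes would be trapped again from here */
    return changed;
}
```

The saving comes from batching: many guest writes between two resync points cost one pass over the table instead of one trap and emulation each.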


Hardware-assisted paging

• Xen supplies a second set of pagetables describing the PFN→MFN translation and extra restrictions

• CPU takes a pointer to this as well as a (PFN-space) CR3 value from the guest

• MMU hardware applies the composition of the two translations and the intersection of the access rights
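The composition and intersection can be modelled with toy one-level tables, as in the sketch below. Everything here (`struct mapping`, the R/W/X bits, `hap_translate`) is hypothetical and ignores multi-level walks, superpages, and the TLB itself; it only shows the rule the hardware applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RIGHT_R 0x1u
#define RIGHT_W 0x2u
#define RIGHT_X 0x4u

/* Toy one-level table entry: page number -> (frame number, rights). */
struct mapping { bool present; uint64_t frame; unsigned rights; };

enum fault_kind { FAULT_NONE, FAULT_GUEST, FAULT_NESTED };

struct translation { enum fault_kind fault; uint64_t mfn; unsigned rights; };

/* Guest table: VA page -> GFN. Second (Xen-owned) table: GFN -> MFN.
 * The effective mapping composes the two and intersects their rights. */
static struct translation hap_translate(const struct mapping *guest, size_t nr_va,
                                        const struct mapping *nested, size_t nr_gfn,
                                        uint64_t va_page)
{
    struct translation t = { FAULT_GUEST, 0, 0 };
    if (va_page >= nr_va || !guest[va_page].present)
        return t;                     /* delivered to the guest */
    uint64_t gfn = guest[va_page].frame;
    if (gfn >= nr_gfn || !nested[gfn].present) {
        t.fault = FAULT_NESTED;       /* exits to Xen */
        return t;
    }
    t.fault = FAULT_NONE;
    t.mfn = nested[gfn].frame;
    t.rights = guest[va_page].rights & nested[gfn].rights;
    return t;
}
```

The two fault kinds matter: a guest fault goes to the guest as usual, while a fault in the second table exits to Xen, which is where its extra restrictions are enforced.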


Hardware-assisted paging: performance

• Pro: avoids the expensive trap + emulate on writes to PTs, and the extra logic on the pagefault path

• Con: a TLB fill can now take up to 24 memory accesses with 4-level guest and nested pagetables!

• Con: the CPU’s TLB is much smaller than the set of shadows we can maintain

• AMD’s RVI gives +10% performance over shadows on some workloads, -10% on others; Intel’s EPT seems more consistently better than shadowing

• Performance depends heavily on using superpage mappings in the second pagetable
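The TLB-fill cost can be worked out with a little arithmetic, ignoring the paging-structure caches real CPUs use. In a nested walk, each of the guest's n pagetable entries sits at a guest-physical address that needs an m-step walk of the second pagetable plus one read, and the final guest-physical address needs one more m-step walk. `nested_walk_refs` is a hypothetical helper for this count.

```c
#include <assert.h>

/* Worst-case memory references for one TLB fill under nested paging,
 * with no paging-structure caches: each of the n guest entries needs an
 * m-step nested walk plus the read itself, and the final guest-physical
 * address needs one more nested walk, giving n * (m + 1) + m. */
static unsigned nested_walk_refs(unsigned guest_levels, unsigned nested_levels)
{
    return guest_levels * (nested_levels + 1) + nested_levels;
}
```

With 4-level tables on both sides this gives 24 references; mapping the second pagetable with 2 MiB superpages removes one level from every nested walk, bringing the worst case down to 19. With no nesting at all (m = 0) the formula reduces to the native n references.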


Fin