Intel® Virtualization Technology Xen Architecture Lv Zheng

Intel® Virtualization Technology

Xen Architecture

Lv Zheng

Objectives

Coverage
  Xen’s paravirtualization architecture
  Xen’s hardware virtual machine framework
  Intel® Virtualization Technology for IA-32 (VT-x)
  Intel® Virtualization Technology for Directed I/O (VT-d)

Limitations
  PAE and EM64T are not covered

Requirements
  IA-32 basic architecture
    Memory management
    Interrupts & exceptions
    Task management
    Input & output
  Quick calculation of 2^N

Agenda

Xen’s Architecture
Terminology & Concepts
Bootstrap Process
Physical Memory Management
Virtual Memory Management
Domains
Virtualization

Address Terminology

vaddr, virt (virtual address): A translated virtual address in the Xen heap; the addressing that Xen itself uses.

maddr, phys (machine address): A real host machine address; the address the processor understands.

paddr (physical address): A catch-all for any kind of physical address. "Physical" here can mean guest-physical, machine-physical or guest-machine-physical. See the definitions below.

mfn (machine frame number): Corresponds to maddr.

gpfn (guest pseudo frame number): Guests run in an illusory contiguous physical address space, which is probably not contiguous in the machine address space.

gmfn (guest machine frame number): Equivalent to GPFN for an auto-translated guest, and equivalent to MFN for normal paravirtualised guests. It represents what the guest thinks are MFNs.

pfn (physical frame number): Corresponds to paddr.

[Figure: conversions among virt, maddr and mfn]
  virt -> maddr: __pa / virt_to_maddr
  maddr -> virt: __va / maddr_to_virt
  virt -> mfn: virt_to_mfn
  mfn -> virt: mfn_to_virt, map_domain_page
  paddr -> pfn: paddr_to_pfn
  pfn -> paddr: pfn_to_paddr
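
The conversion helpers named above reduce to simple arithmetic on 32-bit non-PAE x86. The following is a minimal sketch, assuming 4KB pages and a direct-map base of 0xFF000000 (the PAGE_OFFSET value quoted later in the deck). The function bodies are simplified stand-ins rather than Xen's actual macros, and the virt conversions are only meaningful for addresses inside the direct-mapped Xen heap.

```c
#include <stdint.h>

#define PAGE_SHIFT  12           /* 4KB pages */
#define PAGE_OFFSET 0xFF000000u  /* assumed base of Xen's direct mapping */

/* virt <-> maddr: a fixed offset, valid only for direct-mapped addresses. */
static inline uint32_t virt_to_maddr(uint32_t va) { return va - PAGE_OFFSET; }
static inline uint32_t maddr_to_virt(uint32_t ma) { return ma + PAGE_OFFSET; }

/* paddr <-> pfn: drop or restore the 12-bit page offset. */
static inline uint32_t paddr_to_pfn(uint32_t pa)  { return pa >> PAGE_SHIFT; }
static inline uint32_t pfn_to_paddr(uint32_t pfn) { return pfn << PAGE_SHIFT; }

/* virt <-> mfn: compose the two steps above. */
static inline uint32_t virt_to_mfn(uint32_t va)   { return paddr_to_pfn(virt_to_maddr(va)); }
static inline uint32_t mfn_to_virt(uint32_t mfn)  { return maddr_to_virt(pfn_to_paddr(mfn)); }
```

Under these assumptions, virtual address 0xFF100000 corresponds to machine address 0x00100000, which lies in machine frame 0x100. Pages outside the direct mapping have no such fixed offset, which is why map_domain_page exists.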

Segmentation

gdt_descr / nopaging_gdt_descr
  Base: gdt_table - FIRST_RESERVED_GDT_BYTE
  Limit: 15*4096-1 = 0xEFFF

Segment descriptors
  DPL 0: hypervisor space
  DPL 1: kernel space
  DPL 3: user space
  Type A: code segment
  Type 2: data segment
  Base: 0x00000000 / Limit: 0xFFFFF (G = 1): 4GB flat segmentation

Segment selectors
  RPL: equal to DPL
  Index: offset from gdt_descr.base

[Figure: GDT layout]
  GDTR: base address = gdt_descr.base, limit = 15*4096-1
  FIRST_RESERVED_GDT_BYTE = FIRST_RESERVED_GDT_PAGE (14) * PAGE_SIZE (4096)
  Reserved page (NR_RESERVED_GDT_PAGES = 1): the flat descriptors below, two unused entries (0x0000000000000000), and 4 * NR_CPUS * 8 bytes (space for TSS and LDT per CPU)

  Selector                        Value   RPL  Index           Type      DPL
  __HYPERVISOR_CS                 0xe008  0    14*(4096/8)+1   A (code)  0
  __HYPERVISOR_DS                 0xe010  0    14*(4096/8)+2   2 (data)  0
  FLAT_KERNEL_CS                  0xe019  1    14*(4096/8)+3   A (code)  1
  FLAT_KERNEL_DS, FLAT_KERNEL_SS  0xe021  1    14*(4096/8)+4   2 (data)  1
  FLAT_USER_CS                    0xe02b  3    14*(4096/8)+5   A (code)  3
  FLAT_USER_DS, FLAT_USER_SS      0xe033  3    14*(4096/8)+6   2 (data)  3
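
The selector values on this slide follow from the IA-32 selector format: descriptor index in bits 15..3, table indicator in bit 2 (0 for the GDT) and RPL in bits 1..0. The sketch below recomputes them; make_selector and flat_selector are illustrative helper names, not Xen functions.

```c
#include <stdint.h>

#define PAGE_SIZE                4096u
#define FIRST_RESERVED_GDT_PAGE  14u
/* A descriptor is 8 bytes, so one page holds 4096/8 = 512 entries. */
#define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_PAGE * (PAGE_SIZE / 8u))

/* Build a GDT selector from a descriptor index and a requested privilege level. */
static inline uint16_t make_selector(uint32_t index, uint32_t rpl)
{
    return (uint16_t)((index << 3) | (rpl & 3u));
}

/* Selector for the n-th descriptor after the start of the reserved GDT page. */
static inline uint16_t flat_selector(uint32_t slot, uint32_t rpl)
{
    return make_selector(FIRST_RESERVED_GDT_ENTRY + slot, rpl);
}
```

With slot 1 and RPL 0 this yields 0xe008 (__HYPERVISOR_CS), and with slot 6 and RPL 3 it yields 0xe033 (FLAT_USER_DS), so the six selector values occupy six consecutive descriptor slots.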

Early Paging

PS: Page Size extension (also known as PSE); if set, the page size is 4MB.
0x3FC = PAGE_OFFSET >> 22; 0x3F0 = HYPERVISOR_VIRT_START >> 22.
Xen is linked at 0xFF000000 + 0x100000 (xen.lds.S).
Pink: mapped area. Yellow: unmapped area.
Low mapping: lower virtual address = lower physical address; mapped up to HYPERVISOR_VIRT_START because the maximum physical address is unknown.
High mapping: Xen code’s address = lower physical address (DIRECTMAP with Xen code/data/heap).

[Figure: early page directory (idle_pg_table, loaded into CR3)]
  CR0 flags: PG, AM, ET, NE, WP, PE, MP
  CR4 flag: PS
  Page directory entries (P, RW, A, D, PS):
    0x000 = physical 0x000, up to 0x3EF: low mapping (4GB-64MB), physical 0x00000000 to HYPERVISOR_VIRT_START (0xFC000000)
    0x3F0 onwards: unmapped area (48MB)
    0x3FC = physical 0x000: high mapping, direct mapping [Xen code/data/heap] (12MB)
    0x3FF: unmapped area (4MB)
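
The two early mappings can be sketched as a loop over a 1024-entry page directory of 4MB pages. This is an illustration only: build_early_pgdir and l2_index are hypothetical names, and the flag values are the standard IA-32 page-directory bits, not code taken from Xen's boot path.

```c
#include <stdint.h>

#define L2_PAGETABLE_SHIFT    22
#define L2_PAGETABLE_ENTRIES  1024u
#define HYPERVISOR_VIRT_START 0xFC000000u
#define PAGE_OFFSET           0xFF000000u
#define DIRECTMAP_MBYTES      12u

/* IA-32 page-directory entry flags. */
#define _PAGE_PRESENT 0x001u
#define _PAGE_RW      0x002u
#define _PAGE_PSE     0x080u   /* entry maps a 4MB page */

static inline uint32_t l2_index(uint32_t va) { return va >> L2_PAGETABLE_SHIFT; }

/* Fill a page directory with the low and high mappings described on the slide. */
static inline void build_early_pgdir(uint32_t pgd[L2_PAGETABLE_ENTRIES])
{
    uint32_t i;
    const uint32_t flags = _PAGE_PRESENT | _PAGE_RW | _PAGE_PSE;

    for (i = 0; i < L2_PAGETABLE_ENTRIES; i++)
        pgd[i] = 0;                                   /* unmapped by default */

    /* Low mapping: virtual == physical, up to HYPERVISOR_VIRT_START. */
    for (i = 0; i < l2_index(HYPERVISOR_VIRT_START); i++)
        pgd[i] = (i << L2_PAGETABLE_SHIFT) | flags;

    /* High mapping: PAGE_OFFSET + x -> physical x, for the first 12MB. */
    for (i = 0; i < DIRECTMAP_MBYTES / 4u; i++)
        pgd[l2_index(PAGE_OFFSET) + i] = (i << L2_PAGETABLE_SHIFT) | flags;
}
```

The index arithmetic reproduces the slide's constants: l2_index(PAGE_OFFSET) is 0x3FC and l2_index(HYPERVISOR_VIRT_START) is 0x3F0, leaving entries 0x3F0 to 0x3FB and 0x3FF unmapped.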

Xen Memory Layout

Overview
  Xen is mapped into the top 64MB of every OS’s address space, to save a TLB flush.
  Xen’s address space
    Entry: 0xFF000000 + 0x100000
    PAGE_OFFSET: 0xFF000000
      virt_to_maddr / __pa
      maddr_to_virt / __va

Memory Regions
  Direct mapping: initialized by Xen during the early startup phase for addressing its own code and data.
  Frame-info table: holds the frame tables that track machine pages.
  Shadow & guest linear page tables: used to virtualize the guest’s linear address space. Available in guests.
  Machine-to-physical mapping: holds the MFN -> GPFN translation.
  IO remapping: reserved area for ioremap.
  Per-domain mapping: reserved area for per-domain mappings.
    GDT/LDT tables: a guest that wants segmentation can register its own GDT/LDT tables here.
    Mapping cache: maps a page into an area that Xen can address directly.

[Figure: Xen virtual address space, 0xFC000000 to 0xFFFFFFFF]
  Start       Size  Region
  0xFFC00000  4MB   IO remapping
  0xFF000000  12MB  Direct mapping [Xen code/data/heap]
  0xFE800000  8MB   Per-domain mapping
  0xFE400000  4MB   Shadow linear page tables
  0xFE000000  4MB   Guest linear page tables
  0xFDC00000  4MB   Machine-to-physical translation table [writable]
  0xFC400000  24MB  Frame-info table
  0xFC000000  4MB   Machine-to-physical translation table [read-only]
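
The region sizes and boundary addresses on this slide can be cross-checked mechanically. The table below pairs each region with the boundary address listed for it, reading the figure bottom-up; that pairing is an assumption recovered from the order of the labels, and the struct and function names are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

struct region { const char *name; uint32_t start; uint32_t mbytes; };

/* Regions in ascending address order, as read from the slide. */
static const struct region xen_layout[] = {
    { "Machine-to-physical table (read-only)", 0xFC000000u,  4 },
    { "Frame-info table",                      0xFC400000u, 24 },
    { "Machine-to-physical table (writable)",  0xFDC00000u,  4 },
    { "Guest linear page tables",              0xFE000000u,  4 },
    { "Shadow linear page tables",             0xFE400000u,  4 },
    { "Per-domain mapping",                    0xFE800000u,  8 },
    { "Direct mapping (Xen code/data/heap)",   0xFF000000u, 12 },
    { "IO remapping",                          0xFFC00000u,  4 },
};
#define NR_REGIONS (sizeof(xen_layout) / sizeof(xen_layout[0]))

/* Total size in MB if the regions are contiguous and end at 4GB, else 0. */
static inline uint32_t layout_total_mbytes(void)
{
    uint64_t next = xen_layout[0].start;
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < NR_REGIONS; i++) {
        if (xen_layout[i].start != next)
            return 0;                        /* gap or overlap */
        next += (uint64_t)xen_layout[i].mbytes << 20;
        total += xen_layout[i].mbytes;
    }
    return next == 0x100000000ull ? total : 0;
}
```

The check passes with a total of 64MB, matching the statement that Xen occupies the top 64MB of the address space.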

Allocation Overview

The Xen heap extends up to physical address “xenheap_phys_end” (configurable; the default is 0x00C00000 = 12MB).

The RAM extent is determined from the e820 map table; xenheap_phys_start is then determined.

The Xen executable image extends xenheap_phys_start to “_end”.

Modules (the xen0 kernel & initial images) in the multiboot table are copied to the start position of the Xen heap.

init_boot_allocator: initializes the area for early-stage allocation, with a bitmap indicating allocation state.

init_boot_pages: marks an area as allocable.

memguard_init: maps the Xen heap pages at 4KB granularity and protects the Xen heap with overflow detection.

init_frametable: initializes the frame table, which indicates page-level allocation state.

end_boot_allocator: MEMZONE_DMADOM & MEMZONE_DOM become allocable.

init_xenheap_pages: MEMZONE_XEN becomes allocable.

[Figure: the Xen heap (12MB, from physical address 0x00000000) at successive steps, with xenheap_phys_start moving past xen_image (_start ~ _end), initial_images, alloc_bitmap, pl1e[xen_heap] and pl1e[frame_table] towards xenheap_phys_end, ending as the final Xen heap. Also labelled: domain heap; frame table / frame-info table (24MB, FRAMETABLE_VIRT_START to FRAMETABLE_VIRT_END).]

Boot Allocation

Description: early boot stage allocation
Allocation bitmap (alloc_bitmap)
  Located in the Xen heap area
  0 [white] means free; 1 [red] means allocated
Boot allocator’s allocable area
  From: initial_images_end (physical address > xenheap_phys_end)
  To: maximum of accessible physical memory
Interfaces
  Allocable area: alloc_boot_pages
  Xen heap area: xenheap_phys_start + allocated, e.g. alloc_xen_pagetable

[Figure: physical memory from 0x00000000 to the maximum physical address: Xen heap (up to xenheap_phys_end), initial images (up to initial_images_end), then the allocable area. alloc_bitmap holds one bit per 4KB page, marking it free or allocated.]
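
A bitmap allocator of the kind described here fits in a few lines. This is a sketch under stated assumptions rather than Xen's boot allocator: the 64-page size, the first-fit policy and all function names are illustrative, and only the bit convention (0 = free, 1 = allocated) comes from the slide.

```c
#include <stdint.h>
#include <string.h>

#define NR_PAGES 64u                        /* illustrative memory size */
static uint8_t alloc_bitmap[NR_PAGES / 8];  /* one bit per 4KB page */

static inline int page_allocated(uint32_t pfn)
{
    return (alloc_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}
static inline void mark_allocated(uint32_t pfn)
{
    alloc_bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}
static inline void mark_free(uint32_t pfn)
{
    alloc_bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
}

/* Start with every page allocated, then free the range [start, end). */
static inline void boot_allocator_init(uint32_t start, uint32_t end)
{
    uint32_t pfn;
    memset(alloc_bitmap, 0xFF, sizeof(alloc_bitmap));
    for (pfn = start; pfn < end; pfn++)
        mark_free(pfn);
}

/* First fit: returns the first pfn of nr contiguous free pages, or NR_PAGES. */
static inline uint32_t boot_alloc_pages(uint32_t nr)
{
    uint32_t base, i;
    for (base = 0; base + nr <= NR_PAGES; base++) {
        for (i = 0; i < nr && !page_allocated(base + i); i++)
            ;
        if (i == nr) {
            for (i = 0; i < nr; i++)
                mark_allocated(base + i);
            return base;
        }
    }
    return NR_PAGES;
}
```

Initializing everything as allocated and then freeing only the known-good range mirrors the slide's picture, in which the Xen heap and initial images are never handed out by the boot allocator.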

Frame Table

Address: FRAMETABLE_VIRT_START
Size: max_page * sizeof(page_info)
Align: PAGE_SIZE
State: red = allocated / white = free
Interfaces
  alloc_heap_pages
  free_heap_pages
  [virt(xenheap) | maddr | mfn]_to_page
  page_to_[virt(xenheap) | maddr | mfn]

[Figure: frame_table (FRAMETABLE_VIRT_START to FRAMETABLE_VIRT_END, mapped from idle_pg_table_l2 entry 0x3F1 with P, RW, A, D) holds one page_info entry per 4KB page of allocable physical memory, from 0x00000000 to the maximum physical address (max_pages entries), covering the Xen heap and the domain heap. Entries shown with the fields list, count_info, _domain, type_info, tlbflush_timestamp, or with the fields list, count_info, order, cpumask, tlbflush_timestamp.]
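
Because the frame table is a flat array indexed by machine frame number, the *_to_page and page_to_* interfaces amount to pointer arithmetic. The sketch below assumes a reduced two-field page_info and a small illustrative MAX_PAGE; it shows the idea, not Xen's definitions.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define MAX_PAGE   1024u   /* illustrative number of machine frames */

/* Reduced stand-in for struct page_info; the real structure has more fields. */
struct page_info { uint32_t count_info; uint32_t type_info; };

static struct page_info frame_table[MAX_PAGE];

static inline struct page_info *mfn_to_page(uint32_t mfn)
{
    return &frame_table[mfn];
}
static inline uint32_t page_to_mfn(const struct page_info *pg)
{
    return (uint32_t)(pg - frame_table);
}
static inline struct page_info *maddr_to_page(uint32_t ma)
{
    return mfn_to_page(ma >> PAGE_SHIFT);
}
static inline uint32_t page_to_maddr(const struct page_info *pg)
{
    return page_to_mfn(pg) << PAGE_SHIFT;
}

/* Table size as given on the slide: max_page * sizeof(page_info). */
static inline size_t frame_table_bytes(uint32_t max_page)
{
    return (size_t)max_page * sizeof(struct page_info);
}
```

The size formula is the reason the virtual window reserved for the table has to be fixed in advance: it bounds how many machine frames can be tracked.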

Page Info Fields

count_info
  A (allocated): cleared when the owning guest 'frees' this page.
  OS (out-of-sync): set when fullshadow mode marks a page out-of-sync.
  PT (page-table): set when fullshadow mode is using a page as a page table.
  COUNT: 29-bit count of references to this frame.
type_info
  TYPE: mutually exclusive types
    PGT_l1_page_table
    PGT_l2_page_table
    PGT_gdt_page
    PGT_ldt_page
    PGT_writable_page
  V (validated): has this page been validated for use as its current type?
  P (pinned): has the owning guest pinned this page to its current type?
  VA: the 11 most significant bits of the virtual address, if this is a page table.
  COUNT: 16-bit count of uses of this frame as its current type.
list
  Each frame can be threaded onto a doubly-linked list.
  Link to xenpage_list of <struct domain>: Xen pages shared with this domain (share_xen_page_with_guest).
  Link to page_list of <struct domain>: pages belonging to this domain (alloc_domain_pages).
_domain
  Owner of this page.
order
  Order-size of the free chunk this page is the head of.

[Figure: bit fields of count_info (A, OS, PT, COUNT) and type_info (TYPE, VA, V, P, COUNT). A <domain>’s page_list and xenpage_list each chain <page_info> entries through their list fields; each entry’s _domain field refers to the domain.]

Memory Zone

Zones are attached to a global list for allocation
  static struct list_head heap[NR_ZONES][MAX_ORDER+1];
MEMZONE_XEN
  Safe in IRQ context
  Always freed explicitly, by free_xenheap_pages or xfree
  alloc_xenheap_pages
    Allocates pages in the Xen heap
    Need not be mapped into idle_pg_table_l2
  free_xenheap_pages
  xmalloc / xfree
    O(n) power-of-2 free-list allocator for arbitrary chunk allocation
MEMZONE_DMADOM & MEMZONE_DOM
  Not safe in IRQ context
  _DOMF_dying: cares about the secrecy of the pages’ contents
    Pages will be scrubbed in softirq context
  pfn_dom_zone_type
    Distinguishes between these 2 zones
  alloc_domheap_pages
    Tries to allocate pages in MEMZONE_DOM
    Tries MEMZONE_DMADOM if MEMZONE_DOM failed
    Pages must be remapped into idle_pg_table_l2
  free_domheap_pages
    Checks whether _DOMF_dying is set; if so, the page is added to page_scrub_list and freed later in softirq context
  avail_domheap_pages
    Gets the number of pages available for allocation

[Figure: heap holds, for each zone, free lists for orders 0, 1, ... MAX_ORDER. Whole physical memory from 0x00000000 to the maximum physical address: Xen heap (up to xenheap_phys_end), DMA-addressable domain area (DMA_BITS = 31, up to 0x80000000), domain area.]
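
The three-way split of physical memory can be expressed as a classification by frame number, and the per-order free lists imply a rounding of request sizes to powers of two. Both helpers below are hypothetical illustrations of those two ideas; the enum ordering and function names are assumptions, and only DMA_BITS = 31 is taken from the slide.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define DMA_BITS   31   /* from the slide: DMA-addressable below 0x80000000 */

enum memzone { MEMZONE_XEN, MEMZONE_DOM, MEMZONE_DMADOM, NR_ZONES };

/* Classify a machine frame by its position in physical memory. */
static inline enum memzone zone_of_mfn(uint32_t mfn, uint32_t xenheap_end_mfn)
{
    if (mfn < xenheap_end_mfn)
        return MEMZONE_XEN;
    if (mfn < (1u << (DMA_BITS - PAGE_SHIFT)))
        return MEMZONE_DMADOM;
    return MEMZONE_DOM;
}

/* Smallest order with (1 << order) >= nr_pages, for 1 <= nr_pages <= 2^31. */
static inline unsigned int order_for_pages(uint32_t nr_pages)
{
    unsigned int order = 0;
    while ((1u << order) < nr_pages)
        order++;
    return order;
}
```

With a 12MB Xen heap (xenheap_end_mfn = 0xC00), frame 0x100 falls in MEMZONE_XEN, frame 0x7FFFF is the last DMA-addressable frame, and frame 0x80000 is the first in MEMZONE_DOM.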

Memory Guard

Description
  Memory guard provides overflow detection for Xen heap memory (the region that Xen’s own code uses).
  If a range is guarded, a memory overflow into it leads to a page fault exception (#PF).
Operations
  Init: split the 4MB-mapped Xen heap pages into 4KB mappings
  Guard: set pages to be not present; the pages are not in use
  Unguard: set pages to be present; the pages are in use
Interfaces
  memguard_guard_range, used in:
    init_xenheap_pages
    free_xenheap_pages
  memguard_unguard_range, used in:
    alloc_xenheap_pages

[Figure: the Xen heap from xenheap_phys_start to xenheap_phys_end, divided into ranges marked memguard_guard_range or memguard_unguard_range; an access that runs into a guarded range raises #PF.]
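
Guarding works by toggling the present bit of the 4KB page-table entries that map the heap: free pages are left not present so that any stray access faults. The sketch below models that with a plain array standing in for the L1 entries. The names and the 16-page heap are illustrative, and a real implementation also has to flush the TLB after changing an entry.

```c
#include <stdint.h>

#define _PAGE_PRESENT 0x001u
#define HEAP_PAGES    16u   /* illustrative heap size in 4KB pages */

/* Stand-in for the L1 entries that map the Xen heap at 4KB granularity. */
static uint32_t heap_l1e[HEAP_PAGES];

static inline void change_range(uint32_t first, uint32_t nr, int guard)
{
    uint32_t i;
    for (i = first; i < first + nr && i < HEAP_PAGES; i++) {
        if (guard)
            heap_l1e[i] &= ~_PAGE_PRESENT;   /* any access now faults */
        else
            heap_l1e[i] |= _PAGE_PRESENT;    /* page is usable again */
    }
}

/* Guard: called when pages are freed or when the heap is first set up. */
static inline void guard_range(uint32_t first, uint32_t nr)   { change_range(first, nr, 1); }
/* Unguard: called when pages are handed out by the allocator. */
static inline void unguard_range(uint32_t first, uint32_t nr) { change_range(first, nr, 0); }

/* Would an access to this heap page raise #PF? */
static inline int access_faults(uint32_t page)
{
    return !(heap_l1e[page] & _PAGE_PRESENT);
}
```

Allocating pages 2 to 5 unguards exactly those pages, so a buffer overrun from page 5 into page 6 hits a not-present entry and faults immediately instead of silently corrupting free memory.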

Paging Overview

Entries
  L2_PAGETABLE_ENTRIES: 0x400
  DOMAIN_ENTRIES_PER_L2_PAGETABLE: 0x3f0
  HYPERVISOR_ENTRIES_PER_L2_PAGETABLE: 0x010
start_paging
  Xen heap mapping
  Low mapping
memguard_init
init_frametable
domain_create (idle domain)
  arch_domain_create
paging_init
  Machine to physical mapping
  IO remapping
  Per-domain page table
construct_dom0
Zap_low_mapping (idle_pg_table_l2)

[Figure: idle_pg_table_l2, with domain entries (0x3F0 of them) followed by hypervisor entries (0x010 of them)]
  0x000 = physical 0x000 (P, RW, A, D, PS): low mapping (4GB-64MB)
  0x3F0 (P, A, D, PS, G): MPT mapping (4MB)
  0x3F1 (P, RW, A, D): frame-info L1 tables (4KB * needed), mapping the frame-info table (<= 24MB)
  0x3F7 (P, RW, A, D, PS, G): MPT mapping (4MB)
  0x3FA (P, RW, A, D): per-domain GDT/LDT L1 table (4KB)
  0x3FB (P, RW, A, D): per-domain mapping cache L1 table (4KB)
  0x3FC = physical 0x000 (P, RW, A, D; shown both with and without PS): direct mapping (12MB), Xen heap L1 table (12KB)
  0x3FF (P, RW, A, D): IO remap L1 table (4KB)
  Also labelled: Xen heap; domain heap (xenheap_phys_end to the maximum physical address)

Per-domain Page Table

L1_ENTRIES: arch_domain_create
  Size: 0x0800, 2 pages
  Location: Xen heap
  Contents: entries pointing to pages
L2_ENTRIES: construct_dom0
  Size: 0x0002
  Location: idle_pg_table_l2
  Contents: entries pointing to mm_perdomain_pt
Per domain
  mm_perdomain_pt in <struct domain>

[Figure: per-domain mapping (8MB, PERDOMAIN_VIRT_START to PERDOMAIN_VIRT_END) and the per-domain L1 page table]
  idle_pg_table_l2 entry 0x3FA (P, RW, A, D): mm_perdomain_pt entries 0x000 ~ 0x3FF, the 4KB L1 table for per-VCPU GDT/LDTs (4MB)
  idle_pg_table_l2 entry 0x3FB (P, RW, A, D): mapcache.l1tab, entries 0x400 ~ 0x7FF, the 4KB L1 table for the mapping cache (4MB)
  The L1 table lies in the direct mapping [Xen heap] (12MB, physical 0x00000000 to xenheap_phys_end)

GDT/LDT Tables

Location (per virtual CPU)
  perdomain_ptes in <struct vcpu>
Entries
  PDPT_L2_ENTRIES: construct_dom0
    Size: 0x0002
    Location: idle_pg_table_l2
    Contents: entries pointing to mm_perdomain_pt
  PDPT_L1_ENTRIES: arch_domain_create
    Size: 0x0800, 2 pages
    Location: Xen heap
    Contents: entries pointing to pages
Interfaces
  GDT map / unmap
    set_gdt
    destroy_gdt
  LDT map / unmap
    map_ldt_shadow_page
    invalidate_shadow_ldt

[Figure: per-domain L1 page table mm_perdomain_pt (entries 0x000 ~ 0x3FF, PDPT_L1_ENTRIES, held in the direct mapping [Xen heap], 12MB, physical 0x00000000 to xenheap_phys_end) mapping the per-domain mapping (8MB, PERDOMAIN_VIRT_START to PERDOMAIN_VIRT_END). Each VCPU, from VCPU[0] to VCPU[MAX_VIRT_CPUS], owns 1 << GDT_LDT_VCPU_SHIFT entries: per-domain GDT/LDTs (FIRST_RESERVED_GDT_PAGE = 14 pages), gdt_table, and a per-domain reservation. gdt_table is at entry 0x00E for the first VCPU and at entry 0x3EE for the last.]
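
The two gdt_table positions shown in the figure (0x00E and 0x3EE) are consistent with each VCPU owning a block of 32 L1 entries. The sketch below makes that arithmetic explicit. GDT_LDT_VCPU_SHIFT = 5 and MAX_VIRT_CPUS = 32 are values inferred from those two positions rather than stated in the slides, and the helper names are illustrative.

```c
#include <stdint.h>

#define FIRST_RESERVED_GDT_PAGE 14u
#define GDT_LDT_VCPU_SHIFT      5u    /* assumed: 1 << 5 = 32 entries per VCPU */
#define MAX_VIRT_CPUS           32u   /* assumed: 32 * 32 = 0x400 entries in total */

/* Index of the first per-domain L1 entry belonging to a VCPU. */
static inline uint32_t vcpu_first_pte(uint32_t vcpu_id)
{
    return vcpu_id << GDT_LDT_VCPU_SHIFT;
}

/* Index of the entry that maps Xen's reserved gdt_table page for a VCPU. */
static inline uint32_t vcpu_reserved_gdt_pte(uint32_t vcpu_id)
{
    return vcpu_first_pte(vcpu_id) + FIRST_RESERVED_GDT_PAGE;
}
```

Under these assumptions VCPU 0 finds gdt_table at entry 0x00E and VCPU 31 at entry 0x3EE, and the blocks for all VCPUs exactly fill the first 0x400 entries of mm_perdomain_pt.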

Mapping Cache

Why?
  maddr_to_virt only works for the xenheap region in the idle_pg_table address space, so an mfn or maddr outside that region cannot simply be translated into a virtual address.
  CONFIG_DOMAIN_PAGE indicates whether the current architecture supports the map_domain_page interfaces.
How?
  Map domain pages into the mapping cache area, which is addressable in the idle_pg_table address space
  A hash bitmap is used for acceleration
  4MB can cache the mappings of 1K pages
  Mappings are installed into mapcache.l1tab (mm_perdomain_pt + 0x400)
  Hash
    Not in use: MAPHASHENT_NOTINUSE = (u16)~0U
    In use: idx from MAPCACHE_VIRT_START
Interfaces
  Per-VCPU mappings
    map_domain_page
    unmap_domain_page
  Accessible in all address spaces; can also be unmapped from any context
    map_domain_page_global
    unmap_domain_page_global

[Figure: mapcache.vcpu_maphash, indexed 0 to MAX_VIRT_CPU, with entries holding pfn, idx and refcount. DOM heap pages (for example MFN=0xC03 and MFN=0x100E) are mapped through mapcache.l1tab (entries 0x0400 ~ 0x7FF of mm_perdomain_pt) into the per-domain mapping (8MB), between MAPCACHE_VIRT_START and PERDOMAIN_VIRT_END.]

Machine to Physical Mapping

Description
  Records the mapping from machine page frames to pseudo-physical ones.
mpt_size
  Size: max_page * BYTES_PER_LONG (max = 4MB)
  Align: 1 << L2_PAGETABLE_SHIFT (4MB)
  Location: anonymous domain (domain == NULL) heap pages
machine_to_phys_mapping
  Address
    RDWR_MPT_VIRT_START
    RO_MPT_VIRT_START
  Stores
    GPFN: guest pseudo-physical frame number
    INVALID_M2P_ENTRY: ~0UL
    0x55555555 on initialization
Interfaces
  set_gpfn_from_mfn
  get_gpfn_from_mfn

[Figure: idle_pg_table entry 0x3F0 (P, A, D, PSE) maps the read-only machine-to-physical translation table (4MB, RO_MPT_VIRT_START to RO_MPT_VIRT_END) and entry 0x3F7 (P, RW, A, D, PSE) maps the writable one (4MB, RDWR_MPT_VIRT_START to RDWR_MPT_VIRT_END). The table is backed by pages from MEMZONE_DOM / MEMZONE_DMADOM (xenheap_phys_end to the maximum physical address) and is initially filled with 0x55555555.]
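
The M2P table is a flat array indexed by machine frame number. A minimal sketch follows, assuming 32-bit entries (so BYTES_PER_LONG is 4) and a small illustrative MAX_PAGE; m2p_init and mpt_size are hypothetical helper names, and the accessors are simplified stand-ins for the interfaces listed above.

```c
#include <stdint.h>

#define BYTES_PER_LONG     4u             /* 32-bit build assumed */
#define L2_PAGETABLE_SHIFT 22
#define INVALID_M2P_ENTRY  0xFFFFFFFFu    /* ~0UL on a 32-bit build */
#define M2P_INIT_PATTERN   0x55555555u    /* fill value named on the slide */
#define MAX_PAGE           256u           /* illustrative */

static uint32_t machine_to_phys_mapping[MAX_PAGE];

/* Fill the table with the initialization pattern. */
static inline void m2p_init(void)
{
    uint32_t mfn;
    for (mfn = 0; mfn < MAX_PAGE; mfn++)
        machine_to_phys_mapping[mfn] = M2P_INIT_PATTERN;
}

static inline void set_gpfn_from_mfn(uint32_t mfn, uint32_t gpfn)
{
    machine_to_phys_mapping[mfn] = gpfn;
}
static inline uint32_t get_gpfn_from_mfn(uint32_t mfn)
{
    return machine_to_phys_mapping[mfn];
}

/* Table size in bytes: max_page entries, rounded up to a 4MB boundary. */
static inline uint64_t mpt_size(uint64_t max_page)
{
    uint64_t align = (1ull << L2_PAGETABLE_SHIFT) - 1;
    return (max_page * BYTES_PER_LONG + align) & ~align;
}
```

A machine with 4GB of memory has 2^20 frames, giving exactly the 4MB maximum quoted on the slide, while a 512MB machine needs only 512KB of entries but still occupies one 4MB-aligned mapping.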

Paging

L2 entry: idle_pg_table_l2[l2_linear_offset(virt)], virt_to_xen_l2e(virt)
L1 entry: l2e_to_l1e(l2e) + l1_table_offset(virt)
map_pages_to_xen: maps a physical page into idle_pg_table; specify MAP_SMALL_PAGES for 4KB mappings

[Figure: two-level translation of a linear address (Directory | Table | Offset). CR3 holds the base address of idle_pg_table, the L2 page table (page directory, 4KB); CR4 holds PS. Each page directory entry pl2e (P, RW, A, D, pl1e base address; entries 0x000 to 0x3FF) points to an L1 page table (4KB), whose page table entries pl1e (P, RW, A, D, page frame number) point into physical memory.]
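
The L2 and L1 lookups above follow the standard IA-32 split of a linear address into a 10-bit directory index, a 10-bit table index and a 12-bit offset. The sketch below walks a software model of that structure. Representing the page directory as an array of pointers to L1 tables is a simplification for illustration (real entries hold physical addresses and flags), and struct pgdir and walk are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT         12
#define L2_PAGETABLE_SHIFT 22
#define PAGETABLE_ENTRIES  1024u
#define _PAGE_PRESENT      0x001u
#define PAGE_MASK          0xFFFFF000u

static inline uint32_t l2_table_offset(uint32_t va) { return va >> L2_PAGETABLE_SHIFT; }
static inline uint32_t l1_table_offset(uint32_t va) { return (va >> PAGE_SHIFT) & (PAGETABLE_ENTRIES - 1); }

/* Simplified page directory: each slot points at an L1 table, or is NULL. */
struct pgdir { const uint32_t *l1[PAGETABLE_ENTRIES]; };

/* Translate va; returns 1 and stores the machine address on success, else 0. */
static inline int walk(const struct pgdir *pgd, uint32_t va, uint32_t *ma)
{
    const uint32_t *l1 = pgd->l1[l2_table_offset(va)];
    uint32_t pte;

    if (l1 == NULL)
        return 0;                            /* no L1 table: would fault */
    pte = l1[l1_table_offset(va)];
    if (!(pte & _PAGE_PRESENT))
        return 0;                            /* not present: would fault */
    *ma = (pte & PAGE_MASK) | (va & ~PAGE_MASK);
    return 1;
}
```

For virtual address 0xFF123045 the directory index is 0x3FC and the table index is 0x123, so the walk reads entry 0x123 of the L1 table installed in slot 0x3FC and appends the page offset 0x045.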

Agenda

Xen’s Architecture
Terminology & Concepts
Bootstrap Process
Physical Memory Management
Virtual Memory Management
Guest Memory Management
Interrupt/Exception Handling

Reserved Domains

DOMID_IO: this domain owns I/O pages that are within the range of the page_info array.
  The first 1MB of RAM is historically marked as I/O.
  Any areas not specified as RAM by the e820 map are considered I/O.
DOMID_XEN: any Xen-heap pages that we will allow to be mapped will have their domain field set to this domain.
  The M2P table is mappable read-only by privileged domains.
  The Xen trace buffer is shared for xentrace.
IDLE_DOMAIN_ID: the idle domain performs as an idle task after initialization is done.
  No pages belong to this domain.

[Figure: physical memory from 0x00000000 to the maximum physical address, divided by e820 type (usable, reserved, ACPI data, ACPI NVS). Marked as IO: the historical first 1MB and the reserved areas. Also marked: the M2P table and the trace buffer.]

Dom0 Memory Layout

nr_pages
  Default: 1/16th of available memory, at most 128MB, is left for things like DMA buffers. Specifiable.
Align
  v_end: 4MB alignment
  Other: 4KB alignment
Regions
  Loaded kernel: the ELF image is parsed and loaded at the correct start address (0xC0000000 for Linux).
  Initial images: copied after the kernel.
  Physical to machine mapping: the pages required for the P2M mapping, which virtualizes pseudo-physical addresses, are allocated.
  Start info: this page stores the start information for the domain.
  Page tables: L2 (1 page) & L1 page tables for this mapping. nr_pages(vpt) > l1 & l2 page tables * PAGE_SIZE
  Boot stack: reserved for the bootup stack.
  v_end: at least 512KB is reserved.

[Figure: Domain0 memory from domain_setup_info.v_start to v_end: kernel image, init. ramdisk, physical to machine mapping, start info (1 page), page tables (nr_pt_pages), boot stack. The remainder of nr_pages is labelled as pages still to be allocated.]
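
The two constraints on v_end (4MB alignment and at least 512KB reserved) can be satisfied together by rounding up and then adding one more 4MB step when the resulting gap is too small. The sketch below is one way to do that. It assumes the 512KB is measured from the end of the boot stack, and compute_v_end and vstack_end are hypothetical names.

```c
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 22
#define ALIGN_4MB_MASK     ((1u << L2_PAGETABLE_SHIFT) - 1u)
#define MIN_PADDING        (512u << 10)   /* 512KB */

/* Round the end of the boot stack up to 4MB, keeping >= 512KB of padding. */
static inline uint32_t compute_v_end(uint32_t vstack_end)
{
    uint32_t v_end = (vstack_end + ALIGN_4MB_MASK) & ~ALIGN_4MB_MASK;

    if (v_end - vstack_end < MIN_PADDING)
        v_end += 1u << L2_PAGETABLE_SHIFT;   /* one extra 4MB step */
    return v_end;
}
```

A boot stack ending at 0xC0500000 gives v_end = 0xC0800000 with 3MB to spare, whereas one ending at 0xC07F0000 would leave only 64KB, so v_end moves up to 0xC0C00000.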

Linear Page Table

Steps
  Allocate the required page table pages
    L2: 1 page, set as PGT_l2_page_table
    L1: the number of pages whose entries can cover (v_end - v_start), set as PGT_l1_page_table
    type_info of page frame is set to PGT_writable_page
  Copy idle_pg_table_l2
  Set the LINEAR_PT_VIRT_START entry to point to the page table itself
  Overwrite the PERDOMAIN_VIRT_START entries
  Set the v_start ~ v_end page tables
Fields
  All page table pages are marked read-only, to ensure they are not modified by the guest OS
  Page tables
    guest_table, pfn of l2 entry offset 0x3f8

[Figure: guest_table, copied from idle_pg_table_l2 (entries 0x3F0, 0x3F1, 0x3F7, 0x3FA, 0x3FB, 0x3FC = physical 0x000, 0x3FF; direct mapping 12MB, frame-info table 24MB), with entries 0x2fc and 0x2fd (P, A, D) pointing to l1start and l1start+1 (4KB each) and entry 0x3F8 (P, RW, A, D) pointing to guest_table. Also labelled: page tables vpt_start ~ vpt_end (nr_pt_pages); start virtual address (Linux entry point 0xC0000000).]
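
Pointing an L2 entry back at the L2 table itself makes every page-table entry visible at a computable virtual address, which is the point of the linear page table. A sketch of that address arithmetic, assuming the self-referencing entry is L2 slot 0x3f8 (so the window starts at 0xFE000000) and 4-byte entries; the helper names are illustrative.

```c
#include <stdint.h>

#define PAGE_SHIFT           12
#define L2_PAGETABLE_SHIFT   22
#define LINEAR_PT_L2_SLOT    0x3F8u   /* assumed self-referencing L2 slot */
#define LINEAR_PT_VIRT_START (LINEAR_PT_L2_SLOT << L2_PAGETABLE_SHIFT)

/* Virtual address at which the 4-byte L1 entry for va is visible. */
static inline uint32_t linear_l1e_addr(uint32_t va)
{
    return LINEAR_PT_VIRT_START + (va >> PAGE_SHIFT) * 4u;
}

/* The L2 table appears inside the same window, at the slot that maps itself. */
static inline uint32_t linear_l2e_addr(uint32_t va)
{
    return linear_l1e_addr(LINEAR_PT_VIRT_START) + (va >> L2_PAGETABLE_SHIFT) * 4u;
}
```

For the Linux start address 0xC0000000 the L1 entry is visible at 0xFE300000 and the L2 entry at 0xFE3F8C00, so entries can be inspected without mapping each page-table page separately.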

Physical to Machine Mapping

Description
  Records the mapping from pseudo-physical frames to machine page frames.
  Address: vpt_start ~ vpt_end
construct_dom0
  Allocate all of the domain0 reservation
  Fill the P2M & M2P mappings
    vphysmap_start[pfn] = mfn;
    set_gpfn_from_mfn(mfn, pfn);

[Figure: Domain0 address space from domain_setup_info.v_start to v_end (startup area, followed by the reserved area up to nr_pages), containing the physical to machine mapping; alongside, the machine-to-physical translation tables: writable (4MB, RDWR_MPT_VIRT_START to RDWR_MPT_VIRT_END) and read-only (4MB, RO_MPT_VIRT_START to RO_MPT_VIRT_END).]

Start Info

DomU Builder

Similar to the dom0 builder. Can be called from domain 0 (the control domain) in libxc: xc_linux_build.

Writable Pages

Shadow Page Table

Grant Table
Interrupt Table

Stores
  Address: idt_table
  Size: IDT_ENTRIES (256)
Fields
  DPL: privilege level
  Offset: trap function address
  Segment Selector: uses the hypervisor code segment (__HYPERVISOR_CS << 16)
  D = 1 means the gate size is 32 bits
Entries
  Gray: DPL=0 (hypervisor traps)
  Blue: DPL=3 (system-wide gate)
  Green: DPL=1 (allows the kernel to trap into the hypervisor)
  Yellow: DPL=0, type = task gate
  Pink: per-CPU interrupt gate
Interfaces
  Gray: set_intr_gate
  Blue: set_system_gate
  Yellow: set_task_gate
  Others: _set_gate

[Figure: idt_table with IDT_ENTRIES (256) gates; IDTR holds the base address and the limit (256*8-1). Each gate holds Offset High, P, DPL, D, Segment Selector and Offset Low.]
  Vector  Name                  Handler
  0       TRAP_divide_error     divide_error
  1       TRAP_debug            debug
  2       TRAP_nmi              nmi
  3       TRAP_int3             int3
  4       TRAP_overflow         overflow
  5       TRAP_bounds           bounds
  6       TRAP_invalid_op       invalid_op
  7       TRAP_no_device        device_not_available
  8       TRAP_double_fault     __DOUBLEFAULT_TSS_ENTRY
  9       TRAP_copro_seg        coprocessor_segment_overrun
  10      TRAP_invalid_tss      invalid_TSS
  11      TRAP_no_segment       segment_not_present
  12      TRAP_stack_error      stack_segment
  13      TRAP_gp_fault         general_protection
  14      TRAP_page_fault       page_fault
  15      TRAP_spurious_int     spurious_interrupt_bug
  16      TRAP_copro_error      coprocessor_error
  17      TRAP_alignment_check  alignment_check
  18      TRAP_machine_check    machine_check
  19      TRAP_simd_error       simd_coprocessor_error
  31      TRAP_deferred_nmi     deferred_nmi
  0x82    HYPERCALL_VECTOR      hypercall
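
The gate fields listed on this slide pack into two 32-bit words in the standard IA-32 layout: selector and low offset in the first word; high offset, P, DPL and type in the second. The sketch below encodes a gate that way. make_gate and struct idt_gate are illustrative names rather than Xen's _set_gate, and the selector value is the __HYPERVISOR_CS value from the segmentation slide.

```c
#include <stdint.h>

#define __HYPERVISOR_CS 0xe008u

/* One 8-byte IDT gate as two 32-bit words. */
struct idt_gate { uint32_t a, b; };

/*
 * type: 14 = 32-bit interrupt gate, 15 = 32-bit trap gate.
 * Bit 3 of the type field is the D bit (1 = 32-bit gate).
 */
static inline struct idt_gate make_gate(uint32_t handler, unsigned int type,
                                        unsigned int dpl)
{
    struct idt_gate g;

    g.a = (__HYPERVISOR_CS << 16) | (handler & 0xFFFFu);
    g.b = (handler & 0xFFFF0000u)    /* offset high */
        | 0x8000u                    /* P: present */
        | ((dpl & 3u) << 13)         /* DPL */
        | ((type & 0xFu) << 8);      /* gate type, including D */
    return g;
}
```

A DPL of 0 keeps the vector reachable only by hardware events and ring-0 code, while a DPL of 1 or 3 lets guest kernels or user code raise it with a software interrupt, which is how the color classes on the slide differ.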

Exception Handler

Allocate the structure on the stack
  The processor saves specific registers
  Error code
    May be stored by the processor on specific exceptions
  The entry vector is stored by the exception handler
  The other registers are stored by the ‘error_code’ code
  If CS==3, the exception handler saves the selectors
The stack top (the function’s parameter) points to cpu_user_regs
EBX points to the current vcpu

[Figure: cpu_user_regs at the top of the vcpu[n] stack, with fields from the stack top: ebx, ecx, edx, esi, edi, ebp, eax, error_code, entry_vector, eip, cs, upcall_mask, _pad0, eflags, esp, ss, _pad1, es, _pad2, ds, _pad3, fs, _pad4, gs, _pad5. The EBX register points to the current struct vcpu.]
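
The saved-register frame can be written as a C structure whose field order matches the push order, lowest address first. The field names come from the slide; the field widths below are assumptions chosen so that each selector and its padding fill one 32-bit stack slot, and from_guest is a hypothetical helper.

```c
#include <stdint.h>
#include <stddef.h>

struct cpu_user_regs {
    /* General registers, saved by the entry code. */
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
    /* One 32-bit slot: hardware error code (if any) and the entry vector. */
    uint16_t error_code;
    uint16_t entry_vector;
    /* Pushed by the processor on every exception. */
    uint32_t eip;
    uint16_t cs;
    uint8_t  upcall_mask;
    uint8_t  _pad0;
    uint32_t eflags;
    /* Pushed by the processor only on a privilege-level change. */
    uint32_t esp;
    uint16_t ss, _pad1;
    /* Data selectors. */
    uint16_t es, _pad2;
    uint16_t ds, _pad3;
    uint16_t fs, _pad4;
    uint16_t gs, _pad5;
};

/* True if the interrupted code was running outside ring 0. */
static inline int from_guest(const struct cpu_user_regs *regs)
{
    return (regs->cs & 3u) != 0;
}
```

With these widths the structure is 68 bytes and eip sits at offset 32, immediately above the slot shared by error_code and entry_vector, so a handler receiving a pointer to the stack top can read every saved register by name.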

Hyper Call

Event Channel

Virtual IDT

Description
  The virtual IDT propagates exceptions to the guest OS.
Location
  Stored in the vcpu’s guest context field
  Initial values are in the global trap_table
  Stored into the vcpu’s guest context through a hypercall
Size
  At most 256 entries
Sequences
  Initialize the CS field for the guest as FLAT_KERNEL_CS
    construct_dom0
  Set the initial vcpu’s virtual IDT handlers
    (linux) trap_init
    (linux) HYPERVISOR_set_trap_table
    do_set_trap_table (NULL: clear the entire virtual IDT / not NULL: set the virtual IDT)
  Initialize trap_ctxt for every virtual CPU
    (linux) smp_trap_init
    (linux) HYPERVISOR_vcpu_op(VCPUOP_initialise)
    do_vcpu_op(VCPUOP_initialise): copy the virtual IDT
  Restrict the code selector for the guest virtual IDT
    fixup_guest_code_selector
  Propagate exceptions
    do_xxx (trap handlers): turn a safe exception into a trap_bounce
    create_bounce_frame: create a basic exception frame on the guest OS (ring 1) stack

[Figure: trap_table with up to 256 entries, each holding vector, flags, cs and address.]
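
The registration step can be sketched as copying guest-supplied entries into a per-VCPU table. Everything below is a simplified illustration: the field widths, the use of a zero address to end the list, and the rule that raises a ring-0 selector to ring 1 are assumptions standing in for do_set_trap_table and fixup_guest_code_selector, not their actual code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NR_VECTORS 256

/* One virtual IDT entry, with the four fields shown on the slide. */
struct trap_info {
    uint8_t  vector;
    uint8_t  flags;
    uint16_t cs;
    uint32_t address;   /* handler entry point; 0 ends a list (assumed) */
};

/* NULL clears the whole virtual IDT; otherwise copy entries into place. */
static inline void set_trap_table(struct trap_info dst[NR_VECTORS],
                                  const struct trap_info *traps)
{
    if (traps == NULL) {
        memset(dst, 0, NR_VECTORS * sizeof(dst[0]));
        return;
    }
    for (; traps->address != 0; traps++) {
        struct trap_info t = *traps;
        if ((t.cs & 3u) == 0)
            t.cs |= 1u;   /* guest handlers must not run in ring 0 */
        dst[t.vector] = t;
    }
}
```

Because the vector field is 8 bits wide, every entry lands inside the 256-slot table, and restricting the selector at registration time means the later bounce to the guest can trust the stored cs value.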

Dynamic IRQ

Current Task

Range Set

Used for managing the following system resources
  Interrupts
  IO ports
  IO memory