
PCI Passthrough and ITS Support in Xen / ARM :Xen Dev Summit 2015 Presentation



PCI Passthrough and GICv3-ITS in Xen ARM

Manish Jaggi

Vijaya Kumar Kilari

Cavium, Inc.

+ Demo on Dual Socket 48x2 Core ARMv8 Board

Page 2©2015 Cavium Inc. All rights reserved. Confidential and Proprietary.

Agenda

Status of Xen Support from Cavium
Top Level Architecture
Additions in Xen for PCI passthrough
ITS architecture
– ARM specification
– Virtual ITS driver in Xen
Xen NUMA
Demo on Cavium ThunderX platform
Questions


Status of Xen Support from Cavium

Xen 4.5+ (current)
– Demoed at Linaro Connect
– Initial NUMA support

Xen 4.6
– Basic ThunderX platform support
– GICv3 support

Xen 4.7
– vITS support
– PCI passthrough patches in Xen and Linux
– NUMA patches


Linaro Connect – Demo
Xen running on a single-socket 48-core ThunderX


ThunderX System Dual Socket Reference Platform

Standard industry form factor: ½ SSI motherboard, 2U 19" rack-mount chassis

Volume server I/O: PCIe Gen3, 10Gb or 40Gb Ethernet, integrated SATA

Up to 128GB memory

Full systems management with BMC and IPMI

http://cavium.com/pdfFiles/ThunderX_CRB_2S_Rev1.pdf


Xen NUMA running on dual-socket 48x2 cores

[Diagram: Xen hypervisor spanning Node 0 (48 cores) and Node 1 (48 cores), each with local DDR; dom0 and two domUs run on top, each with its own vCPUs and vITS.]


Top Level Architecture

[Diagram: I/O virtualization with the System MMU. dom0 and domU (each with vCPUs and a vITS) run on Xen; the Xen virtual ITS driver traps MSI/X config reads/writes. Hardware blocks: PCIe host bridge with endpoints PCIe-EP1 and PCIe-EP2, GICv3 ITS, DDR controller and DDR. The ITS Interrupt Translation Table maps (DeviceID,MSI_Index)=>LPI, delivered to the guest as a vLPI; the SMMU maps StreamID => ContextBank, where ContextBank = {…, Domain PageTable, … }.]


Additions in Xen/ARM (proposed / implemented)

PCIe host controller support in Xen
– pci_conf_read/write calls handled by the host controller driver
– device-tree based

vITS emulation support

Hypercall to map the Linux segment ID to the appropriate PCI host controller

xl toolstack additions
– Mapping of the GITS_ITRANSLATER space into the domain
– assign_device hypercall enhanced to support vDeviceID

Frontend/backend changes
– No communication needed for MSI
– Frontend PCI bus msi-parent => its node in the guest device tree

SMMU additions


PCIe Host Controller support in Xen

The init function in the PCI host driver registers the host bridge callbacks:

int pci_hostbridge_register(pci_hostbridge_t *pcihb);

struct pci_hostbridge_ops {
    u32 (*pci_conf_read)(struct pci_hostbridge *, u32 bus, u32 devfn,
                         u32 reg, u32 bytes);
    void (*pci_conf_write)(struct pci_hostbridge *, u32 bus, u32 devfn,
                           u32 reg, u32 bytes, u32 val);
};

struct pci_hostbridge {
    u32 segno;
    paddr_t cfg_base;
    paddr_t cfg_size;
    struct dt_device_node *dt_node;
    struct pci_hostbridge_ops ops;
    struct list_head list;
};
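The pci_conf_read/pci_conf_write callbacks are driver-specific. For an ECAM-style host controller, the config-space offset arithmetic behind such a callback can be sketched as follows; this is a standalone illustration of the standard ECAM layout, not the actual ThunderX driver:

```c
#include <assert.h>
#include <stdint.h>

/* ECAM places each function's 4KB config space at a fixed offset from
 * cfg_base: bus in bits [27:20], devfn in bits [19:12], register in
 * bits [11:0]. A pci_conf_read implementation would add this offset
 * to the mapped cfg_base and perform a sized read. */
static uint64_t ecam_cfg_offset(uint32_t bus, uint32_t devfn, uint32_t reg)
{
    return ((uint64_t)bus << 20) | ((uint64_t)devfn << 12) | reg;
}
```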


PHYSDEVOP_pci_host_bridge_add

#define PHYSDEVOP_pci_host_bridge_add    44

struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint64_t cfg_base;
    uint64_t cfg_size;
};

This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add hypercall. The handler code invokes … to update the segment number in the pci_hostbridge:

int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size);


xl toolstack additions - DOMCTL

For domU, while creating the domain, the toolstack reads the IPA from the macro GITS_ITRANSLATER_SPACE in xen/include/public/arch-arm.h. The PA is read via a new hypercall that returns the PA of the GITS_ITRANSLATER_SPACE. The toolstack then issues a hypercall to create a stage-2 mapping.

Hypercall Details: XEN_DOMCTL_get_itranslater_space

/* XEN_DOMCTL_get_itranslater_space */
struct xen_domctl_get_itranslater_space {
    /* OUT variables. */
    uint64_aligned_t start_addr;
    uint64_aligned_t size;
};
typedef struct xen_domctl_get_itranslater_space xen_domctl_get_itranslater_space_t;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space_t);


xl toolstack additions – device assignment

Reserved areas in guest memory space

Part of the guest address space is reserved for mapping assigned PCI devices' BAR regions. The toolstack is responsible for allocating ranges from this area and creating stage-2 mappings for the domain. This area is defined in public/arch-arm.h:

/* For 32-bit BARs */
#define GUEST_BAR_BASE_32 <<>>
#define GUEST_BAR_SIZE_32 <<>>

/* For 64-bit BARs */
#define GUEST_BAR_BASE_64 <<>>
#define GUEST_BAR_SIZE_64 <<>>

New entries in xenstore for device BARs:

/local/domain/0/backend/pci/1/0
vdev-N
    BDF = ""
    BAR-0-IPA = ""
    BAR-0-PA = ""
    BAR-0-SIZE = ""
    ...
    BAR-M-IPA = ""
    BAR-M-PA = ""
    BAR-M-SIZE = ""
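The toolstack's job of handing out BAR IPAs from the reserved area can be sketched as a simple aligned bump allocator. This is an illustration only: the struct and function names are invented, and the base/size used in practice would come from the GUEST_BAR_BASE/SIZE macros, whose values are elided above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toolstack-side allocator for assigned-device BAR IPAs.
 * PCI BARs are power-of-two sized and must be naturally aligned, so
 * each allocation is rounded up to the BAR's own size. */
struct bar_region {
    uint64_t base;  /* start of the reserved guest area */
    uint64_t size;  /* total size of the area in bytes  */
    uint64_t next;  /* next candidate IPA (starts at base) */
};

/* Returns the allocated IPA, or 0 if the region is exhausted. */
static uint64_t bar_alloc(struct bar_region *r, uint64_t bar_size)
{
    uint64_t ipa = (r->next + bar_size - 1) & ~(bar_size - 1);

    if (ipa + bar_size > r->base + r->size)
        return 0;
    r->next = ipa + bar_size;
    return ipa;
}
```

Each IPA returned here would then be recorded in xenstore (BAR-n-IPA) alongside the machine PA, and a stage-2 mapping created for the domain.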


Hypercall Modification (XEN_DOMCTL_assign_device)

struct xen_domctl_assign_device {
    uint32_t dev;  /* XEN_DOMCTL_DEV_* */
    union {
        struct {
            uint32_t machine_sbdf;  /* machine PCI ID of assigned device */
            uint32_t guest_sbdf;    /* guest PCI ID of assigned device */
        } pci;
        struct {
            uint32_t size;  /* Length of the path */
            XEN_GUEST_HANDLE_64(char) path;  /* path to the device tree node */
        } dt;
    } u;
};
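The machine_sbdf/guest_sbdf fields pack segment:bus:device.function into one 32-bit value. A sketch of the packing assumed here, (seg << 16) | (bus << 8) | devfn with devfn = (dev << 3) | fn, which matches Xen's usual SBDF layout:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a PCI segment:bus:device.function into a 32-bit SBDF value,
 * as passed in machine_sbdf / guest_sbdf. devfn keeps the device in
 * bits [7:3] and the function in bits [2:0]. */
static uint32_t pci_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | fn;
}
```

With this encoding the toolstack can hand Xen both the machine SBDF of the physical device and the (possibly different) SBDF the guest should see.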


SMMU Code additions

iommu_ops functions:

PHYSDEVOP_pci_add_device
    .add_device = arm_smmu_add_dom0_dev

PHYSDEVOP_pci_remove_device
    .remove_device = arm_smmu_remove_device


Mapping between streamID - deviceID - pci sbdf - requesterID

In the simple case, all of these are equal to the BDF.

But some devices use a different requester ID for DMA transactions.

Suggestions on how to handle this?


pci-frontend bus gicv3-its node binding for domU

It is assumed that the toolstack generates a gicv3-its node in the domU device tree. As of now, the ARM PCI passthrough design supports device assignment only to guests with gicv3-its support.

All the devices assigned to domU are enumerated on a PCI frontend bus. On this bus, the interrupt parent is set to gicv3-its for ARM systems.

Because the gicv3-its is emulated in Xen, all accesses by the domU driver are trapped. This enables configuration and direct injection of MSIs (LPIs) into the guest. Frontend-backend communication for MSI is no longer required.

Frontend-backend communication is required only for reading the PCI configuration space by dom0 on behalf of domU.


ITS

The Interrupt Translation Service (ITS) is the specification from ARM to support PCI MSI(-X).

MSI(-X) interrupts are handled as Locality-specific Peripheral Interrupts (LPIs), starting from IRQ number 8192.

LPIs are targeted directly at a CPU.

Software sends ITS commands such as MAPD, MAPVI, MOVI, INT, SYNC, and INV to the ITS hardware to set up MSI(-X) translation.

Command completion notification using:
– Polling
– Interrupt notification, by placing an INT command
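Since LPIs occupy the INTID space from 8192 upward, software must allocate LPI numbers before it can issue MAPVI commands. A minimal sketch of such an allocator (invented names; a real driver would track frees and per-device ranges):

```c
#include <assert.h>
#include <stdint.h>

#define LPI_BASE 8192  /* first LPI INTID in the GICv3 numbering */

/* Hypothetical bump allocator: hand out LPIs sequentially, one per
 * (DeviceID, MSI index) pair, before sending the MAPVI command that
 * binds the pair to the allocated LPI. */
static uint32_t next_lpi = LPI_BASE;

static uint32_t alloc_lpi(void)
{
    return next_lpi++;
}
```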


ITS HW-SW Interaction

[Diagram: the ITS command queue (BASER, CWRITER, CREADER) and Device Table are allocated by software and used by the ITS hardware; the per-device ITT tables, LPI Configuration Table, and per-CPU LPI Pending Tables are allocated by software and used by both. Software writes ITS commands to the queue; the hardware reads the commands and configures interrupt delivery to the CPUs.]
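The CWRITER/CREADER handshake shown above can be modeled as a byte-offset ring of fixed-size commands. The sketch below is a toy model with invented names and an illustrative queue size; real ITS commands are 32 bytes and completion is observed by polling CREADER catching up to CWRITER:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMD_SIZE   32  /* real ITS command size */
#define QUEUE_CMDS 8   /* illustrative queue depth */

struct its_queue {
    uint8_t  buf[QUEUE_CMDS * CMD_SIZE];
    uint32_t cwriter;  /* byte offset of next free slot (SW writes) */
    uint32_t creader;  /* byte offset of next command  (HW reads)  */
};

/* Software side: append one command. Returns 0, or -1 if full
 * (one slot is kept empty to distinguish full from empty). */
static int its_send_cmd(struct its_queue *q, const uint8_t cmd[CMD_SIZE])
{
    uint32_t next = (q->cwriter + CMD_SIZE) % sizeof(q->buf);

    if (next == q->creader)
        return -1;
    memcpy(&q->buf[q->cwriter], cmd, CMD_SIZE);
    q->cwriter = next;
    return 0;
}

/* Model of the hardware side: consume one pending command, if any. */
static int its_hw_consume(struct its_queue *q)
{
    if (q->creader == q->cwriter)
        return 0;
    q->creader = (q->creader + CMD_SIZE) % sizeof(q->buf);
    return 1;
}
```

Polling for completion is then just waiting until `q->creader == q->cwriter`.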


Major challenges in virtualizing ITS

ITS commands should be processed with minimal latency, without blocking a vCPU for a long duration.

All guests should get a fair amount of time for processing their guest ITS commands.

A guest must not be able to put Xen in a DoS by sending commands continuously.
– Solution: do not send guest ITS commands to the hardware; just emulate them.

Processing global ITS commands like SYNC, INVALL, etc. on platforms with a multi-node ITS.
– Solution: one virtual ITS per domain, and ignore guests' SYNC, INVALL, and DISCARD commands.


Major challenges in virtualizing ITS

Handling guest ITS emulation where the guest uses the INT command for completion notification.
– Solution: Xen injects the virtual LPI back into the guest when the INT command is emulated.
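The effect of emulating a guest's INT command can be modeled as setting the pending bit for the target vLPI, mirroring the GICv3 LPI Pending Table (one bit per LPI). All names below are invented for illustration; this is not Xen's vITS code:

```c
#include <assert.h>
#include <stdint.h>

#define LPI_BASE 8192

/* Toy per-guest LPI pending table: 1024 bytes * 8 bits covers
 * vLPIs 8192..16383. */
static uint8_t vlpi_pending[1024];

/* Emulating INT: mark the vLPI pending, modeling the injection of
 * the virtual LPI into the guest. */
static void vits_emulate_int(uint32_t vlpi)
{
    uint32_t idx = vlpi - LPI_BASE;

    vlpi_pending[idx / 8] |= 1u << (idx % 8);
}

static int vlpi_is_pending(uint32_t vlpi)
{
    uint32_t idx = vlpi - LPI_BASE;

    return (vlpi_pending[idx / 8] >> (idx % 8)) & 1;
}
```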


ITS virtualization in XEN

ITS virtualization:
– Command queue virtualization
– LPI configuration table virtualization
– GITS registers virtualization


Xen ITS Initialization

[Diagram: the physical ITS command queue (BASER, CWRITER, CREADER), Device Table, and ITT table are allocated by Xen for the ITS hardware.]

(1) The guest sends the PHYSDEVOP_pci_device_add hypercall.
(2) Xen allocates an ITT table for the device and sends a MAPD command to the ITS hardware.
(3) Xen allocates (physical) LPIs for the device and sends MAPVI commands.


ITS command Virtualization

[Diagram: the virtual command queue (BASER, CWRITER, CREADER), Device Table, and per-device ITT tables are allocated in guest memory.]

(1) The guest's update of a command in the virtual queue traps to Xen.
(2) Xen uses the guest's Device Table and ITT memory to note down the guest's ITS command information.


MAPD/MAPVI ITS command Virtualization

[Diagram: guest-allocated virtual command queue holding MAPD (DevID, ITT IPA, Size) and MAPVI (DevID, vID, Collection) commands; guest-allocated Device Table entries hold the ITT IPA (8 bytes) and Size (8 bytes), and ITT entries hold the vLPI (vID) and Collection ID.]

(1) Xen reads MAPD and finds the IPA and size of the ITT table for the device ID.
(2) Xen uses the guest's Device Table and ITT memory to note down the ITT IPA and size.
(3) Xen reads the MAPVI command.
(4) Xen uses the guest's Device Table to find the address of the ITT for the device, and updates the ITT entry indexed by the event ID with the vLPI and collection ID.
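The MAPD/MAPVI bookkeeping described in these steps can be modeled in a few lines. In the toy model below, flat arrays stand in for guest-allocated memory and all names are invented; in Xen the Device Table entry records the guest ITT's IPA and size, and each ITT entry is two 8-byte words:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEVS   4  /* illustrative limits */
#define MAX_EVENTS 8

struct itt_entry { uint64_t vlpi; uint64_t collection; };

struct dev_entry {
    struct itt_entry *itt;  /* stand-in for the ITT IPA */
    uint32_t nr_events;     /* stand-in for the ITT size */
};

static struct dev_entry device_table[MAX_DEVS];
static struct itt_entry itt_mem[MAX_DEVS][MAX_EVENTS];

/* Emulating MAPD: record which ITT belongs to this DeviceID. */
static void emulate_mapd(uint32_t devid, uint32_t nr_events)
{
    device_table[devid].itt = itt_mem[devid];
    device_table[devid].nr_events = nr_events;
}

/* Emulating MAPVI: write (vLPI, collection) into the device's ITT,
 * indexed by the event (MSI index). Returns -1 if MAPVI arrives
 * before MAPD or the event is out of range. */
static int emulate_mapvi(uint32_t devid, uint32_t event,
                         uint64_t vlpi, uint64_t collection)
{
    struct dev_entry *d = &device_table[devid];

    if (!d->itt || event >= d->nr_events)
        return -1;
    d->itt[event].vlpi = vlpi;
    d->itt[event].collection = collection;
    return 0;
}
```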


LPI Routing to Guest

[Diagram: guest-allocated Device Table and ITT; Device Table entries hold the ITT IPA (8 bytes) and Size (8 bytes); ITT entries hold the vLPI (vID) and Collection ID.]

(1) Xen receives a pLPI from the hardware.
(2) Xen queries the Device Table and gets the ITT table.
(3) From the ITT table, Xen gets the virtual LPI (vLPI).
(4) Xen injects the vLPI into the guest.
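Steps (1) through (4) can be sketched as a reverse lookup: Xen records at MAPVI-emulation time which (DeviceID, event) a physical LPI was programmed for, then follows that record through the guest's ITT when the pLPI fires. A toy model with invented names and small fixed tables:

```c
#include <assert.h>
#include <stdint.h>

#define LPI_BASE 8192
#define MAX_PLPI 64  /* illustrative limit */

struct route { uint32_t devid, event; };

static struct route plpi_route[MAX_PLPI];  /* pLPI -> (DeviceID, event) */
static uint32_t itt[4][8];                 /* (DeviceID, event) -> vLPI  */

/* Recorded when the MAPVI emulation binds a pLPI to a guest vLPI. */
static void record_route(uint32_t plpi, uint32_t devid, uint32_t event,
                         uint32_t vlpi)
{
    plpi_route[plpi - LPI_BASE] = (struct route){ devid, event };
    itt[devid][event] = vlpi;
}

/* Steps (1)-(4): receive pLPI, find the device and event, read the
 * ITT entry, and return the vLPI to inject into the guest. */
static uint32_t route_plpi(uint32_t plpi)
{
    struct route *r = &plpi_route[plpi - LPI_BASE];

    return itt[r->devid][r->event];
}
```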


References:

vITS design doc
– http://xenbits.xen.org/people/ianc/vits/draftG.pdf

Patches (22)
– http://osdir.com/ml/general/2015-07/msg35182.html

PCI passthrough design doc
– http://www.gossamer-threads.com/lists/xen/devel/394962


Xen Dual (Socket/Node) NUMA Demo

[Diagram: Xen hypervisor spanning Node 0 (48 cores) and Node 1 (48 cores), each with local DDR; dom0 and two domUs run on top, each with its own vCPUs and vITS.]


# xl list
Name         ID   Mem  VCPUs  State   Time(s)
Domain-0      0  2048      8  r-----    128.9
domu-node0    1  2048      4  -b----      1.4
domu-node1    2  2048      4  -b----      0.6

# xl cpupool-list
Name         CPUs  Sched   Active  Domain count
Pool-node0     48  credit  y       2
Pool-node1     48  credit  y       1

# xl cpupool-list -c
Name         CPU list
Pool-node0   0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
Pool-node1   48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95


Questions


MSI-X Routing (backup)