24
Hardware-Assisted Mediated Pass-Through with VFIO Kevin Tian Principal Engineer, Intel

Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

1

Hardware-Assisted Mediated Pass-Through with VFIO

Kevin TianPrincipal Engineer, Intel

Page 2: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

2

Legal Disclaimer

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others

© Intel Corporation.

Page 3: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

3

VFIO

• A secure, userspace driver framework

• VFIO physical device PCI endpoints, platform devices, etc.

PCI device sharing through PCIe® Single Root I/O Virtualization (SR-IOV)

• VFIO mediated device vGPUs, channel I/O devices, crypto APs, etc.

Device sharing through vendor specific resource mediation policy

Page 4: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

4

PCIe® SR-IOV

Device

Backend Resources

VF1 VFn

Q …

PF

VM Application

Q Q Q Q Q …Q Q Q

• Hardware-assisted I/O virtualization Physical Function (PF)

Virtual Function (VF)

• Pros Software simplicity

IOMMU-based DMA isolation

• Cons Limited scalability

Fixed resource allocation

Lack of composabilityIOMMU

DMA (BDF)

BDF BDF

BDF

Page 5: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

5

Mediated Device

Device

Software Resource/DMA Remapping Logic

mdev mdev mdev…

…Q …Q Q Q Q Q…Q Q Q

• Mediated pass-through architecture Slow-path operations emulated by

software

Fast-path operations passed through

• Pros Flexible resource allocation

Composability

• Cons Software-based DMA isolation (thus

increases complexity and limits scalability)

…Q Q Q

BDF

VMApplication

IOMMU

DMA (BDF)

Page 6: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

6

Can we enjoy merits from both sides for hyper-scaled usage?

Page 7: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

7

Process

Goals

Finer-grained resource sharing

Q Q Q Q Q Q Q Q

Device

VM1 VM2 VMn…

Flexible Resource Allocation

Q Q Q Q Q Q Q Q

Device

Application VM1 VMn

Composability

Q Q Q Q

Device

Q Q Q

Device

VM VM

Live Migration

Hardware-enforced DMA Isolation

Memory

Page 8: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

8

Assignable Device Interfaces (ADI)

Device

PF BAR

PF Config

ADI ADI ADI…

…Q …Q Q Q Q Q …Q Q Q

PASID PASID

• ADI - represents minimal sharable resource Queues, queue pairs, contexts

Enumerated through DVSEC capability (see backup)

• Meets isolation criteria to be ‘assignable’ Functional isolation between ADIs

ADI MMIO in separate system page size regions

Independently resettable

Interrupt Message Storage (IMS)

• Tags DMA with unique PASIDs

Enable maximum possible scalability!

(BDF:PASID) DMABDF

Page 9: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

9

PASID-granular DMA Isolation

• Moves all IOMMU paging structures to per-PASID 1st level translation

2nd level translation

Nested translation

Pass-through translation

• Enables PASID-granular DMA isolation

• Supports all existing address translation usages IOVA, VA, GPA, GIOVA and GVA

IOMMU

Device

(BDF:PASID)

BDF PASID

DMA

L4 L3 L2 L1

L4 L3 L2 L1

L4L4 L3L3 L2L2 L1L1

L4L4 L3L3 L2L2 L1L1

Nesting

1st level

2nd level

2nd level

Scalable mode

Two-level structure(see backup)

1st level

Enable hardware-enforced DMA isolation!

Page 10: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

10

Software Composition

Device

PF BAR

PF Config

ADI ADI ADI…

…Q …Q Q Q Q Q …Q Q Q

PASID PASID

Software Resource Remapping Logic

mdev mdev mdev…

VM/Application

• Virtual Device Composition Module (VDCM) Software managed resource

remapping between mdev and ADI

Composes ADIs into mediated device (mdev)

• Leverage VFIO mdevframework for Managing mdev life-cycle

Setting up access policy on mdevresources

Serving slow-path operations from guest

DMA (BDF:PASID)

IOMMU

VDCM

Enables great flexibility and composability!

BDF

VFIO

Page 11: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

11

Combining them together…

Page 12: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

12

Intel® Scalable I/O Virtualization(Intel® Scalable IOV)

• A hardware-assisted mediated pass-through architecture Device: supports Assignable Device Interfaces (ADIs)

Platform: extends Intel® VT-d with PASID-granular DMA isolation (scalable mode)

Software: moves infrequent (slow-path) accesses from device to software

• Supports any type of devices e.g. NIC, storage, GPU, accelerators, … (integrated or discrete)

• Supports both VM and bare-metal usages

Page 13: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

13

Documentation

• Intel® VT-d specification update (Rev 3.0) Documents Intel® VT-d (IOMMU) support for PASID granular

address translation

• Intel® Scalable I/O Virtualization Technical Specification (Rev 1.0) Documents the Scalable IOV architecture blueprint and operation,

including DVSEC

Addresses architecture requirements for devices and drivers

Agnostic of type of device or specific implementation

Openly published to enable broad device and software ecosystem

• https://software.intel.com/en-us/articles/intel-sdm

Page 14: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

14

What does it mean to VFIO/IOMMU driver?

Page 15: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

15

VFIO: IOMMU-capable Mdev

IOMMUDriver

VFIO

platform

User Interfaces

pci mdev

IOMMU Interfaces

Host Driver

VDCM

Map/unmap

• IOMMU-capable mdev

Allow IOMMU operations on mdev

Opt-in by VDCM

• Finer-grained resource management

New aggregated type to compose any number of ADIs into a mdev

Mdev Core

Mdev Bus

callbacks

Libvirt, Qemu, etc.

Page 16: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

16

IOMMU Domains

L4 L3 L2 L1

L4 L3 L2 L1

2nd level page table

L4 L3 L2 L1…

1st level page tables

iommu_domain • One device can attach to one domain, at any time One domain can be attached to

multiple devices

• Domain type describes IOMMU policy for device DMAs DMA, UNMANAGED, IDENTITY,

and BLOCKED

• Domain is switched when policy changesL4 L3 L2 L1

L4 L3 L2 L1

2nd level page table

L4 L3 L2 L1…

1st level page tables

iommu_domainDevice

Device

Device

“one domain” scheme doesn’t work for IOMMU-capable mdev!

Page 17: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

17

L4 L3 L2 L1

L4 L3 L2 L1

2nd level

L4 L3 L2 L1…

1st level

Auxiliary Domain

L4 L3 L2 L1

L4 L3 L2 L1

2nd level page table

L4 L3 L2 L1…

1st level page tables

iommu_domain • One device can attach to multiple domains A primary domain used for DMA-API

Multiple auxillary domains used for mdev instances

• ‘aux’ is a device attribute instead of domain attribute Same domain may represent as

‘primary’ to deviceA and ‘aux’ to deviceB

‘primary’ vs. ‘aux’ is decided at domain attach time

Device driver enables ‘aux’ capability on device before attaching domain

• No change to at(de)tach API VFIO attaches domain to mdev’s

parent

L4 L3 L2 L1

L4 L3 L2 L1

2nd level page table

L4 L3 L2 L1…

1st level page tables

iommu_domainDevice

Device

‘primary’

‘aux’

‘aux’

‘primary’

Device‘primary’

Avoid mdev awareness in IOMMU layer!

Page 18: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

18

Mdev with vIOMMU

• Ongoing effort to enable VFIO devices with vIOMMU Shadow vIOMMU 2nd level usages (e.g. GIOVA)

Nesting vIOMMU 1st level usages (e.g. GIOVA/GVA) Including system-wide PASID management

Cache invalidation forwarding (when nesting is in-use)

Page request/response handling (for guest SVA)

• Expect common user-space logic for vfio-pci and vfio-mdev Just granularity difference handled within IOMMU driver

• Qemu: emulating new VT-d scalable mode emulation

• For more detail, join below session by Yi & Jacob! “Shared Virtual Addressing in KVM”

Page 19: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

19

Status

• Key developers Baolu (Allen) Lu ([email protected])

Yi Liu ([email protected])

Jacob Pan ([email protected])

• RFC patch progress https://lkml.org/lkml/2018/10/7/54 for scalable mode support in intel-iommu

driver in v3

https://lkml.org/lkml/2018/10/12/225 for aux_domain and IOMMU capable mdev in VFIO/IOMMU driver in v3

Continued hot discussion around vIOMMU/vSVM (in multiple threads)

Mdev requirement is being considered gradually

https://www.mail-archive.com/[email protected]/msg173811.html for aggregated mdev type in v3

Page 20: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

20

Q/A

Page 21: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

21

Backup

Page 22: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

22

Enumeration of Intel® Scalable IOV Capability

• Designated Vendor Specific Extended Capability (DVSEC) to discover Intel® Scalable IOV capability A simplified subset of SR-IOV capability

ByteOffset

00h

04h

08h

0Ch

10h

Next Capability OffsetCap

VersionPCI Express Extended Capability ID = 0x23

0

DVSEC Length = 0x18DVSEC rev = 0

DVSEC Vendor ID = 8086

Flags (RO)Function Dependency

Link (RO)DVSEC ID for Scalable IOV = XXX

Supported Page Sizes (RO)

System Page Size (RW)

15161920232431

Capabilities (RO) 14h

Page 23: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

23

VT-d Extended Context Mode (Deprecated)

Page 24: Hardware-Assisted Mediated Pass-Through with VFIO · 2019-12-21 · 3 VFIO • A secure, userspace driver framework • VFIO physical device PCI endpoints, platform devices, etc

24

VT-d Scalable Mode (New)

Key Difference: PASID is a global ID space shared by all VMs.ALL page-table pointers moved to PASID Granular table