15
DPDK Summit China 2017

DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

  • Upload
    others

  • View
    17

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

DPDK Summit China 2017

Page 2: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

A BETTER VIRTIOTOWARDS NFV CLOUDVHOST DATAPATH ACCELERATION

2

Cunming LIANG, IntelXiao WANG, Intel

Page 3: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

LEGAL DISCLAIMER

3

• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.• Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,

fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

• This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

• © 2017 Intel Corporation. Intel, the Intel logo, Intel. Experience What’s Inside, and the Intel. Experience What’s Inside logo are trademarks of Intel. Corporation in the U.S. and/or other countries.

• *Other names and brands may be claimed as the property of others.• Copyright © 2017, Intel Corporation. All rights reserved.

Page 4: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

4

Agenda

Problems towards NFV Cloud New Model of Direct I/O vHost Data Path Acceleration

Under the Hood

DPDK High Level Design

HW Prerequisites

Live-migration for Stock VM

Remaining Challenge Status & WIP Key Takeaway

Page 5: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

CPU

virtio driver

virtio driver

CloudSoftware vSwitch

vHost vHost

PF driver

NIC

CPU

NFViAccelerator

NFVi-vSwitchSlow Path

VF driver

VFdriver

PF driver

CPU

virtio driver

virtio driver

NFViAccelerator

vHost vHostNFVi-vSwitchSlow Path

VF driver

VFdriver

PF driver

CPU

virtio driver

virtio driver

NFViAccelerator

vHost vHost

NFVi-vSwitchSlow Path

PF driver

DPDK China Summit 2017 Shanghai,

5

Problems towards NFV Cloud vswitch/virtio is well recognized by cloud networking

Accelerator is used to address higher performance

SR-IOV device pass-thru represents for fast I/O

Device specific VF lacks a few cloud characteristics

Zero-copy buffer swap costs unpredictable # of CPU

Other direct I/O approach besides device pass-thru?

Para-virtualized device w/ HW acceleration, how?

Unspecific AcceleratorSR-IOV Like Performance

Friendly Live-migrationStock VMs Support

Page 6: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

?vhost BE

vDPA for virtio

HW

VF ACC driver

virtio-netdriver

Ring/Intr/Doorbell pass-thru

virtioDP handler

Emulated virtio dev

DPDK China Summit 2017 Shanghai,

6

New Model of Direct I/O

Key Objective• Follow Spec.• SR-IOV like performance• Friendly Live-migration Support• Support stock VMs Good-enough pass-thru Para-virtualized device w/

accelerator DPDK will support both model 2017’Q2 Prototype Finished

VIRTIO Device Pass-thru

virtio VF pass-thru

HW

virtio-netdriver

Ring/Intr/Doorbell pass-thru

virtio VF device

pass-thru device

CSR, BARConf/Map

vHOST Data Path Acc.

Page 7: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

7

vDPA: Under the Hood Device emulated by QEMU

Decompose DP/CP on Backend DP: DMA, INTERRUPT, DOORBELL

CP: vhost Protocol, DP configure

IOVA Translation by IOMMU/ATS

PI/EPT Mapping for INT/DOORBELL

Selective DP Acceleration Engine

Available SW DP Fallback Compatible Live-Migration Minimum HW Prerequisites

QEMU

VM

vmcs

virtio driver

PIR

virtio Device Emulation

PCIe accelerator engine

Physical MemoryEPTP

IOMMU

IRQ

DMAR INTR

Solo Page for KICK,MMIO Directly via EPT

page mapping

KICK

virtio_handler

vrings

Configure Register

PIO/EPT Violation

Memory Access

PIO/MMIO Access

Interrupt

Daemon Servicevhost protocol

backend

Configure

ATS

w/ IOMMU

w/o IOMMU

DP

CP

Page 8: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

8

vDPA: DPDK High Level Design DPDK vhost-user library

CP-Protocol, communicate channel with QEMU

vdev Mgr., virtual device and resource management

DP-ACCs, vhost data path abstraction layer

DP-SW: SW vhost data path

DP-ACC engine providers drive the accelerators which can be either PCIe based or non-PCIe based

PMD and Port Representor Driver of DP-ACC can leverage DP-SW library to build SW vhost data path

IXGBEFVL

QEMU

VM

virtio drivers

virtio Device Emulation

HOST

IOMMU...VF

vhost-userCP-Protocol

vdev & resource mgr.

Vhost PMD Dev

PMD NIC 1:1 zerocopy

VF

Vhost PMD Dev

PMD NIC

DP

VF ...

HW Backend

SW Backend

VF VF

DP ACC driver

...

virtio handler

virtio handler

virtio handler

DP

DP ACC driver

SW vSwitchVMM

vhost-userDP-SW

vhost-userDP-ACCs

vhost-user library

Page 9: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

9

vDPA: HW Prerequisites Ring Layout Follows the virtio Spec. (MUST)

Ring Feature Capability Awareness (MUST)

R/W vring index status (MUST) BAR configure register: R/W 16bits index register (last_used_idx) per vring

last_used_idx is the HW internal status of used vring

Log dirty pages (MUST, note: will be addressed by Vt-d) BAR configure register

64bits register for log memory base address

64bits register for log memory size

1bit register to enable logging

Kick RARP: w/ VIRTIO_NET_F_GUEST_ANNOUNCE, no need for HW to trigger the RARP

Page 10: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

10

Compatible with SW backend Dirty Page Logging

VRING state report/restore

Kick RARP (alternative)

Be possible to transparentlyupgrade/live-migrate stock VMto a new platform w/ accelerator in the backend

Challenge remains for busoverhead of small size transaction for the dirty page logging

vDPA: Live-Migration Support

period loopIteratively tranfer all pages dirtied by guest

QEMU VHOST QEMU VHOSTHW

VHOST_USER_SET_LOG_BASE{fd, size}

{log_base, log_size}

IOMMU update forlog_base's iova

VHOST_USER_SET_FEATURESlog enable

VHOST_USER_GET_VRING_BASE

stop VF

{last_usd_idx}

VHOST_USER_SET_VRING_BASE

set{last_used_idx}

HW

All memory tranferred to Destination

read

loop each vring

the 1st vring

loop each vring

dirty used_ring bitmap

Log desc/bufer dirty page

transfer all remaining dirty pages and state information

Resume Guest

transfer state information

Pause Guest

Stage 1

Stage 2

Stage 3

Source Destination

Page 11: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

11

Reducing bus overhead for Logging dirty page PCIe based: coarse-grained logging

Ideally logging the Dirty bits in IOMMU (long term)

It’ not a problem for memory based accelerator

Reducing bus overhead for Ring manipulation VIRTIO v1.1 New Ring Layout [1][2]

Simple modeling shows lower bus overhead

Remaining Challenges: Bus Overhead

[1]: https://lists.oasis-open.org/archives/virtio-dev/201702/msg00010.html[2]: https://lists.oasis-open.org/archives/virtio-dev/201702/msg00035.html

Not in Perfect Stage, butmanageable !

Page 12: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

12

Status & Working in Progress 2017 Q1~Q2 PoC [DONE] 2017 Q2 shared in DPDK Monthly Virtio Community Call [DONE] 2017’Q2 Finish v1.1 experimental prototype in DPDK [1] [DONE] 2017 Q3 Feedback Collection from Early Trial [WIP] 2017 Q3/Q4 v1.1 ring layout optimization, proposal, PoC [WIP] 17.08/17.11 DPDK vDPA framework RFC patch [WIP] 17’Q4 QEMU patch for virtio direct I/O support [WIP]

INTR/Doorbell Mapping

17’Q4 Kernel RFC patch for vDPA

Para-virtualized device w/ HW acceleration is coming.Welcome on board!

[1]: http://dpdk.org/git/next/dpdk-next-virtio

Page 13: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

13

Acknowledgement

Zhihong Wang

Tiwei Bie

Jianfeng Tan

Heqing Zhu

Yuanhan Liu

Amnon Ilan

Franck Baudin

Martin Roberts

Dan Daly

Gerald Rogers

Roger Chien

Page 14: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

14

Key Takeaway

What is vDPA? -- vHost Data Path Acceleration New approach of Direct I/O: small granularity data path pass-thru Target to next-gen para-virtualized device w/ accelerator Key benefits

‘SR-IOV’ like performance w/ compatible live-migration support

Transparently upgrade stock VM to enhanced platform w/ very small set of HW prerequisites

Remaining challenges are manageable Welcome for any feedback/contribution

Page 15: DPDK Summit China 2017 · Network Platforms Group DPDK China Summit 2017 ,Shanghai 8 vDPA: DPDK High Level Design DPDK vhost-user library CP-Protocol, communicate channel with QEMU

Network Platforms Group

DPDK China Summit 2017 Shanghai,

15

Thanks!!

Contacts:[email protected]@intel.com