Device Virtualization Architecture

Jake Oshins
Architect, Windows Virtualization
Microsoft Corporation

Goals

Participants will leave with an understanding of:

How Microsoft intends to enable efficient I/O virtualization

How others’ I/O solutions interact with Microsoft’s virtualization systems

Which I/O virtualization strategies will be available with Windows Server virtualization and which must wait

Agenda

General strategies for I/O virtualization

Technical overview of Virtual Device Framework

Technical overview of VMBus

Device Emulation

Virtual machine “sees” real hardware devices

Each access to the “device” involves an intercept, sent to the parent virtual machine

Performance is sub-optimal, because every device access traps out of the guest

Compatibility with existing software can be perfect

Microsoft provides emulations:

The hardware that is emulated is from ~1997, providing in-box compatibility with old OSes

Requires a “monitor” partition that contains software for emulating the devices

Physical devices can be shared among multiple guests
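The following sketch shows why emulation is slow. Each guest port access is modeled as one intercept handled by emulation code in the parent (monitor) partition, so byte-at-a-time programmed I/O costs one intercept per byte. This is a minimal illustration under stated assumptions. `EmulatedDevice`, `MonitorPartition`, the port numbers, and the register behavior are all hypothetical and are not Microsoft's emulator interface.

```python
class EmulatedDevice:
    """Hypothetical device: a data register and a status register."""

    DATA_PORT = 0x60    # illustrative port numbers, not a real device's map
    STATUS_PORT = 0x64

    def __init__(self) -> None:
        self.data = 0

    def read(self, port: int) -> int:
        if port == self.DATA_PORT:
            return self.data
        # Status bit 0 is set while the data register holds a nonzero value.
        return 1 if self.data else 0

    def write(self, port: int, value: int) -> None:
        if port == self.DATA_PORT:
            self.data = value & 0xFF


class MonitorPartition:
    """Stands in for the emulation code that receives every guest intercept."""

    def __init__(self) -> None:
        self.port_map = {}
        self.intercepts = 0

    def attach(self, device: EmulatedDevice, ports) -> None:
        for port in ports:
            self.port_map[port] = device

    def guest_in(self, port: int) -> int:
        self.intercepts += 1            # the access traps out of the guest
        device = self.port_map.get(port)
        # Assumption: unclaimed ports read back as all ones.
        return device.read(port) if device is not None else 0xFF

    def guest_out(self, port: int, value: int) -> None:
        self.intercepts += 1
        device = self.port_map.get(port)
        if device is not None:
            device.write(port, value)


monitor = MonitorPartition()
monitor.attach(EmulatedDevice(), [EmulatedDevice.DATA_PORT, EmulatedDevice.STATUS_PORT])
for byte in b"0123456789abcdef":        # byte-at-a-time programmed I/O
    monitor.guest_out(EmulatedDevice.DATA_PORT, byte)
print(monitor.intercepts)               # 16: one intercept per byte written
```

The guest needs no new drivers, which is why compatibility can be perfect. The price is paid in intercepts.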

I/O Enlightenment

Uses abstract protocols to describe I/O; useful protocols already exist:

SCSI, iSCSI

RNDIS

RDP

New device stacks that use these abstract protocols can be written for the secondary guests

Protocol servers exist in a primary guest (the parent), which is the partition that controls the physical devices

Multiple secondary guests can share the services of a single hardware device

Doesn’t require an emulator

Doesn’t require a monitor partition
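To contrast enlightenment with emulation, here is a minimal sketch in which a simplified block-read protocol stands in for SCSI. Several enlightened guests send whole abstract requests to one protocol server in the parent, and only that server touches the shared backing disk. The names `BlockVsc`, `BlockVsp`, and `ReadRequest`, as well as the per-guest slicing of the disk, are hypothetical illustrations.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512


@dataclass(frozen=True)
class ReadRequest:
    """Abstract request: names the data wanted, not the registers to touch."""
    guest: str
    lba: int      # first sector, relative to the guest's virtual disk
    count: int    # number of sectors


class BlockVsp:
    """Protocol server in the parent; the only code that touches the disk."""

    def __init__(self, disk: bytes, slices: dict) -> None:
        self.disk = disk
        self.slices = slices          # guest -> (first_sector, sector_count)
        self.requests_served = 0

    def handle(self, req: ReadRequest) -> bytes:
        first, size = self.slices[req.guest]
        if req.lba < 0 or req.lba + req.count > size:
            raise ValueError("request falls outside the guest's virtual disk")
        start = (first + req.lba) * SECTOR_SIZE
        self.requests_served += 1
        return self.disk[start:start + req.count * SECTOR_SIZE]


class BlockVsc:
    """Bottom of a child partition's storage stack; forwards whole requests."""

    def __init__(self, guest: str, vsp: BlockVsp) -> None:
        self.guest = guest
        self.vsp = vsp

    def read(self, lba: int, count: int) -> bytes:
        return self.vsp.handle(ReadRequest(self.guest, lba, count))


# One backing disk of 8 sectors, each filled with its own sector number.
disk = bytes(i // SECTOR_SIZE for i in range(8 * SECTOR_SIZE))
vsp = BlockVsp(disk, {"guest1": (0, 4), "guest2": (4, 4)})
guest1, guest2 = BlockVsc("guest1", vsp), BlockVsc("guest2", vsp)
data = guest2.read(1, 2)     # one request moves 1 KiB; no per-byte intercepts
```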

Device Assignment

Guest OSes control their devices directly; the parent OS gives up control of these devices

Ownership of a device is exclusive

Performance can match that of a non-virtualized machine

Interdependence of partitions can be minimized

Strong isolation of partitions can be achieved
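The exclusivity rule can be modeled as an ownership table. This is a minimal sketch that assumes every device starts out owned by the parent. `DeviceAssignmentTable` and the device names are hypothetical. The model does not capture how DMA from an assigned device is confined to its owner's memory.

```python
class DeviceAssignmentTable:
    """Hypothetical record of which partition owns each physical device."""

    PARENT = "parent"

    def __init__(self, devices) -> None:
        # Assumption: at boot the parent partition owns every device.
        self.owner = {device: self.PARENT for device in devices}

    def assign(self, device: str, partition: str) -> None:
        current = self.owner[device]
        if current != self.PARENT:
            raise PermissionError(f"{device} is already assigned to {current}")
        # The parent gives up control; only `partition` may drive the device now.
        self.owner[device] = partition

    def release(self, device: str, partition: str) -> None:
        if self.owner[device] != partition:
            raise PermissionError(f"{partition} does not own {device}")
        self.owner[device] = self.PARENT

    def can_access(self, device: str, partition: str) -> bool:
        return self.owner[device] == partition


table = DeviceAssignmentTable(["nic0", "hba0"])
table.assign("nic0", "guest1")
print(table.can_access("nic0", "guest1"), table.can_access("nic0", "parent"))  # True False
```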

Windows Virtualization Will Provide

Device emulation:

Provides a migration path for Microsoft Virtual Server users

~1997-era virtual motherboard

Good for compatibility with old OSes

I/O enlightenment:

Storage

Networking

Video

USB

Agenda

General strategies for I/O virtualization

Overview of Virtual Device Framework

Technical overview of VMBus

Virtualization I/O Definitions

Virtual Device (VDev): a software module that provides a point of configuration and control over an I/O path for a partition

Virtualization Service Provider (VSP): a server component (in a parent or other partition) that handles I/O requests

Can pass I/O requests on to native services like a file system

Can pass I/O requests directly to physical devices

Can run in either kernel mode or user mode

Virtualization Service Consumer (VSC): a client component (in a child partition) that serves as the bottom of an I/O stack within that partition

Sends requests to a VSP

VMBus: a system for sending requests and data between virtual machines

Virtual Devices (VDevs)

Come in two varieties:

Core: device emulators, written by Microsoft

Plug-in: enlightened I/O, written by Microsoft and industry

Management is through WMI

Packaged as COM objects that run within the VM Worker Process

Often work in conjunction with a VSP

VDev Environment

[Figure: The Virtual Machine Worker Process hosts core VDevs (emulated and hybrid devices) and plug-in VDevs, both as COM objects. VDevs receive settings from configuration repositories. They also receive state changes from a state machine (Powering Up, Active, Saving, Powering Down) that drives activation and save. A virtual motherboard supplies virtual hardware: timers (current time, create timer), guest memory, IRQ generation, and memory- and port-mapped I/O, part of it marked as available to core VDevs only. Both core and plug-in VDevs connect to VSPs.]
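The state machine in the figure can be sketched as a worker-process object that pushes every state change to each VDev. The figure names the states (Powering Up, Active, Saving, Powering Down) but not the transitions. The `OFF` state, the event names, the transition table, and the `RecordingVDev` class are therefore all hypothetical assumptions made for illustration.

```python
from enum import Enum, auto


class VmState(Enum):
    OFF = auto()             # assumed resting state; not shown on the slide
    POWERING_UP = auto()
    ACTIVE = auto()
    SAVING = auto()
    POWERING_DOWN = auto()


# Assumed transitions: the figure names the states, not the edges between them.
TRANSITIONS = {
    (VmState.OFF, "power_on"): VmState.POWERING_UP,
    (VmState.POWERING_UP, "activated"): VmState.ACTIVE,
    (VmState.ACTIVE, "save"): VmState.SAVING,
    (VmState.SAVING, "saved"): VmState.ACTIVE,
    (VmState.ACTIVE, "power_off"): VmState.POWERING_DOWN,
    (VmState.POWERING_DOWN, "stopped"): VmState.OFF,
}


class RecordingVDev:
    """Hypothetical VDev that just remembers the state changes it was sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen = []

    def on_state_change(self, state: VmState) -> None:
        self.seen.append(state)


class WorkerStateMachine:
    """Worker-process state machine that pushes each change to every VDev."""

    def __init__(self, vdevs) -> None:
        self.state = VmState.OFF
        self.vdevs = list(vdevs)

    def handle(self, event: str) -> VmState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state.name}")
        self.state = TRANSITIONS[key]
        for vdev in self.vdevs:
            vdev.on_state_change(self.state)
        return self.state


ide, nic = RecordingVDev("ide"), RecordingVDev("synthetic-nic")
machine = WorkerStateMachine([ide, nic])
for event in ["power_on", "activated", "save", "saved"]:
    machine.handle(event)
```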

Virtualization Service Providers (VSPs)

Communicate with a VDev for configuration and state management

Can exist in user mode or kernel mode, as a:

COM object

Service

Driver

Use VMBus to communicate with a VSC in the child partition

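Example VSP/VSC Design

[Figure: A storage example spanning a parent and a child partition. Each partition is divided into user mode and kernel mode and has its own file system, volume, partition, and disk layers. Parent: the Viridian virtualization stack worker process with an image parser (user mode); the VirtualStorageServer (VSP), StorPort, and a Storport miniport over the hardware (kernel mode). Child: the VirtualStorageMiniport (VSC) and iSCSIprt. VM SRBs travel between the VSC and the VSP over VMBus; a FastPath filter also appears on this path.]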

Agenda

General strategies for I/O virtualization

Technical overview of Virtual Device Framework

Technical overview of VMBus

VMBus – What Is It?

A protocol for transferring data through a ring buffer:

A means of mapping a ring buffer into multiple partitions

A definition for the format of the ring buffer

A means of signaling that a ring buffer has gone non-empty

A protocol for offering/discovering services

A protocol for managing guest physical addresses

A protocol for enumerating WDM device objects that represent a data channel

A bus driver that implements all of those protocols

A data transfer library that can be linked into a user-mode service or application

A data transfer library that can be linked into a kernel-mode driver
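The ring-buffer part of this list can be sketched in a single address space. In this model, packets are a 4-byte length prefix plus payload in a circular byte buffer with read and write indices. A write reports whether the ring went from empty to non-empty, which is the only moment a signal is needed. The packet format, the "one byte always free" rule, and `RingBuffer` are assumptions, not the actual VMBus ring layout. Mapping the buffer into two partitions is not modeled.

```python
import struct


class RingBuffer:
    """Byte ring with one writer and one reader.

    Packets are a 4-byte little-endian length followed by the payload.
    One byte is always left unused so read_index == write_index means empty.
    """

    HEADER = struct.Struct("<I")

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.read_index = 0
        self.write_index = 0

    def _used(self) -> int:
        return (self.write_index - self.read_index) % self.size

    def _free(self) -> int:
        return self.size - 1 - self._used()

    def _copy_in(self, data: bytes) -> None:
        for b in data:              # byte loop keeps the wraparound obvious
            self.buf[self.write_index] = b
            self.write_index = (self.write_index + 1) % self.size

    def _copy_out(self, n: int) -> bytes:
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.read_index])
            self.read_index = (self.read_index + 1) % self.size
        return bytes(out)

    def write_packet(self, payload: bytes) -> bool:
        """Append a packet; return True if the reader must be signaled."""
        needed = self.HEADER.size + len(payload)
        if needed > self._free():
            raise BufferError("ring buffer full")
        was_empty = self._used() == 0
        self._copy_in(self.HEADER.pack(len(payload)) + payload)
        return was_empty            # only empty -> non-empty needs a signal

    def read_packet(self):
        """Remove and return the next payload, or None if the ring is empty."""
        if self._used() == 0:
            return None
        (length,) = self.HEADER.unpack(self._copy_out(self.HEADER.size))
        return self._copy_out(length)


ring = RingBuffer(32)
print(ring.write_packet(b"hello"))   # True: ring went from empty to non-empty
print(ring.write_packet(b"abc"))     # False: the reader is already due to run
```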

VMBus Definitions

Endpoint: a module that reads or writes data through VMBus

Channel:

Two endpoints, one server and one client

Two ring buffers

Transfer Page: a pre-allocated page of memory that is mapped into both endpoints’ partitions

Not part of a ring buffer

Used as a target for DMA or for other operations that may take a “long” time to complete
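These definitions fit together as follows. In this sketch, a channel joins a server and a client endpoint with two one-way rings, plus a transfer page that both sides can see but that belongs to neither ring. The server fills the transfer page and replies with a small packet that only points at the data. `Channel`, `Endpoint`, `Packet`, and the slot-count limit are hypothetical, and plain deques stand in for the real ring buffers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

PAGE_SIZE = 4096


@dataclass
class Packet:
    kind: str
    inline: bytes = b""
    # (offset, length) within the channel's transfer page, when data lives there
    transfer_range: Optional[Tuple[int, int]] = None


class Endpoint:
    """One side of a channel: writes one ring and reads the other."""

    def __init__(self, channel: "Channel", out_ring: deque, in_ring: deque) -> None:
        self.channel = channel
        self.out_ring = out_ring
        self.in_ring = in_ring

    def send(self, packet: Packet) -> None:
        if len(self.out_ring) >= self.channel.ring_slots:
            raise BufferError("outgoing ring buffer full")
        self.out_ring.append(packet)

    def receive(self) -> Optional[Packet]:
        return self.in_ring.popleft() if self.in_ring else None


class Channel:
    """Two endpoints, two one-way rings, and a transfer page both sides map."""

    def __init__(self, ring_slots: int = 8) -> None:
        self.ring_slots = ring_slots
        to_server, to_client = deque(), deque()
        # Pre-allocated and shared by both partitions; not part of either ring.
        self.transfer_page = bytearray(PAGE_SIZE)
        self.server = Endpoint(self, out_ring=to_client, in_ring=to_server)
        self.client = Endpoint(self, out_ring=to_server, in_ring=to_client)

    def read_transfer(self, packet: Packet) -> bytes:
        offset, length = packet.transfer_range
        return bytes(self.transfer_page[offset:offset + length])


channel = Channel()
channel.client.send(Packet("read", inline=b"sector 7"))
request = channel.server.receive()
# The server fills the transfer page (as a DMA target would) and replies
# with a small packet that only points at the data.
channel.transfer_page[0:11] = b"sector data"
channel.server.send(Packet("complete", transfer_range=(0, 11)))
reply = channel.client.receive()
```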

VMBus Definitions

Guest Physical Address Descriptor List (GPADL): a memory descriptor list that can be passed to another partition

Allows a device to do DMA to or from a child partition directly

Pipe: a default channel protocol that allows a client to use ReadFile or WriteFile to send data between partitions

Serves as the basis for cross-partition Remote Procedure Call
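A GPADL describes a buffer by the guest physical pages behind it. Those pages need not be adjacent, so the parent can target them directly without copying. The sketch below builds such a descriptor (offset into the first page, byte count, and ordered page frame numbers, in the spirit of an MDL) and expands it into physical runs. `Gpadl`, `build_gpadl`, `gpa_runs`, the 4 KiB page size, and the dictionary page table are assumptions for illustration, not the real GPADL format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PAGE_SIZE = 4096


@dataclass(frozen=True)
class Gpadl:
    byte_offset: int              # where the buffer starts within its first page
    byte_count: int               # total buffer length in bytes
    page_frames: Tuple[int, ...]  # guest physical page frame numbers, in order


def build_gpadl(virt_addr: int, length: int, page_table: Dict[int, int]) -> Gpadl:
    """Describe a virtually contiguous guest buffer by its physical pages.

    page_table maps guest virtual page numbers to guest physical page frame
    numbers; it stands in for the child partition's own page tables.
    """
    if length <= 0:
        raise ValueError("buffer length must be positive")
    first_vpn = virt_addr // PAGE_SIZE
    last_vpn = (virt_addr + length - 1) // PAGE_SIZE
    frames = tuple(page_table[vpn] for vpn in range(first_vpn, last_vpn + 1))
    return Gpadl(virt_addr % PAGE_SIZE, length, frames)


def gpa_runs(gpadl: Gpadl) -> List[Tuple[int, int]]:
    """Expand a GPADL into (guest_physical_address, length) runs, one per page."""
    runs, remaining, offset = [], gpadl.byte_count, gpadl.byte_offset
    for pfn in gpadl.page_frames:
        chunk = min(PAGE_SIZE - offset, remaining)
        runs.append((pfn * PAGE_SIZE + offset, chunk))
        remaining -= chunk
        offset = 0                # only the first page starts mid-page
    return runs


# A 0x300-byte buffer that straddles a page boundary in guest virtual memory,
# backed by two physical pages that are not adjacent.
gpadl = build_gpadl(0x1F00, 0x300, {0x1: 0x50, 0x2: 0x13})
```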

How Is Data Moved Between Partitions?

Commands are placed in ring buffers

Small data is placed in ring buffers

Larger data is placed in pre-arranged pages shared between partitions

Described by commands in ring buffers

The largest data is mapped into another partition without copying

Described by GPADLs placed in ring buffers
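The three tiers can be expressed as a size-based choice. The sketch also counts the ring-buffer space each tier consumes, showing that the ring stays small even when the payload is large. The thresholds, header and descriptor sizes, and function names are hypothetical values picked only to illustrate the idea. The slides do not give the real cutoffs.

```python
from enum import Enum

PAGE_SIZE = 4096
# Hypothetical sizes and thresholds, chosen only to illustrate the tiers.
PACKET_HEADER = 16        # bytes of ring space every packet needs
RANGE_DESCRIPTOR = 8      # (offset, length) pointing into transfer pages
PFN_ENTRY = 8             # one guest page frame number in a GPADL
INLINE_LIMIT = 256
TRANSFER_PAGE_LIMIT = 64 * 1024


class Transport(Enum):
    INLINE = "copied into the ring buffer"
    TRANSFER_PAGES = "copied into pre-arranged shared pages"
    GPADL = "mapped without copying, described by a GPADL"


def choose_transport(length: int) -> Transport:
    if length <= INLINE_LIMIT:
        return Transport.INLINE
    if length <= TRANSFER_PAGE_LIMIT:
        return Transport.TRANSFER_PAGES
    return Transport.GPADL


def ring_bytes(length: int) -> int:
    """Ring-buffer space consumed by a request carrying `length` bytes."""
    transport = choose_transport(length)
    if transport is Transport.INLINE:
        return PACKET_HEADER + length
    if transport is Transport.TRANSFER_PAGES:
        return PACKET_HEADER + RANGE_DESCRIPTOR
    pages = -(-length // PAGE_SIZE)   # ceiling division; assumes a page-aligned buffer
    return PACKET_HEADER + pages * PFN_ENTRY


for size in (100, 8 * 1024, 1024 * 1024):
    print(size, choose_transport(size).name, ring_bytes(size))
```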

Hypervisor Involvement

When is it necessary?

Channel setup

Signaling another partition (modeled as a hardware interrupt)

When is it not necessary?

When placing packets in a ring buffer

When removing packets from a ring buffer

When reading or writing Transfer Pages

When translating guest memory maps

Guest Physical Address Space

GPADLs:

Allow transactions to refer to guest buffers, with no data copying required

Are built within the Virtualization Stack in the parent partition

Allow I/O to be handled without switching into and out of the hypervisor

Allow child partitions’ VSCs to use their own physical addresses in requests to VSPs

Allow VSPs easy access to translations, particularly if the VSP is a kernel-mode driver

A typical transaction can involve no hypercalls

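Request Packet Structure

[Figure: A request packet passed between the child partition and the parent partition, with the application buffers residing in the child. The packet consists of a header, a GPADL that describes the application buffers, and a device-specific protocol section.]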
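The three-part packet can be serialized with Python's `struct` module, as shown below. The field layout (packet type, header length, total length, transaction ID, then a GPADL header and page frame numbers, then opaque protocol bytes) is a hypothetical encoding, not the actual VMBus wire format. The protocol bytes are a stand-in for a device-specific command such as a SCSI CDB.

```python
import struct
from typing import List, Tuple

# Hypothetical little-endian layout, not the actual VMBus wire format.
HEADER = struct.Struct("<HHIQ")    # packet type, header length, total length, transaction ID
GPADL_HDR = struct.Struct("<III")  # byte offset, byte count, page count
PFN = struct.Struct("<Q")          # one guest page frame number
PACKET_TYPE_REQUEST = 1            # hypothetical constant


def pack_request(txn_id: int, byte_offset: int, byte_count: int,
                 pfns: List[int], protocol: bytes) -> bytes:
    gpadl = GPADL_HDR.pack(byte_offset, byte_count, len(pfns))
    gpadl += b"".join(PFN.pack(pfn) for pfn in pfns)
    total = HEADER.size + len(gpadl) + len(protocol)
    return HEADER.pack(PACKET_TYPE_REQUEST, HEADER.size, total, txn_id) + gpadl + protocol


def unpack_request(packet: bytes) -> Tuple[int, Tuple[int, int, List[int]], bytes]:
    _ptype, header_len, total, txn_id = HEADER.unpack_from(packet, 0)
    if total != len(packet):
        raise ValueError("packet length does not match header")
    byte_offset, byte_count, page_count = GPADL_HDR.unpack_from(packet, header_len)
    pos = header_len + GPADL_HDR.size
    pfns = [PFN.unpack_from(packet, pos + i * PFN.size)[0] for i in range(page_count)]
    # Everything after the GPADL is device-specific; this model leaves it opaque.
    protocol = packet[pos + page_count * PFN.size:]
    return txn_id, (byte_offset, byte_count, pfns), protocol


packet = pack_request(42, 0xF00, 0x300, [0x50, 0x13], b"READ(10) lba=7")
```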

What Does Traffic Look Like?

The underlying VMBus protocol is very simple

Packets are sent asynchronously; primitives exist to allow synchronization

Packets have very little structure:

A packet may reference Transfer Pages

A packet may reference a GPADL

Other protocols must be defined by the users of the channel
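Asynchronous sending with synchronization layered on top can look like the sketch below. A client tags each request with a transaction ID and returns immediately. The server may complete requests in any order, and the client matches each completion back to its caller by ID. `AsyncClient`, the callback style, and `is_pending` are hypothetical, and a deque stands in for the outgoing ring.

```python
import itertools
from collections import deque
from typing import Callable, Dict


class AsyncClient:
    """Sends requests without waiting; matches completions by transaction ID."""

    def __init__(self, outbound: deque) -> None:
        self.outbound = outbound
        self.pending: Dict[int, Callable[[bytes], None]] = {}
        self._ids = itertools.count(1)

    def send(self, payload: bytes, on_complete: Callable[[bytes], None]) -> int:
        txn_id = next(self._ids)
        self.pending[txn_id] = on_complete
        self.outbound.append((txn_id, payload))   # returns immediately
        return txn_id

    def is_pending(self, txn_id: int) -> bool:
        """A minimal synchronization primitive: has this request completed yet?"""
        return txn_id in self.pending

    def on_completion(self, txn_id: int, result: bytes) -> None:
        self.pending.pop(txn_id)(result)


ring = deque()
client = AsyncClient(ring)
completed = []
ids = {name: client.send(name.encode(), lambda r, n=name: completed.append((n, r)))
       for name in ("a", "b", "c")}

# The server drains the ring and completes the requests in reverse order.
requests = [ring.popleft() for _ in range(len(ring))]
for txn_id, payload in reversed(requests):
    client.on_completion(txn_id, payload.upper())
```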

Request Packet Flow

[Figure: VSP in the parent partition and VSC in the child partition]

Request Packet Flow

[Figure: VSP in the parent partition and VSC in the child partition, signaled by an interrupt through the hypervisor]

Data Flow

[Figure: VSP in the parent partition and VSC in the child partition]

Interrupt Management

Interrupts can be sent between partitions to signal VSP or VSC code to start running

Avoids software polling

The cost of an interrupt is a hypercall and possibly a partition context switch

Only necessary when the VSP/VSC wouldn’t already be running:

When the ring buffer was previously empty

When the ring buffer was previously full

Multiple channels’ interrupts can be coalesced; VMBus can track latency requirements

Allows requests to be batched
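Both rules on this slide can be sketched together. A ring reports that a signal is needed only on the empty-to-non-empty transition (wake the reader) or the full-to-not-full transition (wake the writer). A coalescer then folds pending signals from many channels into one hypercall per flush. `SignalingRing` and `InterruptCoalescer` are hypothetical. The latency tracking that would decide when to flush is not modeled; the caller simply calls `flush()`.

```python
from collections import deque


class SignalingRing:
    """Ring that reports when a signal (interrupt) is actually needed."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.items = deque()

    def put(self, item) -> bool:
        """Enqueue; return True if the reader must be interrupted."""
        if len(self.items) >= self.slots:
            raise BufferError("ring full")
        was_empty = not self.items
        self.items.append(item)
        return was_empty        # otherwise the reader is already due to run

    def get(self):
        """Dequeue; return (item, wake_writer), True if the ring had been full."""
        was_full = len(self.items) == self.slots
        return self.items.popleft(), was_full


class InterruptCoalescer:
    """Collects signal requests from many channels; one hypercall per flush."""

    def __init__(self) -> None:
        self.pending = set()
        self.hypercalls = 0

    def request(self, channel_id: int) -> None:
        self.pending.add(channel_id)

    def flush(self) -> set:
        if not self.pending:
            return set()
        self.hypercalls += 1    # one signal covers every pending channel
        fired, self.pending = self.pending, set()
        return fired


ring = SignalingRing(slots=4)
signals = [ring.put(n) for n in range(4)]    # only the first put needs a signal
```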

Bus Driver

VMBus acts as a bus driver

It can form the bottom of a device stack

VSCs can be instantiated on top of VMBus

(Names of components not finalized)
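The bus-driver role can be sketched as child-side enumeration of channel offers. Each offer carries an interface type and an instance ID. When a VSC is registered for that interface type, a new device (with VMBus at the bottom of its stack) is created. When no VSC is registered, the offer stays unclaimed. `ChannelOffer`, `VmBusEnumerator`, and the made-up GUIDs are hypothetical; the driver name is borrowed from the storage example figure.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChannelOffer:
    interface_type: uuid.UUID   # what kind of service is offered (e.g. storage)
    instance_id: uuid.UUID      # which particular virtual device this is


class VmBusEnumerator:
    """Child-side bus driver model: turns channel offers into VSC device stacks."""

    def __init__(self) -> None:
        self.vsc_drivers: Dict[uuid.UUID, str] = {}
        self.devices: List[Tuple[str, uuid.UUID]] = []

    def register_vsc(self, interface_type: uuid.UUID, driver: str) -> None:
        self.vsc_drivers[interface_type] = driver

    def on_offer(self, offer: ChannelOffer) -> Optional[Tuple[str, uuid.UUID]]:
        driver = self.vsc_drivers.get(offer.interface_type)
        if driver is None:
            return None                 # no VSC installed; the offer stays unclaimed
        device = (driver, offer.instance_id)
        self.devices.append(device)     # VMBus is the bottom of this new stack
        return device


# Made-up, deterministic GUIDs; they are not real VMBus interface types.
STORAGE = uuid.uuid5(uuid.NAMESPACE_URL, "example:vmbus/storage")
NETWORK = uuid.uuid5(uuid.NAMESPACE_URL, "example:vmbus/network")

bus = VmBusEnumerator()
bus.register_vsc(STORAGE, "VirtualStorageMiniport")
disk0 = uuid.uuid5(uuid.NAMESPACE_URL, "example:disk0")
nic0 = uuid.uuid5(uuid.NAMESPACE_URL, "example:nic0")
attached = [bus.on_offer(ChannelOffer(STORAGE, disk0)),
            bus.on_offer(ChannelOffer(NETWORK, nic0))]
```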

Call To Action

Please attend the following session on Virtual Networking and Storage

Participate in future Windows Server virtualization Beta programs

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
