
Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Page 1: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Aravind Menon, Willy Zwaenepoel
EPFL, Lausanne

Jose Renato Santos, Yoshio Turner, G. (John) Janakiraman
HP Labs, Palo Alto

Page 2: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Virtual Machine Monitors (VMM)

Increasing adoption for server applications
  Server consolidation, co-located hosting

Virtualization can affect application performance in unexpected ways

Page 3: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Web server performance in Xen

25-66% lower peak throughput than Linux, depending on Xen configuration

[Figure: Web server throughput in Linux and Xen. X-axis: Request Rate (reqs/sec); Y-axis: Throughput (Mb/s); series: Linux, Privileged Xen VM, Unprivileged Xen VM]

Need VM-aware profiling to diagnose causes of performance degradation

Page 4: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Contributions

Xenoprof: framework for VM-aware profiling in Xen

Understanding network virtualization overheads in Xen

Debugging performance anomaly using Xenoprof

Page 5: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Outline

Motivation
Xenoprof
Network virtualization overheads in Xen
Debugging using Xenoprof
Conclusions

Page 6: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Xenoprof: profiling for VMs

Profile applications running in VM environments
  Contribution of different domains (VMs) and the VMM (Xen) routines to execution cost
  Profile various hardware events

Example output

Function name      %Instructions   Module
----------------------------------------------------------
mmu_update         13              Xen (VMM)
br_handle_frame    8               driver domain (Dom 0)
tcp_v4_rcv         5               guest domain (Dom 1)
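As a rough illustration of how a per-function report like the example output could be rolled up into per-module cost, the sketch below sums the instruction share by module. The three rows are copied from the slide; the function name `share_by_module` is hypothetical and is not part of Xenoprof or OProfile.

```python
from collections import defaultdict

# (function name, % instructions, module) rows copied from the example output.
rows = [
    ("mmu_update", 13, "Xen (VMM)"),
    ("br_handle_frame", 8, "driver domain (Dom 0)"),
    ("tcp_v4_rcv", 5, "guest domain (Dom 1)"),
]


def share_by_module(rows):
    """Sum the instruction share of every listed function, grouped by module."""
    totals = defaultdict(int)
    for _name, percent, module in rows:
        totals[module] += percent
    return dict(totals)


if __name__ == "__main__":
    for module, percent in share_by_module(rows).items():
        print(f"{module}: {percent}%")
```

With only one function listed per module, the totals simply repeat the table; the grouping becomes useful once a full profile with many functions per domain is fed in.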

Page 7: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Xenoprof: architecture (brief)

Extend existing profilers (OProfile) to use Xenoprof
Xenoprof collects samples and coordinates profilers running in multiple domains

[Figure: architecture diagram. Labels: Domains (VMs): Domain 0, Domain 1, Domain 2, each with OProfile (extended); Xenoprof; Xen VMM; H/W performance counters]

Page 8: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Outline

Motivation
Xenoprof
Network virtualization overheads in Xen
Debugging using Xenoprof
Conclusions

Page 9: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Xen network I/O architecture

Privileged driver domain controls physical NIC

Each unprivileged guest domain uses virtual NIC connected to driver domain via Xen I/O Channel
  Control: I/O descriptor ring (shared memory)
  Data transfer: page remapping (no copying)

[Figure: I/O architecture diagram. Labels: I/O Driver Domain, Guest Domain, I/O Channel, NIC, Bridge, vif1, vif2]
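To make the "descriptor ring" idea concrete, here is a minimal single-producer, single-consumer ring sketch. It is not Xen's implementation: the class name `DescriptorRing`, its methods, and the descriptor fields are hypothetical, and the ring lives in ordinary process memory rather than in memory shared between domains. The point it illustrates is that descriptors only name a page and a length, so packet data itself is never copied through the ring.

```python
class DescriptorRing:
    """Fixed-size single-producer, single-consumer ring of I/O descriptors."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.prod = 0  # total number of descriptors ever posted
        self.cons = 0  # total number of descriptors ever consumed

    def post(self, descriptor):
        """Producer side: queue a descriptor; return False if the ring is full."""
        if self.prod - self.cons == self.size:
            return False
        self.slots[self.prod % self.size] = descriptor
        self.prod += 1
        return True

    def take(self):
        """Consumer side: return the oldest descriptor, or None if the ring is empty."""
        if self.cons == self.prod:
            return None
        descriptor = self.slots[self.cons % self.size]
        self.cons += 1
        return descriptor


if __name__ == "__main__":
    ring = DescriptorRing(size=4)
    # Hypothetical descriptor: a page reference plus the packet length in bytes.
    ring.post({"page": 0x1000, "length": 1500})
    print(ring.take())
```

Keeping free-running producer and consumer counters, and reducing them modulo the ring size only when indexing, lets "full" and "empty" be distinguished without a separate flag.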

Page 10: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Evaluated configurations

Linux: no Xen
Xen Driver: run application in privileged driver domain
Xen Guest: run application in unprivileged guest domain interfaced to driver domain via I/O channel

[Figure: I/O architecture diagram. Labels: I/O Driver Domain, Guest Domain, I/O Channel, NIC, Bridge, vif1, vif2]

Page 11: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Networking micro-benchmark

One streaming TCP connection per NIC (up to 4)

Micro-benchmark throughput (Mb/s)

Configuration   Receive   Transmit
----------------------------------
Linux           2462      3764
Xen driver      1878      3764
Xen guest       849       706

Driver receive throughput 75% of Linux throughput
Guest throughput 1/3rd to 1/5th of Linux throughput
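The ratios quoted on this slide can be checked directly from the chart values. The sketch below assumes the six bar labels map to configurations as Linux 2462/3764, Xen driver 1878/3764, and Xen guest 849/706 (receive/transmit), which matches the per-direction charts on the later slides; the helper name `relative_to_linux` is made up for this example.

```python
# Throughput in Mb/s, read from the micro-benchmark chart.
throughput = {
    "Linux": {"receive": 2462, "transmit": 3764},
    "Xen driver": {"receive": 1878, "transmit": 3764},
    "Xen guest": {"receive": 849, "transmit": 706},
}


def relative_to_linux(config, direction):
    """Throughput of a configuration as a fraction of native Linux throughput."""
    return throughput[config][direction] / throughput["Linux"][direction]


if __name__ == "__main__":
    for config in ("Xen driver", "Xen guest"):
        for direction in ("receive", "transmit"):
            ratio = relative_to_linux(config, direction)
            print(f"{config} {direction}: {ratio:.0%} of Linux")
```

The driver-domain receive ratio comes out at about 76%, in line with the slide's rounded 75%, and the guest ratios are about 34% (receive) and 19% (transmit), in line with "1/3rd to 1/5th".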

Page 12: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Receive: Xen Driver overhead

Profiling shows slower instruction execution with Xen Driver than with Linux (both use 100% CPU)
  Data TLB miss count 13 times higher
  Instruction TLB miss count 17 times higher

Xen: 11% more instructions per byte transferred (Xen virtual interrupts, driver hypercall)

[Figure: receive throughput (Mb/s): Linux 2462, Xen driver 1878, Xen guest 849]
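One way to relate the two observations on this slide is a simple model that is an assumption of this note, not something stated in the talk: if both configurations are CPU-bound, throughput equals the instruction execution rate divided by the instructions needed per byte. Under that model, the measured throughput drop and the 11% instruction increase together imply how much slower instructions execute. The function name `instruction_rate_ratio` is hypothetical.

```python
LINUX_RX_MBPS = 2462        # Linux receive throughput from the chart
XEN_DRIVER_RX_MBPS = 1878   # Xen driver-domain receive throughput from the chart
EXTRA_INSTR_PER_BYTE = 0.11  # Xen executes 11% more instructions per byte


def instruction_rate_ratio(xen_mbps, linux_mbps, extra_instr_per_byte):
    """Xen instruction rate relative to Linux, assuming both are CPU-bound.

    throughput = instruction_rate / instructions_per_byte, so
    rate_xen / rate_linux = (tput_xen / tput_linux) * (ipb_xen / ipb_linux).
    """
    return (xen_mbps / linux_mbps) * (1.0 + extra_instr_per_byte)


if __name__ == "__main__":
    ratio = instruction_rate_ratio(
        XEN_DRIVER_RX_MBPS, LINUX_RX_MBPS, EXTRA_INSTR_PER_BYTE
    )
    print(f"Xen driver executes instructions at {ratio:.0%} of the Linux rate")
```

Under this model the driver domain retires instructions at roughly 85% of the Linux rate, which is consistent with the slide attributing part of the slowdown to the much higher TLB miss counts rather than to extra instructions alone.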

Page 13: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Receive: Xen Guest overhead

Xen Guest configuration executes two times as many instructions as Xen Driver configuration
  Driver domain (38%): overhead of bridging
  Xen (27%): overhead of page remapping

[Figure: I/O architecture diagram. Labels: I/O Driver Domain, Guest Domain, I/O Channel, NIC, Bridge, vif1, vif2]

[Figure: receive throughput (Mb/s): Linux 2462, Xen driver 1878, Xen guest 849]

Page 14: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Transmit: Xen Guest overhead

Xen Guest: executes 6 times as many instructions as Xen Driver configuration
  Factor of 2 as in Receive case
  Guest instructions increase 2.7 times

Virtual NIC (vif2) in guest does not support TCP offload capabilities of NIC

[Figure: transmit throughput (Mb/s): Linux 3764, Xen driver 3764, Xen guest 706]

Page 15: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Suggestions for improving Xen

Enable virtual NICs to utilize offload capabilities of physical NIC

Efficient support for packet demultiplexing in driver domain

Page 16: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Outline

Motivation
Xenoprof
Network virtualization overheads in Xen
Debugging using Xenoprof
Conclusions

Page 17: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Anomalous network behavior in Xen

TCP receive throughput in Xen changes with application buffer size (slow Pentium III)

[Figure: Throughput vs application buffer size. X-axis: Application Buffer size (KB); Y-axis: Throughput (Mb/s); series: Linux, Xenolinux]

Page 18: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Debugging using Xenoprof

40% kernel execution overhead incurred in socket buffer de-fragmenting routines

Page 19: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

De-fragmenting socket buffers

Linux: insignificant fragmentation with streaming workload

[Figure: socket receive queue diagram. Labels: Socket receive queue, De-fragment, Socket buffer (4 KB), Data packet (MTU)]

Xenolinux (Linux on Xen)
  Received packets: 1500 bytes (MTU) out of 4 KB socket buffer
  Page-sized socket buffers support remapping over I/O channel
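The fragmentation described above can be quantified with a small calculation. The sketch assumes that 4 KB means 4096 bytes and that every received packet fills exactly one MTU (1500 bytes) of its page-sized socket buffer; both are simplifications for illustration, and the helper names are made up.

```python
import math

MTU_BYTES = 1500                 # data carried by each received packet (assumed full MTU)
SOCKET_BUFFER_BYTES = 4 * 1024   # page-sized socket buffer (4 KB assumed to be 4096 bytes)


def utilization(packet_bytes=MTU_BYTES, buffer_bytes=SOCKET_BUFFER_BYTES):
    """Fraction of each socket buffer that actually holds packet data."""
    return packet_bytes / buffer_bytes


def buffer_bytes_needed(payload_bytes, packet_bytes=MTU_BYTES,
                        buffer_bytes=SOCKET_BUFFER_BYTES):
    """Socket buffer memory occupied when each packet gets its own buffer."""
    packets = math.ceil(payload_bytes / packet_bytes)
    return packets * buffer_bytes


if __name__ == "__main__":
    print(f"buffer utilization: {utilization():.1%}")
    print(f"64 KB of data occupies {buffer_bytes_needed(64 * 1024) // 1024} KB")
```

Under these assumptions each buffer is only about 37% full, so 64 KB of queued data occupies 176 KB of socket buffers. That gap gives a sense of why the receive queue would need de-fragmenting in Xenolinux but not in native Linux.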

Page 20: Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Conclusions

Xenoprof useful for identifying major overheads in Xen

Xenoprof to be included in official Xen and OProfile releases

Where to get it: http://xenoprof.sourceforge.net