Diagnosing Performance Overheads in the Xen Virtual Machine Environment

Aravind Menon, Willy Zwaenepoel
EPFL, Lausanne

Jose Renato Santos, Yoshio Turner, G. (John) Janakiraman
HP Labs, Palo Alto
Virtual Machine Monitors (VMM)

- Increasing adoption for server applications: server consolidation, co-located hosting
- Virtualization can affect application performance in unexpected ways
Web server performance in Xen

- 25-66% lower peak throughput than Linux, depending on Xen configuration
[Figure: web server throughput in Linux and Xen, plotting throughput (Mb/s) against request rate (reqs/sec) for Linux, a privileged Xen VM, and an unprivileged Xen VM]
- Need VM-aware profiling to diagnose the causes of performance degradation
Contributions

- Xenoprof: a framework for VM-aware profiling in Xen
- Understanding network virtualization overheads in Xen
- Debugging a performance anomaly using Xenoprof
Outline

- Motivation
- Xenoprof
- Network virtualization overheads in Xen
- Debugging using Xenoprof
- Conclusions
Xenoprof – profiling for VMs

- Profiles applications running in VM environments
  - Attributes execution cost to the different domains (VMs) and to VMM (Xen) routines
  - Profiles various hardware events
- Example output:

  Function name     %Instructions   Module
  ------------------------------------------------------
  mmu_update              13        Xen (VMM)
  br_handle_frame          8        driver domain (Dom 0)
  tcp_v4_rcv               5        guest domain (Dom 1)
Xenoprof – architecture (brief)

- Extends existing profilers (OProfile) to use Xenoprof
- Xenoprof collects samples and coordinates the profilers running in multiple domains
[Diagram: extended OProfile instances in Domain 0, Domain 1, and Domain 2 (the VMs) sit above Xenoprof in the Xen VMM, which drives the H/W performance counters]
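Conceptually, Xen fields each performance-counter overflow and hands the sample to the domain that was running, whose extended OProfile instance then maps it to its own symbols. A minimal user-space sketch of that per-domain sample routing (all names here are illustrative, not the real Xenoprof interface):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch of Xenoprof-style sample routing: the VMM fields
 * the counter-overflow interrupt and appends each sample to the buffer
 * of the domain that was executing, so the profiler inside that domain
 * can attribute it. Hypothetical names, not the Xenoprof API. */

#define NDOMAINS  3
#define BUF_SLOTS 1024

struct sample {
    uint64_t pc;      /* program counter at overflow */
    uint32_t event;   /* which hardware event fired  */
};

struct domain_buf {
    struct sample slot[BUF_SLOTS];
    unsigned head;    /* advanced by the VMM */
    unsigned tail;    /* drained by the in-domain profiler */
} bufs[NDOMAINS];

/* Called (conceptually) from the counter-overflow interrupt handler. */
static void on_overflow(unsigned dom, uint64_t pc, uint32_t event)
{
    struct domain_buf *b = &bufs[dom];
    if (b->head - b->tail == BUF_SLOTS)
        return;                          /* buffer full: drop sample */
    b->slot[b->head % BUF_SLOTS] = (struct sample){ pc, event };
    b->head++;
}

int main(void)
{
    /* Simulate a few overflows landing in different domains. */
    on_overflow(0, 0xc0100000, 0);  /* driver domain  */
    on_overflow(1, 0xc0200000, 0);  /* guest domain 1 */
    on_overflow(1, 0xc0200040, 0);
    for (unsigned d = 0; d < NDOMAINS; d++)
        printf("domain %u: %u samples\n", d, bufs[d].head - bufs[d].tail);
    return 0;
}
```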
Outline

- Motivation
- Xenoprof
- Network virtualization overheads in Xen
- Debugging using Xenoprof
- Conclusions
Xen network I/O architecture

- A privileged driver domain controls the physical NIC
- Each unprivileged guest domain uses a virtual NIC connected to the driver domain via a Xen I/O channel
  - Control: I/O descriptor ring (shared memory)
  - Data transfer: page remapping (no copying)
[Diagram: the NIC feeds the driver domain, whose bridge connects vif1 and vif2; the guest domain reaches vif2 over the I/O channel]
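For the control path, the I/O channel behaves like a single-producer/single-consumer descriptor ring in a shared page, with each side advancing only its own index. A simplified sketch, loosely modeled on Xen's I/O rings (field and function names are ours):

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified I/O-channel descriptor ring: driver domain and guest share
 * a page holding the ring; each side only advances its own index. */

#define RING_SIZE 8  /* power of two so indices wrap cheaply */

struct desc {
    uint64_t gref;   /* reference to the page carrying the packet */
    uint32_t len;    /* bytes used in that page */
};

struct ring {
    struct desc req[RING_SIZE];
    volatile uint32_t prod;  /* advanced by the producer only */
    volatile uint32_t cons;  /* advanced by the consumer only */
};

static int ring_put(struct ring *r, struct desc d)
{
    if (r->prod - r->cons == RING_SIZE)
        return -1;                       /* ring full */
    r->req[r->prod % RING_SIZE] = d;
    /* A real implementation issues a memory barrier before publishing
     * prod, then notifies the peer via an event channel. */
    r->prod++;
    return 0;
}

static int ring_get(struct ring *r, struct desc *out)
{
    if (r->cons == r->prod)
        return -1;                       /* ring empty */
    *out = r->req[r->cons % RING_SIZE];
    r->cons++;
    return 0;
}

int main(void)
{
    struct ring r = {0};
    ring_put(&r, (struct desc){ .gref = 42, .len = 1500 });
    struct desc d;
    if (ring_get(&r, &d) == 0)
        printf("got page ref %llu, %u bytes\n",
               (unsigned long long)d.gref, d.len);
    return 0;
}
```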
Evaluated configurations

- Linux: no Xen
- Xen Driver: application runs in the privileged driver domain
- Xen Guest: application runs in an unprivileged guest domain interfaced to the driver domain via the I/O channel
Networking micro-benchmark

- One streaming TCP connection per NIC (up to 4)
Micro-benchmark throughput (Mb/s):

              Receive   Transmit
  Linux         2462      3764
  Xen driver    1878      3764
  Xen guest      849       706
- Xen Driver receive throughput is 75% of Linux throughput
- Xen Guest throughput is 1/3rd to 1/5th of Linux throughput
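(Worked out from the table: 1878/2462 ≈ 76% for driver-domain receive; 849/2462 ≈ 1/2.9 on receive and 706/3764 ≈ 1/5.3 on transmit for the guest.)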
Receive – Xen Driver overhead

- Profiling shows slower instruction execution with Xen Driver than with Linux (both use 100% CPU)
  - Data TLB miss count 13 times higher
  - Instruction TLB miss count 17 times higher
- Xen executes 11% more instructions per byte transferred (Xen virtual interrupts, driver hypercalls)
[Figure: receive throughput in Mb/s (Linux 2462, Xen driver 1878, Xen guest 849)]
Receive – Xen Guest overhead

- The Xen Guest configuration executes twice as many instructions as the Xen Driver configuration
  - Driver domain (38%): overhead of bridging
  - Xen (27%): overhead of page remapping
Transmit – Xen Guest overhead

- Xen Guest executes 6 times as many instructions as the Xen Driver configuration
  - Factor of 2 as in the receive case
  - Guest instructions increase 2.7 times
- The virtual NIC (vif2) in the guest does not support the TCP offload capabilities of the physical NIC
[Figure: transmit throughput in Mb/s (Linux 3764, Xen driver 3764, Xen guest 706)]
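The missing offload support is observable from user space: querying an interface's TSO and transmit-checksum bits with Linux's standard SIOCETHTOOL ioctl against the physical NIC and against a vif should show the difference. A minimal probe (the default interface name is just an example):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <unistd.h>

/* Query TSO and TX-checksum offload bits for one interface via the
 * SIOCETHTOOL ioctl. Compare a physical NIC (e.g. eth0) with a Xen
 * virtual interface to see which offloads the vif lacks. */
static int get_feature(int fd, const char *ifname, uint32_t cmd)
{
    struct ethtool_value ev = { .cmd = cmd };
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&ev;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        return -1;                 /* feature query unsupported */
    return (int)ev.data;           /* 1 = offload on, 0 = off */
}

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0"; /* example name */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    printf("%s: TSO=%d TX-csum=%d\n", ifname,
           get_feature(fd, ifname, ETHTOOL_GTSO),
           get_feature(fd, ifname, ETHTOOL_GTXCSUM));
    close(fd);
    return 0;
}
```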
Suggestions for improving Xen

- Enable virtual NICs to utilize the offload capabilities of the physical NIC
- Efficient support for packet demultiplexing in the driver domain (a toy sketch follows)
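As a rough illustration of the demultiplexing suggestion: instead of the general Ethernet bridge path (38% of driver-domain instructions on receive), the driver domain could map a frame's destination MAC directly to the target vif. The sketch below is our own toy illustration, not Xen code:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy packet demultiplexer: map each guest vif's MAC address directly
 * to a vif index, avoiding a full bridge lookup. Illustrative only. */

struct vif_entry {
    uint8_t mac[6];
    int     vif_id;
};

static struct vif_entry table[] = {
    { {0x00,0x16,0x3e,0x00,0x00,0x01}, 1 },  /* guest domain 1 */
    { {0x00,0x16,0x3e,0x00,0x00,0x02}, 2 },  /* guest domain 2 */
};

/* Return the vif for a frame's destination MAC, or -1 to drop/flood. */
static int demux(const uint8_t *dst_mac)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (memcmp(table[i].mac, dst_mac, 6) == 0)
            return table[i].vif_id;
    return -1;
}

int main(void)
{
    uint8_t frame_dst[6] = {0x00,0x16,0x3e,0x00,0x00,0x02};
    printf("deliver to vif %d\n", demux(frame_dst));
    return 0;
}
```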
Outline

- Motivation
- Xenoprof
- Network virtualization overheads in Xen
- Debugging using Xenoprof
- Conclusions
Anomalous network behavior in Xen

- TCP receive throughput in Xen changes with the application buffer size (slow Pentium III)
[Figure: throughput (Mb/s) versus application buffer size (KB), for Linux and Xenolinux]
Debugging using Xenoprof

- 40% of kernel execution time is incurred in socket buffer de-fragmenting routines
De-fragmenting socket buffers

- Linux: insignificant fragmentation with a streaming workload

[Diagram: MTU-sized data packets arrive in 4 KB socket buffers on the socket receive queue and are de-fragmented (copied together)]

- Xenolinux (Linux on Xen):
  - Received packets occupy 1500 bytes (MTU) out of a 4 KB socket buffer
  - Page-sized socket buffers support remapping over the I/O channel
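The waste behind this is simple arithmetic: a 1500-byte MTU packet pinned in a 4 KB page-remappable buffer uses only about 37% of it, which is what forces the copy-based de-fragmentation step. A quick check:

```c
#include <stdio.h>

/* Why Xenolinux de-fragments receive buffers: each MTU-sized packet
 * arrives in its own page-sized socket buffer (so the page can be
 * remapped over the I/O channel), leaving most of the page unused. */
int main(void)
{
    const double mtu = 1500.0, page = 4096.0;
    printf("buffer utilization per packet: %.0f%%\n", 100.0 * mtu / page);
    printf("buffer memory inflation: %.1fx\n", page / mtu);
    return 0;
}
```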
Conclusions

- Xenoprof is useful for identifying major overheads in Xen
- Xenoprof is to be included in the official Xen and OProfile releases
- Where to get it: http://xenoprof.sourceforge.net