
Page 1: Wagner w 420 Kvm Performance Improvements and Optimizations
Page 2: Wagner w 420 Kvm Performance Improvements and Optimizations

Mark Wagner, Principal SW Engineer, Red Hat. May 4, 2011

KVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS

Page 3: Wagner w 420 Kvm Performance Improvements and Optimizations

Overview

● Discuss a range of topics about KVM performance
● Will show how some features impact SPECvirt results
● Also show against real-world applications

● RHEL5 and RHEL6

● Use libvirt where possible
● Note that not all features are in all releases

● We will not cover the RHEV products explicitly
● Some of it will apply, but your mileage will vary...

Page 4: Wagner w 420 Kvm Performance Improvements and Optimizations

Before we dive in...

[Chart: Guest NFS Write Performance - are we sure? Throughput (MBytes/second), y-axis 0-450, single bar "RHEL6-default". Annotations: "Is this really a 10Gbit line?"; "By default the rtl8139 device is chosen"; arrow shows improvement.]

Page 5: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 6: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt - Basics

● Virtual Machines are added in sets of six
● Called a Tile
● Each guest has a specific function
● Web (http)
● Infrastructure (NFS for Web)
● Application (Java Enterprise)
● DB (for App)
● Mail (imap)
● Idle

Page 7: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt - Basics

● Three SPEC workloads drive one Tile
● SPECweb
● SPECjApp
● SPECmail
● Run as many Tiles as possible until any of the workloads fails any of the Quality of Service requirements
● Tune, lather, rinse, repeat

Page 8: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt - Basics

● Each workload is throttled
● There are think times between requests
● The SPECjApp workload has peaks/valleys to greatly vary resource usage in the App & DB guests
● SPECvirt Home Page
● http://www.spec.org/virt_sc2010/

Page 9: Wagner w 420 Kvm Performance Improvements and Optimizations

[Diagram: SPECvirt test harness. A Controller drives several Clients; each client runs a modified version of SPECweb, SPECjApp, and SPECmail. The Host runs Tiles 1..n, each tile holding Web, Infra, App, DB, Mail, and Idle guests. A single QoS error invalidates the entire test.]

Page 10: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt_sc2010 Published Results - March 2011

[Chart: SPECvirt_sc2010 2-Socket Results (x86_64 Servers), 3/2011. Bars show SPECvirt_sc2010 Score (left axis, 0-2000) and SPECvirt Tiles/Core (right axis, 0-1.4). Scores: RHEL 5.5 (KVM) / IBM x3650 M3 / 12 cores: 1169; VMware ESX 4.1 / HP DL380 G7 / 12 cores: 1221; RHEL 6.0 (KVM) / IBM HS22V / 12 cores: 1367; RHEL 5.5 (KVM) / IBM x3690 X5 / 16 cores: 1369; RHEL 6 (KVM) / IBM x3690 X5 / 16 cores: 1763.]

Page 11: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt_sc2010 Published Results - March 2011

[Chart: SPECvirt_sc2010 2-4 Socket Results (x86_64 Servers), 3/2011. Score (0-6000) and Tiles/Core (0-1.0). Scores: VMware ESX 4.1 / Bull SAS / 32 cores: 2721; VMware ESX 4.1 / IBM x3850 X5 / 32 cores: 2742; VMware ESX 4.1 / HP DL580 G7 / 40 cores: 3723; RHEL 6 (KVM) / IBM x3850 X5 / 64 cores: 5466.]

Page 12: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt_sc2010 Published Results - May 2011

[Chart: SPECvirt_sc2010 2-Socket Results (x86_64 Servers), 5/2011. Score (0-2000) and Tiles/Core (0-1.4). Scores: RHEL 5.5 (KVM) / IBM x3650 M3 / 12 cores: 1169; VMware ESX 4.1 / HP DL380 G7 / 12 cores: 1221; RHEL 6.0 (KVM) / IBM HS22V / 12 cores: 1367; RHEL 5.5 (KVM) / IBM x3690 X5 / 16 cores: 1369; RHEL 6 (KVM) / IBM x3690 X5 / 16 cores: 1763; VMware ESX 4.1 / HP BL620c G7 / 20 cores: 1811; RHEL 6.1 (KVM) / HP BL620c G7 / 20 cores: 1820.]

Page 13: Wagner w 420 Kvm Performance Improvements and Optimizations

SPECvirt_sc2010 Published Results - May 2011

[Chart: SPECvirt_sc2010 2-4 Socket Results (x86_64 Servers), 4/2011. Score (0-8000) and Tiles/Core (0-1.4). Scores: VMware ESX 4.1 / Bull SAS / 32 cores: 2721; VMware ESX 4.1 / IBM x3850 X5 / 32 cores: 2742; VMware ESX 4.1 / HP DL580 G7 / 40 cores: 3723; RHEL 6 (KVM) / IBM x3850 X5 / 64 cores: 5466; RHEL 6 (KVM) / IBM x3850 X5 / 80 cores: 7067.]

Page 14: Wagner w 420 Kvm Performance Improvements and Optimizations

Unofficial SPECvirt Tiles vs Tuning - RHEL5.5

[Chart: Impact of Tuning KVM for SPECvirt (not official SPECvirt results). Number of Tiles, y-axis 0-14. Baseline: 8 tiles. Based on a presentation by Andrew Theurer at KVM Forum, August 2010.]

Page 15: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Some block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 16: Wagner w 420 Kvm Performance Improvements and Optimizations

Quick Overview – KVM Architecture

● Guests run as a process in userspace on the host

● A virtual CPU is implemented using a Linux thread
● The Linux scheduler is responsible for scheduling a virtual CPU, as it is a normal thread
● Guests inherit features from the kernel
● NUMA
● Huge Pages
● Support for new hardware

Page 17: Wagner w 420 Kvm Performance Improvements and Optimizations

Quick Overview – KVM Architecture

● Disk and network I/O go through the host (most of the time)
● I/O settings on the host can make a big difference in guest I/O performance
● Need to understand host buffer caching
● Proper settings to achieve true direct I/O from the guest
● Deadline scheduler (on host) typically gives best performance
● Network typically goes through a software bridge

Page 18: Wagner w 420 Kvm Performance Improvements and Optimizations

Simplified view of KVM

[Diagram: User VMs and ordinary Linux processes run side by side on Linux; the KVM modules sit in the kernel alongside device drivers, above the hardware.]

Page 19: Wagner w 420 Kvm Performance Improvements and Optimizations

Performance Improvements in RHEL6

● Performance enhancements in every component

Component  | Feature
CPU/Kernel | NUMA; ticketed spinlocks; Completely Fair Scheduler; extensive use of Read-Copy-Update (RCU); scales up to 64 vcpus per guest
Memory     | Large-memory optimizations: Transparent Huge Pages, ideal for hardware-based virtualization
Networking | vhost-net, a kernel-based virtio backend with better throughput and latency; SR-IOV for ~native performance
Block      | AIO, MSI, scatter-gather

Page 20: Wagner w 420 Kvm Performance Improvements and Optimizations

Performance Enhancements

● vhost-net
● New host-kernel networking backend providing superior throughput and latency over the prior userspace implementation

● FPU performance optimization
● Avoids the need for the host to trap guest FPU cr0 accesses

Page 21: Wagner w 420 Kvm Performance Improvements and Optimizations

Performance Enhancements

● Disk I/O latency & throughput improvements
● Uses ioeventfd for faster notifications

● The qcow2 virt image format caches metadata, improving performance
● Batches writes on an I/O barrier, rather than fsync'ing every time additional block storage is needed (thin-provisioning growth)

Page 22: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Some block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 23: Wagner w 420 Kvm Performance Improvements and Optimizations

Remember this?

[Chart: Guest NFS Write Performance (the same chart as before). Throughput (MBytes/second), y-axis 0-450, bar "RHEL6-default". Shows the impact of not specifying the OS at guest creation.]

Page 24: Wagner w 420 Kvm Performance Improvements and Optimizations

Be Specific!

● virt-manager will:
● Make sure the guest will function
● Optimize as it can
● The more info you provide, the more tailoring will happen
● Specify the OS details

Page 25: Wagner w 420 Kvm Performance Improvements and Optimizations

Specify OS + flavor

● Specifying Linux / Red Hat will get you:
● The virtio driver
● Unless you are at RHEL6.1, where you get the vhost_net drivers

Page 26: Wagner w 420 Kvm Performance Improvements and Optimizations

I Like This Much Better

[Chart: Guest NFS Write Performance - impact of specifying OS type at creation. Throughput (MBytes/second), y-axis 0-450, bars: RHEL6-default, RHEL6-vhost, RHEL5-virtio. A 12.5x improvement over the default.]

Page 27: Wagner w 420 Kvm Performance Improvements and Optimizations

Low Hanging Fruit

● Remove unused devices
● Do you need sound in your web server?
● Remove / disable unnecessary services
● Both host and guest
● Bluetooth in a guest?

Page 28: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Some block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 29: Wagner w 420 Kvm Performance Improvements and Optimizations

Memory Enhancements for Virtualization

● Extended Page Table (EPT) age bits

● Kernel Same-page Merging (KSM)

● Transparent Huge Pages

Page 30: Wagner w 420 Kvm Performance Improvements and Optimizations

Memory Enhancements for Virtualization

● Extended Page Table (EPT) age bits
● Allow the host to make smarter swap choices when under pressure

● Kernel Same-page Merging (KSM)
● Consolidates duplicate pages
● Particularly efficient for Windows guests

● Transparent Huge Pages
● Efficiently manage large memory allocations as one unit
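A quick way to see whether THP is in play is the sysfs knob. A minimal sketch (note: on RHEL6 itself the path was `/sys/kernel/mm/redhat_transparent_hugepage/enabled`; the upstream path is shown here, with a sample fallback string so the sketch runs anywhere):

```shell
# Inspect the current THP mode; the bracketed entry is the active one.
# Falls back to a sample string where the sysfs file is absent.
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "[always] madvise never"
# To force THP on (as root): echo always > /sys/kernel/mm/transparent_hugepage/enabled
```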

Page 31: Wagner w 420 Kvm Performance Improvements and Optimizations

Memory sharing

● KSM
● Scans into (transparent) huge pages
● Helps delay/avoid swapping
● At the cost of slower memory access
● That's the major issue: making sure THP won't reduce the memory overcommit capacity of the host

● Classic speed vs. space trade-off
● Throughput vs. density

● Add KSM on|off per VM

Page 32: Wagner w 420 Kvm Performance Improvements and Optimizations

Memory Tuning – Huge Pages

● 2M pages vs the standard 4K Linux page
● The virtual-to-physical page map is 512 times smaller
● The TLB can map more physical pages, resulting in fewer misses

● Traditional Huge Pages are always pinned

● Transparent Huge Pages in RHEL6

● Most databases support Huge Pages

● How to configure Huge Pages (16G):
● echo 8192 > /proc/sys/vm/nr_hugepages
● vi /etc/sysctl.conf (vm.nr_hugepages=8192)
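The 8192 above falls out of simple arithmetic; a minimal sketch deriving it for the slide's 16 GB pool of 2 MB pages:

```shell
# Derive nr_hugepages for a 16 GB pool of 2 MB huge pages (the slide's numbers).
TARGET_GB=16
HPAGE_KB=2048                                # 2 MB huge pages
PAGES=$(( TARGET_GB * 1024 * 1024 / HPAGE_KB ))
echo "vm.nr_hugepages=$PAGES"                # -> vm.nr_hugepages=8192
# Apply immediately (root): echo $PAGES > /proc/sys/vm/nr_hugepages
# Make permanent: add the vm.nr_hugepages line to /etc/sysctl.conf
```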

Page 33: Wagner w 420 Kvm Performance Improvements and Optimizations

Memory Tuning – Huge Pages

● Benefits not only the host but also the guests
● Try them in a guest too!

Page 34: Wagner w 420 Kvm Performance Improvements and Optimizations

Try Huge Pages on Host and Guest

[Chart: Impact of Huge Pages on SPECjbb, RHEL5.5, 8 threads. SPECjbb2005 bops, y-axis 0-300K. Bars: no huge pages; host using huge pages (+24%); guest & host using huge pages (+46%).]

Page 35: Wagner w 420 Kvm Performance Improvements and Optimizations

[Chart: Impact of Huge Pages - RHEL5.5 KVM, OLTP workload. Aggregate Transactions / Minute (y-axis 0-600k) vs number of users (10U/20U/40U, x100). Series: 1 guest; 1 guest with huge pages; 4 guests; 4 guests with huge pages.]

Page 36: Wagner w 420 Kvm Performance Improvements and Optimizations

Transparent Huge Pages

[Chart: RHEL6/6.1 SPECjbb, 24-cpu / 24-vcpu Westmere EP, 24GB. Transactions Per Minute (y-axis 0-500K), No-THP vs THP, for r6-guest and r6-metal.]

Page 37: Wagner w 420 Kvm Performance Improvements and Optimizations

Unofficial SPECvirt Tiles vs Tuning - RHEL5.5

[Chart: Impact of Tuning KVM for SPECvirt (not official SPECvirt results). Number of Tiles, y-axis 0-14: Baseline 8; Huge Pages 8.8 (+10%). Based on a presentation by Andrew Theurer at KVM Forum, August 2010.]

Page 38: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Some block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 39: Wagner w 420 Kvm Performance Improvements and Optimizations

Virtualization Tuning - Network

● General Tips

● Virtio

● vhost_net

● PCI passthrough

● SR-IOV (Single root I/O Virtualization)

Page 40: Wagner w 420 Kvm Performance Improvements and Optimizations

Network Tuning Tips

● Separate networks for different functions
● Use arp_filter to prevent ARP flux
● echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
● Use /etc/sysctl.conf to make it permanent

● No need for HW to bridge intra-box communications
● VM traffic never hits the HW on the same box
● Can really kick up the MTU as needed

● Packet size
● Make sure it is set across all components
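Making the arp_filter setting permanent is one line of sysctl.conf. A minimal sketch; a temp file stands in for `/etc/sysctl.conf` so it runs anywhere, and on a real host you would edit the real file and run `sysctl -p` as root:

```shell
# Persist the arp_filter setting from above (temp file stands in for /etc/sysctl.conf).
CONF=$(mktemp)
echo "net.ipv4.conf.all.arp_filter = 1" >> "$CONF"
grep -c arp_filter "$CONF"                    # -> 1
rm -f "$CONF"
# Immediate, non-persistent form: echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
```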

Page 41: Wagner w 420 Kvm Performance Improvements and Optimizations

Virtualization Tuning - Network

● Virtio
● VirtIO drivers for network

● vhost_net (low latency - close to line speed)
● Bypasses the qemu layer

● PCI passthrough
● Bypasses the host and passes the PCI device to the guest
● Can be passed to only one guest

● SR-IOV (Single Root I/O Virtualization)
● Can be shared among multiple guests
● Limited hardware support
● Passes through to the guest

Page 42: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture - VirtIO

● Virtual Machine sees paravirtualized network device – VirtIO

● VirtIO drivers included in Linux Kernel

● VirtIO drivers available for Windows

● Network stack implemented in userspace

Page 43: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture - Virtio

[Diagram: virtio data path]

Page 44: Wagner w 420 Kvm Performance Improvements and Optimizations

[Chart: Network latency, virtio - guest receive (lower is better), RHEL 6. Latency (usecs, y-axis 0-400) vs message size (1 to 65539 bytes). Series: Virtio, Host. Roughly a 4x gap in latency between virtio and bare metal.]

Page 45: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture – vhost_net

● New in RHEL6.1

● Moves QEMU network stack from userspace to kernel

● Improved performance

● Lower Latency

● Reduced context switching

● One less copy

Page 46: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture - vhost_net

[Diagram: vhost_net data path]

Page 47: Wagner w 420 Kvm Performance Improvements and Optimizations

[Chart: Network latency, vhost_net - guest receive (lower is better), RHEL 6. Latency (usecs, 0-400) vs message size. Series: Virtio, Vhost_net, Host. vhost_net latency is much closer to bare metal.]

Page 48: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture - VirtIO vs vhost_net

[Diagram: virtio vs vhost_net data paths]

Page 49: Wagner w 420 Kvm Performance Improvements and Optimizations

Host CPU Consumption, virtio vs vhost_net

[Chart: 8 guests, TCP receive. % Total Host CPU (lower is better, y-axis 0-45), broken into %usr, %soft, %guest, and %sys, by message size (32 to 32768 bytes), vhost vs virtio. The major difference is usr time.]

Page 50: Wagner w 420 Kvm Performance Improvements and Optimizations

vhost_net Efficiency

[Chart: 8-guest scale-out RX, vhost vs virtio. Mbit per % host CPU (bigger is better, y-axis 0-400) vs message size (32 to 65507 bytes), netperf TCP_STREAM.]

Page 51: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture – PCI Device Assignment

● Physical NIC is passed directly to the guest
● Guest sees the real physical device
● Needs the physical device driver

● Requires hardware support: Intel VT-d or AMD IOMMU
● Lose hardware independence

● 1:1 mapping of NIC to guest

● BTW, this also works on some I/O controllers

Page 52: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture – Device Assignment

Device Assignment

Page 53: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Network Architecture – SR-IOV

● Single Root I/O Virtualization

● A new class of PCI devices that present multiple virtual devices, each appearing as a regular PCI device

● Guest sees real physical device● Needs physical device driver

● Requires hardware support

● Low overhead, high throughput

● No live migration

● Lose hardware independence

Page 54: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Architecture – SR-IOV

SR-IOV

Page 55: Wagner w 420 Kvm Performance Improvements and Optimizations

KVM Architecture – Device Assignment vs SR/IOV

Device Assignment SR-IOV

Page 56: Wagner w 420 Kvm Performance Improvements and Optimizations

Latency comparison - RHEL 6 based methods

[Chart: Network latency by guest interface method - guest receive (lower is better). Latency (usecs, 0-400) vs message size. Series: Virtio, Vhost_net, SR-IOV, Host. SR-IOV latency is close to bare metal.]

Page 57: Wagner w 420 Kvm Performance Improvements and Optimizations

RHEL6 KVM w/ SR-IOV - Intel Niantic 10Gb, Postgres DB

[Chart: DVD Store Version 2 results. Throughput in Orders/min (OPM, y-axis 0-100,000): 1 Red Hat KVM bridged guest: 69,984; 1 Red Hat KVM SR-IOV guest: 86,469; 1 database instance (bare metal): 92,680. SR-IOV reaches 93% of bare metal.]

Page 58: Wagner w 420 Kvm Performance Improvements and Optimizations

Unofficial SPECvirt Tiles vs Tuning - RHEL5.5

[Chart: Impact of Tuning KVM for SPECvirt (not official SPECvirt results). Number of Tiles, y-axis 0-14: Baseline 8; Huge Pages 8.8 (+10%); SR-IOV 10.6 (+32%). Based on a presentation by Andrew Theurer at KVM Forum, August 2010.]

Page 59: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 60: Wagner w 420 Kvm Performance Improvements and Optimizations

Block

● Recent improvements
● Tuning hardware
● Choosing an elevator
● Choosing the caching model
● Tuned / ktune
● Device assignment

Page 61: Wagner w 420 Kvm Performance Improvements and Optimizations

Block Improvements

● ioeventfd performance enhancement

● 4K sector size (RHEL6)

● Windows drivers
● MSI support

● AIO implementation

Page 62: Wagner w 420 Kvm Performance Improvements and Optimizations

I/O Tuning - Hardware

● Know your storage
● SAS or SATA?
● Fibre Channel, Ethernet or SSD?
● Bandwidth limits

● Multiple HBAs
● Device-mapper-multipath
● Provides multipathing capabilities and LUN persistence

● How to test
● Low-level I/O tools: dd, iozone, dt, etc.
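A typical low-level probe with dd might look like the sketch below. On real storage you would add `oflag=direct` to bypass the host page cache (the point of the "true direct I/O" note earlier); it is omitted here so the sketch runs on any filesystem, and the target path is just a temp file:

```shell
# Quick sequential-write probe with dd (64 MiB; add oflag=direct on real storage).
OUT=$(mktemp)
dd if=/dev/zero of="$OUT" bs=1M count=64 conv=fsync 2>/dev/null
stat -c %s "$OUT"                             # -> 67108864 (64 MiB written)
rm -f "$OUT"
```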

Page 63: Wagner w 420 Kvm Performance Improvements and Optimizations

I/O Tuning – Understanding I/O Elevators

● I/O Elevators of Interest

● Deadline
● CFQ
● Noop

● Use deadline on the host

● Your mileage may vary

● Experiment !

Page 64: Wagner w 420 Kvm Performance Improvements and Optimizations

I/O Tuning – Understanding I/O Elevators

● Deadline
● Two queues per device, one for reads and one for writes
● I/Os dispatched based on time spent in queue

● CFQ
● Per-process queue
● Each process queue gets a fixed time slice (based on process priority)

● Noop
● FIFO
● Simple I/O merging
● Lowest CPU cost

Page 65: Wagner w 420 Kvm Performance Improvements and Optimizations

I/O Tuning – How to configure I/O Elevators

● Boot-time
● Grub command line: elevator=deadline|cfq|noop

● Dynamically, per device:
● echo "deadline" > /sys/class/block/sda/queue/scheduler
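Reading that same sysfs file back shows every available elevator with the active one in brackets. A minimal sketch that parses out the active name; a sample line stands in for `/sys/class/block/sda/queue/scheduler` so it runs without the device:

```shell
# Parse the active elevator (the bracketed entry) from a scheduler file.
line="noop [deadline] cfq"                       # sample sysfs content
active=$(printf '%s\n' "$line" | grep -o '\[[a-z]*\]' | tr -d '[]')
echo "$active"                                   # -> deadline
# On a real host: cat /sys/class/block/sda/queue/scheduler
```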

Page 66: Wagner w 420 Kvm Performance Improvements and Optimizations

Virtualization Tuning - I/O elevators - OLTP

[Chart: Performance impact of I/O elevators on an OLTP workload; host running the deadline scheduler. Transactions per Minute (y-axis 0-300K) for 1, 2, and 4 guests, with Noop, CFQ, and Deadline in the guests.]

Page 67: Wagner w 420 Kvm Performance Improvements and Optimizations

Virtualization Tuning - I/O Cache

● Three types:
● cache=none
● cache=writethrough
● cache=writeback - not supported

Page 68: Wagner w 420 Kvm Performance Improvements and Optimizations

Virtualization Tuning - Caching

● cache=none
● I/O from the guest is not cached

● cache=writethrough
● I/O from the guest is cached and written through on the host
● Potential scaling problems with this option with multiple guests (host CPU used to maintain the cache)

● cache=writeback - not supported

Page 69: Wagner w 420 Kvm Performance Improvements and Optimizations

Virtualization Tuning - Caching

● Configure the I/O cache per disk on the qemu command line or in libvirt

● virt-manager: drop-down option under "Advanced Options"

● libvirt XML file: <driver name='qemu' type='raw' cache='none' io='native'/>
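In context, that driver line sits inside a `<disk>` element of the domain XML. A sketch of the full element; the source device path and target name here are hypothetical placeholders:

```xml
<!-- Sketch: libvirt <disk> element carrying the cache/io settings from the slide.
     The source dev path and target name are hypothetical. -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/mapper/guest-lun'/>
  <target dev='vda' bus='virtio'/>
</disk>
```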

Page 70: Wagner w 420 Kvm Performance Improvements and Optimizations

Effect of I/O Cache settings on Guest performance

[Chart: OLTP-like workload on FusionIO storage. Transactions Per Minute (y-axis 0-900K) for 1 guest and 4 guests, cache=writethrough vs cache=none.]

Page 71: Wagner w 420 Kvm Performance Improvements and Optimizations

I/O Tuning - Filesystems

● RHEL6 introduced barriers
● Needed for data integrity
● On by default
● Can disable on enterprise-class storage

● Configure read-ahead
● Database (parameters to configure read-ahead)
● Block devices (blockdev --getra / --setra)

● Asynchronous I/O
● Eliminates synchronous I/O stalls
● Critical for I/O-intensive applications

Page 72: Wagner w 420 Kvm Performance Improvements and Optimizations

AIO - Native vs Threaded (default)

[Chart: Impact of AIO selection on an OLTP workload; cache=none, threaded is the default. Transactions Per Minute (y-axis 0-1000K) vs number of users (10U/20U, x100): AIO threaded vs AIO native.]

Native AIO is configurable per device (only via the XML configuration file):
libvirt XML file: <driver name='qemu' type='raw' cache='none' io='native'/>

Page 73: Wagner w 420 Kvm Performance Improvements and Optimizations

RHEL6 “tuned-adm” profiles

# tuned-adm list

Available profiles:

- default

- latency-performance

- throughput-performance

- enterprise-storage

Example:
# tuned-adm profile enterprise-storage

Recommend “enterprise-storage” w/ KVM

Page 74: Wagner w 420 Kvm Performance Improvements and Optimizations

RHEL6 “tuned-adm” profiles -default

● default
● CFQ elevator (cgroup)
● I/O barriers on
● ondemand power savings
● upstream VM settings
● 4 msec quantum

● Example:
# tuned-adm profile default

Page 75: Wagner w 420 Kvm Performance Improvements and Optimizations

RHEL6 “tuned-adm” profiles - latency-performance

● latency-performance

● elevator=deadline

● power=performance

● Example:
# tuned-adm profile latency-performance

Page 76: Wagner w 420 Kvm Performance Improvements and Optimizations

RHEL6 “tuned-adm” profiles - throughput-performance

● throughput-performance

● latency + 10 msec quantum

● readahead 4x

● VM dirty_ratio=40

● Example:
# tuned-adm profile throughput-performance

Page 77: Wagner w 420 Kvm Performance Improvements and Optimizations

Remember Network Device Assignment ?

● Device assignment
● It works for block too!
● Device specific
● Similar benefits
● And drawbacks...

Page 78: Wagner w 420 Kvm Performance Improvements and Optimizations

● Block Device Passthrough - SAS Workload

[Chart: RHEL6.1 SAS Mixed Analytics Workload, bare metal vs KVM - Intel Westmere EP 12-core, 24 GB memory, LSI 16 SAS. Time to complete (secs, y-axis 0-20k) for SAS system and SAS total: KVM VirtIO, KVM PCI passthrough, and bare metal. Passthrough reaches 94% of bare metal, VirtIO 79%.]

Page 79: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 80: Wagner w 420 Kvm Performance Improvements and Optimizations

NUMA (Non-Uniform Memory Access)

● Multi-socket, multi-core architecture
● NUMA is needed for scaling
● RHEL 5 / 6 are completely NUMA-aware
● Additional performance gains by enforcing NUMA placement

● How to enforce NUMA placement
● numactl - CPU and memory pinning

Page 81: Wagner w 420 Kvm Performance Improvements and Optimizations

Memory Tuning - NUMA

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 8189 MB
node 0 free: 7220 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 8192 MB
...
node 7 cpus: 42 43 44 45 46 47
node 7 size: 8192 MB
node 7 free: 7816 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  16  22  22  16
  2:  16  22  10  16  16  16  16  16
  3:  22  16  16  10  16  16  22  22
  4:  16  16  16  16  10  16  16  22
  5:  22  22  16  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  16  22  22  16  16  10

Internode memory distances come from the SLIT table. Note the variation in internode distances.
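When placing a guest by hand, the `free:` lines of that output are usually what you scan. A minimal sketch picking the node with the most free memory (sample lines embedded so the sketch runs anywhere; on a real host, pipe `numactl --hardware` in instead):

```shell
# Pick the NUMA node with the most free memory from `numactl --hardware` output.
awk '/free:/ { if ($4 + 0 > max) { max = $4 + 0; node = $2 } }
     END { print "node " node ": " max " MB free" }' <<'EOF'
node 0 free: 7220 MB
node 1 free: 8000 MB
node 7 free: 7816 MB
EOF
# A guest could then be bound there, e.g.:
#   numactl --cpunodebind=1 --membind=1 qemu-kvm ...
```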

Page 82: Wagner w 420 Kvm Performance Improvements and Optimizations

NUMA and Huge Pages - not an issue with THP

[Diagram: 128G of physical memory across 4 NUMA nodes; an 80G huge-page pool, 20G in each node; a 20GB guest using huge pages vs a 20GB guest using NUMA and huge pages.]

● Static huge-page allocation takes place uniformly across NUMA nodes

● Make sure that guests are sized to fit

● Workaround 1: use Transparent Huge Pages

● Workaround 2: allocate huge pages / start guest / de-allocate huge pages

Page 83: Wagner w 420 Kvm Performance Improvements and Optimizations

OLTP Workload - Effect of NUMA and Huge Pages

[Chart: OLTP workload, multi-instance - effect of NUMA. Aggregate Transactions per Minute (y-axis 0-1400K): non-NUMA vs NUMA, an 8% gain.]

Page 84: Wagner w 420 Kvm Performance Improvements and Optimizations

OLTP Workload - Effect of NUMA and Huge Pages

[Chart: OLTP workload, multi-instance - effect of NUMA and huge pages. Aggregate Transactions per Minute (y-axis 0-1400K): non-NUMA; NUMA (+8%); non-NUMA + huge pages (+12%); NUMA + huge pages (+18%).]

Page 85: Wagner w 420 Kvm Performance Improvements and Optimizations

RHEL5.5 - KVM Huge Pages + CPU Affinity

[Chart: Comparison between huge pages and huge pages + affinity; four RHEL5.5 guests using libvirt. Aggregate Transactions Per Minute (y-axis 0-700K) vs number of users (10U/20U/40U, x100). Huge pages alone leveled off; huge pages + vcpu pinning kept scaling.]

Page 86: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 87: Wagner w 420 Kvm Performance Improvements and Optimizations

CPU Performance

● Improvements

● CPU Type

● CPU Topology

● CPU Pin - Affinity

Page 88: Wagner w 420 Kvm Performance Improvements and Optimizations

CPU Improvements

● RCU for the KVM module locks
● We scale to 64 vcpus!

● Dynamic cancel ticket spinlocks

● Add user return notifiers in the kernel

● x2apic
● Use MSR access to limit MMIO accesses to the irq chip

Page 89: Wagner w 420 Kvm Performance Improvements and Optimizations

Specifying Processor Details

● Mixed results with CPU type and topology

● Experiment and see what works best in your case

Page 90: Wagner w 420 Kvm Performance Improvements and Optimizations

CPU Pinning - Affinity

● virt-manager allows selection based on NUMA topology

● True NUMA support is in the works

● virsh pinning allows finer-grained control

● 1:1 pinning
● Good gains with pinning
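A 1:1 pinning pass with virsh can be sketched as below. The domain name "rhel6-guest" is hypothetical, and `echo` prints the commands instead of running them (drop the `echo` on a real host):

```shell
# Sketch: 1:1 pin vcpus 0-3 of a guest to host CPUs 4-7 with virsh vcpupin.
for v in 0 1 2 3; do
  echo virsh vcpupin rhel6-guest "$v" "$((v + 4))"   # drop echo to apply
done
```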

Page 91: Wagner w 420 Kvm Performance Improvements and Optimizations

Unofficial SPECvirt Tiles vs Tuning - RHEL5.5

[Chart: Impact of Tuning KVM for SPECvirt (not official SPECvirt results). Number of Tiles, y-axis 0-14: Baseline 8; Huge Pages 8.8 (+10%); SR-IOV 10.6 (+32%); Node Binding 12 (+50%). Based on a presentation by Andrew Theurer at KVM Forum, August 2010.]

Page 92: Wagner w 420 Kvm Performance Improvements and Optimizations

Performance monitoring tools

● Monitoring tools
● top, vmstat, ps, iostat, netstat, sar, perf

● Kernel tools
● /proc, sysctl, AltSysrq

● Networking
● ethtool, ifconfig

● Profiling
● oprofile, strace, ltrace, systemtap, perf

Page 93: Wagner w 420 Kvm Performance Improvements and Optimizations

Agenda

● What is SPECvirt

● A quick, high level overview of KVM

● Low Hanging Fruit

● Memory

● Networking

● Block I/O basics

● NUMA and affinity settings

● CPU Settings

● Wrap up

Page 94: Wagner w 420 Kvm Performance Improvements and Optimizations

Wrap up

● KVM can be tuned effectively
● Understand what is going on under the covers
● Turn off stuff you don't need
● Be specific when you create your guest
● Look at using NUMA or affinity
● Choose appropriate elevators (Deadline vs CFQ)
● Choose your cache wisely

Page 95: Wagner w 420 Kvm Performance Improvements and Optimizations

For More Information – Other talks

● Campground 2 - Thurs 10:20

● Performance Analysis & Tuning of Red Hat Enterprise Linux - Shak and Larry
● Part 1 - Thurs 10:20
● Part 2 - Thurs 11:30

● Tuning Your Red Hat System for Databases
● Sanjay Rao - Fri 9:45

● KVM Performance Optimizations
● Rik van Riel - Thurs 4:20

Page 96: Wagner w 420 Kvm Performance Improvements and Optimizations

For More Information

● KVM Wiki
● http://www.linux-kvm.org/page/Main_Page

● irc, email lists, etc.
● http://www.linux-kvm.org/page/Lists%2C_IRC

● libvirt Wiki
● http://libvirt.org/

● New, revamped edition of the "Virtualization Guide"
● http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/index.html
● Should be available soon!

Page 97: Wagner w 420 Kvm Performance Improvements and Optimizations

For More Information

● Reference Architecture Website
● https://access.redhat.com/knowledge/refarch/TBD

● Andrew Theurer's original presentation
● http://www.linux-kvm.org/wiki/images/7/7f/2010-forum-perf-and-scalability-server-consolidation.pdf

● SPECvirt Home Page
● http://www.spec.org/virt_sc2010/

● Principled Technologies
● http://www.principledtechnologies.com/clients/reports/Red%20Hat/Red%20Hat.htm
