Integrating kdump into oVirt 3.5

Martin Peřina, Software Engineer at Red Hat

August 26th 2014

Agenda

● Motivation
● What is kdump?
● What is fence_kdump?
● How is it all coupled together?
● Configuration
● Future features

Motivation

Host kernel crash on oVirt <= 3.4:

1. The host kernel crashes and the process that gathers crash information starts (this process can take a long time)
2. After some time, the engine detects the host as non-responsive and executes fencing on it
3. If the host is fenced while crash information is being gathered, all crash information is lost

Goal for oVirt 3.5

● Detect whether the host is in the kdump flow before fencing is executed
● If the host is in the kdump flow, do not execute fencing and wait for the host to finish gathering its crash information

What is kdump?

● kexec-based kernel crash dumping mechanism (when the standard kernel crashes, a capture kernel is booted)
● dumps the memory content of the crashed kernel into a file on a local or remote target
● dumping is executed from the capture kernel, so the crashed kernel's memory is preserved
● the capture kernel needs memory reserved in the standard kernel

Standard and capture kernel

How does kdump work?

1. Standard kernel crashes
2. kexec boots the capture kernel
3. Memory dump is executed in the capture kernel
4. Memory dump file is stored to the specified target
5. Host is rebooted

Kdump configuration

kdump configuration is stored in:

● /etc/kdump.conf
  ● static configuration that can be changed by the administrator
● capture kernel initial ramdisk file
  ● created from /etc/kdump.conf on kdump service restart

Sample kdump.conf

    path /var/crash
    core_collector makedumpfile -l --message-level 1 -d 31

Kdump requirements

● kexec-tools package, which contains the tools to set up and execute kdump
● crashkernel=MEM_SIZE command line parameter has to be configured for the standard kernel (enabled by default on RHEL/CentOS; on Fedora the administrator has to enable it)
● kdump service has to be enabled

What is fence_kdump?

● set of command line tools for sending messages from a dumping host and receiving them on another, predefined host
● part of the fence-agents-kdump package
● it uses the UDP protocol for messaging
● it uses port 7410 (can be changed)
● it sends a message every 10 seconds (can be changed)
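The receiving side of the points above can be sketched in a few lines of Python. This is only an illustration of the transport (UDP datagrams, sender identified by source IP address); it does not parse the real fence_kdump payload. The function names and the placeholder payload are assumptions made for the example, and the demo binds to an ephemeral loopback port rather than 7410.

```python
import socket


def receive_kdump_notifications(sock, max_messages):
    """Collect (source_ip, payload) pairs from an already bound UDP socket.

    The payload is treated as opaque bytes; only the source IP address
    is used to identify the sender.
    """
    seen = []
    for _ in range(max_messages):
        payload, (source_ip, _source_port) = sock.recvfrom(1024)
        seen.append((source_ip, payload))
    return seen


def loopback_demo():
    """Send one placeholder datagram to ourselves and receive it."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS for a free port; a real receiver would use 7410.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        port = receiver.getsockname()[1]
        # Hypothetical payload, NOT the real fence_kdump message format.
        sender.sendto(b"placeholder-payload", ("127.0.0.1", port))
        return receive_kdump_notifications(receiver, 1)
    finally:
        sender.close()
        receiver.close()
```

Because UDP is connectionless, the receiver learns nothing about the sender beyond the source address of each datagram.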

Kdump and fence_kdump

/etc/kdump.conf contains two options to set up fence_kdump:

● fence_kdump_nodes
  ● list of hosts to send messages to
● fence_kdump_args
  ● additional parameters for fence_kdump_send

kdump.conf with fence_kdump

    path /var/crash
    core_collector makedumpfile -l --message-level 1 -d 31
    fence_kdump_nodes mperina.brq.redhat.com
    fence_kdump_args -p 7410 -i 5

fence_kdump limitations

● fence_kdump destination host(s) have to be predefined, and they are stored in the capture kernel initial ramdisk
● the fence_kdump receiver can determine whether a host is kdumping for only one host at a time, and it cannot determine whether the host has finished kdumping
● fence_kdump messages are sent unencrypted over UDP
● fence_kdump messages are not signed; the sender can be identified only by its source IP address

How is it coupled together?

oVirt kdump integration

New fence_kdump listener

● a new standalone fence_kdump listener was implemented as part of the oVirt kdump integration
● it can receive messages from multiple kdumping hosts at once
● it determines that a host has finished kdumping using a timeout since the last received message
● it communicates with the engine through the engine database
● it is executed as a service on the same host as the engine
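The "multiple hosts at once" and "finished by timeout" behaviour described above can be illustrated with a small in-memory tracker. This is a minimal sketch and not the listener's actual code: the class and method names are hypothetical, timestamps are passed in explicitly so the logic is easy to follow, and the database synchronization is left out.

```python
class KdumpSessionTracker:
    """Track kdumping hosts by the time their last message was received."""

    def __init__(self, finished_timeout=30.0):
        # Seconds of silence after which a host is treated as finished.
        self.finished_timeout = finished_timeout
        self._last_seen = {}  # host address -> timestamp of last message

    def message_received(self, host, now):
        """Record a message from a host, starting or refreshing its session."""
        self._last_seen[host] = now

    def expire(self, now):
        """Close sessions that were silent for longer than the timeout.

        Returns the sorted list of hosts now considered finished.
        """
        finished = sorted(
            host
            for host, seen in self._last_seen.items()
            if now - seen > self.finished_timeout
        )
        for host in finished:
            del self._last_seen[host]
        return finished

    def dumping_hosts(self):
        """Return the hosts that currently have an open kdump session."""
        return sorted(self._last_seen)
```

A timeout is the only available signal here because, as noted in the limitations, the messages themselves do not say that dumping has finished.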

Integration – host deploy 1/3

● kdump integration can be enabled for each host by setting an option in the Power Management tab of the Host detail popup in webadmin
● the host needs to be redeployed after kdump integration is enabled
● kdump integration is not bound to the cluster level; it can be enabled even for cluster levels < 3.5

Integration – host deploy 2/3

● during host deploy, checks are executed to determine whether kdump integration can be enabled:
  ● host kernel has the crashkernel=MEM_SIZE option set
  ● correct version of kexec-tools is available
  ● kdump destination address (engine FQDN) can be resolved
● if any of these checks fails, host deploy still finishes successfully, but kdump integration is not configured and a warning is displayed
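The three checks above could look roughly like the following sketch. It is not the host deploy implementation: the function names are hypothetical, the minimum kexec-tools version is a caller-supplied placeholder (the slides do not state which version is required), and the resolver is injected so the logic can be exercised without a network.

```python
import re


def parse_version(text):
    """Turn a version string such as '2.0.4' into a comparable tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def check_kdump_prerequisites(cmdline, kexec_tools_version,
                              min_kexec_tools, destination, resolve):
    """Return a list of warnings; an empty list means all checks passed.

    cmdline             -- kernel command line, e.g. contents of /proc/cmdline
    kexec_tools_version -- installed version string, or None if not installed
    min_kexec_tools     -- required version (placeholder, chosen by the caller)
    destination         -- address the host should send messages to
    resolve             -- callable raising OSError when a name is unresolvable
    """
    warnings = []
    if not any(arg.startswith("crashkernel=") for arg in cmdline.split()):
        warnings.append("crashkernel= is not set on the kernel command line")
    if (kexec_tools_version is None
            or parse_version(kexec_tools_version) < parse_version(min_kexec_tools)):
        warnings.append("kexec-tools is missing or too old")
    try:
        resolve(destination)
    except OSError:
        warnings.append("destination %s cannot be resolved" % destination)
    return warnings
```

Returning warnings instead of raising matches the described behaviour: a failed check does not fail the deploy, it only leaves kdump integration unconfigured. In real use `socket.gethostbyname` could serve as the resolver, since its `socket.gaierror` is a subclass of `OSError`.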

Integration – host deploy 3/3

● if all checks are successful:
  ● fence_kdump options are updated in /etc/kdump.conf
  ● kdump service is restarted
● if kdump integration was not configured successfully during host deploy, the administrator can fix the issues manually and then redeploy the host

UI: New Host popup

UI: Host Detail

Host deploy part limitations

● host deploy updates only the fence_kdump options in kdump.conf; other options are left untouched
● the administrator is responsible for manually setting the correct kdump target
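The "only fence_kdump options are touched" rule can be shown as a small text transformation. This is a sketch under the assumption that kdump.conf is a list of `option value` lines; the function name is hypothetical and this is not the code host deploy actually runs.

```python
def update_fence_kdump_options(conf_text, nodes, args):
    """Return kdump.conf text with only the fence_kdump options replaced.

    Every other line (path, core_collector, comments, blank lines) is
    kept exactly as it was.
    """
    managed = ("fence_kdump_nodes", "fence_kdump_args")
    kept = []
    for line in conf_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in managed:
            continue  # drop the old value, a fresh one is appended below
        kept.append(line)
    kept.append("fence_kdump_nodes %s" % nodes)
    kept.append("fence_kdump_args %s" % args)
    return "\n".join(kept) + "\n"
```

Since the dump target lines pass through unchanged, a wrong `path` or target stays wrong after deploy, which is why setting it remains the administrator's job.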

Integration – kdumping 1/2

Integration – kdumping 2/2

UI: Host start dumping

UI: Host finished dumping

Configuration

fence_kdump listener config

Listener configuration is stored in text files:

● They need to have the .conf suffix
● They have to be located under the /etc/ovirt-engine/ovirt-fence-kdump-listener.d directory
● They are simple property-based text files

A service restart is needed after the config files are changed:

    systemctl restart ovirt-fence-kdump-listener

Listener config file sample

    LISTENER_ADDRESS=0.0.0.0
    LISTENER_PORT=7410
    HEARTBEAT_INTERVAL=30
    SESSION_SYNC_INTERVAL=5
    REOPEN_DB_CONNECTION_INTERVAL=30
    KDUMP_FINISHED_TIMEOUT=30
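A property file like the sample above is easy to read programmatically. The sketch below loads every `.conf` file from a directory and merges the `KEY=VALUE` pairs. Two details are assumptions made for illustration and are not stated in the slides: files are applied in name order so that later files override earlier ones, and lines starting with `#` are treated as comments.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_listener_config(directory):
    """Merge all *.conf files in a directory (name order is an assumption)."""
    merged = {}
    for path in sorted(Path(directory).glob("*.conf")):
        merged.update(parse_properties(path.read_text()))
    return merged
```

Values are returned as strings; converting ports and intervals to integers is left to the caller.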

fence_kdump listener options 1/3

LISTENER_ADDRESS
● IP address(es) that the fence_kdump listener listens on
● It can contain either 0.0.0.0 (default) or one specific IP address

LISTENER_PORT
● Port that the fence_kdump listener listens on (default 7410)

fence_kdump listener options 2/3

HEARTBEAT_INTERVAL
● Defines the interval in seconds (default 30) between the listener's heartbeat updates to the database

SESSION_SYNC_INTERVAL
● Defines the interval in seconds (default 5) at which the listener's in-memory host kdumping sessions are synchronized to the database

fence_kdump listener options 3/3

REOPEN_DB_CONNECTION_INTERVAL
● Defines the interval in seconds (default 30) between attempts to reopen a database connection that was previously unavailable

KDUMP_FINISHED_TIMEOUT
● Defines the timeout in seconds, measured from the last message received from a kdumping host, after which the host's kdump flow is marked as FINISHED

fence_kdump engine config 1/4

● fence_kdump options that are not related to the listener are stored in the database and can be changed using the engine-config tool
● ovirt-engine has to be restarted (and sometimes hosts also have to be redeployed) after these values are changed

fence_kdump engine config 2/4

FenceKdumpDestinationAddress
● Defines the hostname(s) or IP address(es) to send fence_kdump messages to
● If empty (default), engine FQDN is used

FenceKdumpDestinationPort
● Defines the port (default 7410) to send fence_kdump messages to

fence_kdump engine config 3/4

FenceKdumpMessageInterval
● Defines the interval in seconds (default 5) between messages sent by fence_kdump

FenceKdumpListenerTimeout
● Defines the maximum time in seconds (default 90) since the last heartbeat during which the fence_kdump listener is still considered alive
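The destination address, destination port and message interval options plausibly end up as the two fence_kdump lines in a host's kdump.conf, which would explain why hosts sometimes need to be redeployed after a change. The sketch below shows that mapping. It is an assumption based on the `-p 7410 -i 5` sample earlier in the slides, not the engine's actual code, and the function name is hypothetical.

```python
def render_fence_kdump_options(destination_address, destination_port,
                               message_interval, engine_fqdn):
    """Build the kdump.conf lines for the given engine-side settings.

    An empty destination address falls back to the engine FQDN, as
    described for FenceKdumpDestinationAddress.
    """
    nodes = destination_address.strip() or engine_fqdn
    return [
        "fence_kdump_nodes %s" % nodes,
        "fence_kdump_args -p %s -i %s" % (destination_port, message_interval),
    ]
```

For example, `render_fence_kdump_options("", 7410, 5, "engine.example.com")` yields lines of the same shape as the kdump.conf sample shown earlier, with the example FQDN as the node.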

fence_kdump engine config 4/4

KdumpStartedTimeout
● Defines the maximum time in seconds (default 30) to wait for the first message from a kdumping host (used to detect that the host's kdump flow has started)
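Taken together, the two timeouts suggest a decision like the one sketched below, made before a non-responsive host is fenced. This is a simplified illustration, not the engine's implementation: the names are hypothetical, the wait for the first message is abstracted into a callable, and fencing immediately when the listener is not alive is an assumption.

```python
from enum import Enum


class Decision(Enum):
    FENCE = "fence the host"
    SKIP_FENCING = "host is kdumping, do not fence"


def decide(seconds_since_heartbeat, wait_for_first_message,
           listener_timeout=90.0, kdump_started_timeout=30.0):
    """Decide whether a non-responsive host should be fenced.

    seconds_since_heartbeat -- age of the listener's last heartbeat
    wait_for_first_message  -- callable(timeout) returning True if a
                               message from the host arrived in time
    """
    if seconds_since_heartbeat > listener_timeout:
        # Listener is not alive, so kdump cannot be detected (assumed
        # behaviour: fall back to normal fencing).
        return Decision.FENCE
    if wait_for_first_message(kdump_started_timeout):
        return Decision.SKIP_FENCING
    return Decision.FENCE
```

With a message interval of 5 seconds and a start timeout of 30 seconds, a kdumping host has several chances to be noticed before the engine falls back to fencing.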

Future features

● Extend kdump to send its flow status as part of the fence_kdump message (starting, dumping, finished, error, ...)
● Extend the fence_kdump protocol to:
  ● use a message sequence number
  ● include a unique host id (so as not to rely only on the IP address)
  ● include an HMAC signature for the message
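To make the proposed protocol extension more concrete, here is one way a message with a sequence number, host id and HMAC signature could be packed and verified. The byte layout, the field sizes and the choice of HMAC-SHA256 are all hypothetical; nothing in the slides defines them.

```python
import hashlib
import hmac
import struct

# Hypothetical layout: 8-byte sequence number + 16-byte host id,
# network byte order, followed by an HMAC-SHA256 signature.
HEADER = struct.Struct("!Q16s")
DIGEST_SIZE = hashlib.sha256().digest_size


def pack_message(key, sequence, host_id):
    """Serialize and sign a message with a shared secret key."""
    header = HEADER.pack(sequence, host_id)
    return header + hmac.new(key, header, hashlib.sha256).digest()


def unpack_message(key, data):
    """Return (sequence, host_id), or None if the message is invalid."""
    if len(data) != HEADER.size + DIGEST_SIZE:
        return None
    header, signature = data[:HEADER.size], data[HEADER.size:]
    expected = hmac.new(key, header, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    return HEADER.unpack(header)
```

A receiver could additionally reject any message whose sequence number is not higher than the last one seen for that host id, which is where the sequence number helps against replayed datagrams.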

THANK YOU!

[email protected] at #ovirt (irc.oftc.net)