S6 — Nagios: Advanced Topics or Non-Obvious Nagios John Sellens [email protected] USENIX LISA 27, 2013 November 3, 2013


S6 — Nagios: Advanced Topics

or Non-Obvious Nagios

John Sellens

[email protected]

USENIX LISA 27, 2013

November 3, 2013


Contents

Preamble and Introduction

Nagios Basics

Nagios Past, Present and Future

Nagios Plugins

More on Configuration

Theory and Practice

Getting Larger

Tips and Tricks

Abusing Nagios

Plugin Pointers

Nagios Addons

Wrap Up

©2003-2013 John Sellens USENIX LISA 27, 2013


Preamble and Introduction


Overview

• Nagios is well-established and widely used

– Over a million known servers running 3.X

• We’re going to look at some of the non-obvious bits

– Alternate title: Non-Obvious Nagios

• And ways to extend Nagios

• We’ll assume a basic knowledge of Nagios and what it does

• We’ll look at actual Nagios data

Notes:

• I’m assuming you’ve already chosen Nagios for your environment

– Or you’re very careful when making decisions and don’t want to rush into anything

• Or at least I hope what we’re covering is non-obvious and/or non-trivial

• The server count is from Ethan Galstad’s talk at Ohio LinuxFest 2010: servers that check for available updates.

• I sure hope the network and my laptop are both happy . . .

• Both USENIX and I will very much appreciate your feedback; please fill out (and return) the evaluation form later today.

Outline/Timetable

• Preamble / Introduction / Outline

• Basics – Build, Install, Configure, Run

• Plugins, Configuration Details

• — Break — 10:30 to 11:00

• Theory and Practice

• Getting Larger

• Add-ons and extensions

• Tips, tricks, etc.

• Wrap up – 12:30pm

Notes:

• Scheduled for 9:00am to 12:30pm, with one half-hour break

• Tutorial lunch is from 12:30 to 1:30, I think

• Feel free to ask questions anytime, or ask for clarification, or add some information. Interactivity is good!

Questions?

• Got a Question?

• A Clarification?

• Some Confusion?

• A Point of Interest?

• Ask!

Notes:

• This slide is here to be even more explicit that questions and comments are more than welcome, and that interactivity is good.

• Get my attention through any appropriate means, but if you’re throwing something, please lob, and keep it light.

• Though please consider the time we have available before you start on a long, involved anecdote of what once happened to a friend of yours.

About the Instructor

• John Sellens

• 25+ years as UNIX system administrator

• University of Waterloo, UUNET Canada, Certainty Solutions Canada, SYONEX, FreshBooks, . . .

• “Running the Numbers: System, Network, and Environment Monitoring”, co-author with Dan Klein; “System and Network Administration for Higher Reliability”, lapsed ;login: author

• Previous LISA PC, long time USENIX and LISA attendee

• Nagios World 2012 and 2013, OLF 2010, PICC 2011

Notes:

• FreshBooks is a cloud accounting app, and normally I would be there instead of here

• Feel free to contact me here or by email if you have any questions

Viewpoints and Religion

• Monitoring is Exceptions, Trending, History

• UNIX philosophy: Effective tools, not kitchen sink

– Choose the best tool(s) for the job

• SNMP is Your Friend

– Use it whenever you can

• Solve any problem in computer science with another level of indirection

Nagios Basics


• Nagios is a host and service monitor

• Web GUI and display interface

• Extensible, featureful

• Built from components, rather than one big blob

– Well defined interfaces between components

• Very well documented and supported

– Version 3 documents are HTML or a 358 page PDF

• Very widely used


Notes:

• http://www.nagios.org/

• Since 1999: http://www.nagios.org/about/history

• http://demos.nagios.com/

• New version 4

– Nagios 4.0.0 was released September 20, 2013

• Established version 3

– Nagios 3.5.1 was released August 30, 2013

– Nagios 3.0 was released March 13, 2008

• By Ethan Galstad; Licensed under the GPL

• Nagios® is a registered trademark of Ethan Galstad

• Nagios Enterprises http://nagios.com/

– Commercial support and enhanced versions


Nagios Features

• Monitors services: SMTP, HTTP, etc.

• Monitors system state: CPU, memory, disk, etc.

• Plugin model — easily extensible service checks

• Efficient service check engine

• Host and service dependencies

• Highly configurable notifications and escalations

• Event handlers to automate problem resolution

• Very effective web interface

• And more . . .


Nagios Implementation

• Nagios itself is a polling and reporting engine

• All service and host checks are via external “plugins”

– There’s a wide variety of plugins available, and it’s easy to write more

– Notifications are external commands as well

• The Nagios core engine is written in C

• As are the CGIs that implement the web interface

• The plugins are variously written in shell, perl, C, etc.

Notes:

• There is perpetually ongoing talk of how the CGIs would be much better in PHP

– But see the mentions of some of the addons starting at page 144

Nagios Overview

Stolen from Ethan Galstad

Notes:

• Shamefully stolen from Ethan Galstad’s FOSDEM 2005 presentation: http://www.nagios.org/fosdem2005

Operational Overview

• Nagios runs as a daemon

• Scheduling and executing host and service checks

• Running notification commands as required

• Accepting external commands and updates

• Web interface provides user interface

• Mechanisms for hooks and extensions


Building Nagios

• The typical build process, with a few tweaks

• Lots (and lots) of configure options . . .

• Is --enable-embedded-perl a good idea?

– Gone in version 4

• Likely want --enable-event-broker

• Don’t forget make install-commandmode

– Creates the external command file (named pipe)

• Set up Apache with the example conf entries

Notes:

• Quickstart Installation Guides

http://nagios.sourceforge.net/docs/3_0/quickstart.html

• Embedding a Perl interpreter just seems not quite right to me

– But see the documentation for pros and cons:

http://nagios.sourceforge.net/docs/3_0/embeddedperl.html

– Removed in version 4

• make install-webconf might do the Apache config for you

• make install-init may set up your boot script

• make fullinstall may do it all for you

• I use the FreeBSD port . . .

• Most people likely just install the “standard packages”


Running Nagios

• Nagios is typically started at boot time, and mostly left running

– Needs to be restarted to pick up any configuration changes

– A SIGHUP will restart it if your configs are perfect

• It comes with a handy daemon-init script that does start, stop, reload

• Before restarting Nagios for a configuration change, verify the configuration first:

nagios -v nagios.cfg

• This will tell you about problems before you start breaking things

Notes:

• If your configuration contains errors (as shown by nagios -v), then a restart or a kill -HUP will leave you with no running Nagios
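The verify-then-restart habit can be wrapped in a small script so that the reload is never attempted when verification fails. What follows is a minimal Python sketch, not part of Nagios: the function name `verify_then_reload` is hypothetical, and both commands are supplied by the caller.

```python
import subprocess
import sys


def verify_then_reload(verify_cmd, reload_cmd):
    """Run verify_cmd; run reload_cmd only if verification exits 0.

    Returns True only when verification passed and the reload succeeded.
    """
    check = subprocess.run(verify_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        # Show what the verifier complained about and leave the daemon alone.
        sys.stderr.write(check.stdout + check.stderr)
        return False
    return subprocess.run(reload_cmd).returncode == 0
```

A call might look like `verify_then_reload(["nagios", "-v", "nagios.cfg"], ["/etc/init.d/nagios", "reload"])`; the init script path here is an assumption and will differ between platforms and packages.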


Not Running Nagios?

• Who watches the watcher?

• In the past, the cgi.cfg setting nagios_check_command checked that nagios is running

• Not any more!

• If the status.dat file is left around, and nagios is dead, nothing notices

– As far as I can see . . .

– CGIs are happy with a days-old status.dat

• Run check_file_age from cron?

Notes:

• At least up to version 3.4.1

• In practice, I’ve never seen this happen, but it’s good to be paranoid

• I don’t know how best to solve this

• I think it should be addressed in the CGIs at least

• Or perhaps cfengine, puppet, etc. will fix
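One way to act on the cron suggestion is to raise an alarm when status.dat stops being updated. The Python sketch below illustrates the idea only; it is not check_file_age itself, and the function name, the thresholds, and the message wording are assumptions chosen for the example.

```python
import os
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def status_file_state(path, warn_secs=120, crit_secs=600, now=None):
    """Return (exit_code, message) based on how old the file at `path` is."""
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except OSError as err:
        return UNKNOWN, f"STATUS UNKNOWN - cannot stat {path}: {err.strerror}"
    if age >= crit_secs:
        return CRITICAL, f"STATUS CRITICAL - {path} is {age:.0f}s old"
    if age >= warn_secs:
        return WARNING, f"STATUS WARNING - {path} is {age:.0f}s old"
    return OK, f"STATUS OK - {path} is {age:.0f}s old"
```

A cron wrapper would call this with the real status.dat path and mail or page someone on any non-zero code. Since Nagios itself may be the thing that is dead, the alert should go out through some other channel.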


Nagios Web Interface

• CGIs and HTML implement a web interface to Nagios

• Provides access to a variety of commands

• Some ask “why isn’t it PHP?”

• Some ask “why isn’t it database backed?”

• Documented as “optional”

– Which highlights the separate components

Notes:

• The different CGIs, what they do, and authorization requirements, etc. are described very effectively at http://nagios.sourceforge.net/docs/3_0/cgis.html

• There are efforts underway to re-write/re-implement the web interface.

• More information on front ends later, starting at page 144.

Configuration Basics

• Arbitrary probes, flexible parameters, macro substitution

• Time-variable behaviours to cover work-day vs off-hours issues

• Template based, inheritable, grouping, etc.

• Consistent and well-documented

• Some tools try to provide a web interface to the configs

• Three required files:

– nagios.cfg — overall configuration, refers to other files

– resource.cfg — global variables and database access

– cgi.cfg — controls web interface behaviour and access

Notes:

• Some people say: “it should be in a database”

• I like text files that I can manipulate or generate

• Version 3 cleans up a bunch of configuration “issues” and makes things much better

– Not that things were bad before, but they are even better now

Nagios Past, Present and Future


• Version 2.0 was released early 2006

– Regular expression matching for host, hostgroup and service names in various object definitions

– Code and logic enhancements

• Version 3.0 was released March 2008

– More features, extensibility

– Enhancements for large sites and scalability

• Version 4.0 was released September 2013

• Steady, ongoing development

Notes:

• Various companies have sponsored work on the project

• It may be a little early to use version 4 in full production, but it looks pretty good, and has been in development for quite a while

• For new installations, you should likely use version 3, unless there is a particular tool you need that does not work with version 3

• I’m going to assume we’re using version 3, unless I say otherwise

New Features in Nagios 2.0

• Adaptive monitoring

– Modify on the fly how, what and when things are monitored and processed

– e.g. Change which services are monitored when hosts fail over

• Event Broker API

– Loadable modules can process event and status data in real time

– e.g. Process performance data, or insert into a database

• nagiostats command gets processing stats from the running nagios

• Some regular expressions can be used in object definitions


Notes:

• Version 2 documentation is likely no longer available online


New Features in Nagios 3.0

• Much improved host check logic and parallelism

• External commands can use regular files (not the pipe)

• Multiple template inheritance, object definition updates

• Performance improvements, esp. for large sites

• Multiline plugin output

• More consistent state information storage

• Better retention across restarts

• And more!


Notes:

• Big host check performance gains

• More detail at

http://www.nagios.org/development/history/

• What’s new in 3 at:

http://nagios.sourceforge.net/docs/3_0/whatsnew.html


New Features in Nagios 4.0

• Performance, scalability, interfaces

• Query Handler interface for checks and more

– Core Worker processes run checks and report to Core

∗ Fewer fork()s, smaller, faster

– Nagios Event Radio Dispatcher (NERD) provides event streams

• Internal speedups – config check, event queue, macro search

• Configuration – host address optional, service parents

• Host/service/contact values – no alerts for “minor” problems

• No more embedded Perl


Notes:

• Announced: Nagios Core 4.0.0 Now Available

http://labs.nagios.com/2013/09/20/nagios-core-4-now-available/

• What’s new in 4 at:

http://nagios.sourceforge.net/docs/nagioscore/4/en/whatsnew.html

• I have not spent a lot of time with the 4.0 release (yet!)

– Released the Friday before the Monday these notes were due

– But seemed easy enough to get going in my quick testing

• Core Worker processes default to 1.5 x #CPUs, minimum 4

• I think service parents are effectively just easier-to-configure service dependencies

• I suspect that host/service values will be complicated to use effectively

– Though relatively easy to only notify a manager if more than a few things are broken
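The worker default quoted in these notes (1.5 x #CPUs, minimum 4) is simple arithmetic, sketched here in Python. The function name is hypothetical, and the rounding of the 1.5 x product is an assumption, since the notes do not say how fractional results are handled.

```python
def default_worker_count(cpus):
    """Default number of Core Worker processes for a machine with `cpus` CPUs."""
    # Assumption: the 1.5 x CPUs product is rounded down before the
    # minimum of 4 is applied.
    return max(4, int(cpus * 1.5))
```

Under that assumption a 2-CPU machine gets the minimum of 4 workers, and an 8-CPU machine gets 12.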


Upgrading to Version 4

• In general, v4 is upwardly compatible with v3

– Most major changes are internal and less user visible

• Go through nagios.cfg and cgi.cfg for changes

– Or replace with current and backport your changes

• Some settings are now deprecated

– A few errors previously acceptable now fixed

• Some minor changes to macros

• Read the docs


Notes:

• http://nagios.sourceforge.net/docs/nagioscore/4/en/upgrading.html


What is This Icinga Thing?

• Icinga is a fork of Nagios

• Perception: Nagios single developer model

• Perception: Nagios Enterprises becoming more “corporate”

• Perception: Progress too slow

• Reality: Maybe not so clear cut

• Stated intention: Maintain compatibility

• Announced May 2009


Notes:

• http://www.icinga.org/


What is This Shinken Thing?

• A Nagios-like tool, redesigned and rewritten from scratch

• Discovery, GUI

• Python

• Compatible configuration


Notes:

• http://www.shinken-monitoring.org/


Predicting the Future

• Nagios now has more core developers

• Many enhancement projects

– Commercial and non-commercial

• Obvious examples of outreach and community building

– e.g. ideas.nagios.org

• Making the Nagios project/product sustainable?

Notes:

• Development Roadmaps are currently a little sparse

http://wiki.nagios.org/index.php/Development_Roadmaps

• My view is that the result of going corporate (Nagios Enterprises) is a good thing

– Momentum, support, choices, new tools, . . .

What is this Nagios XI Thing?

• The Next Generation of Nagios

• An integrated, supported commercial product

• New web interface

• New web configuration interface

• An integrated all-in-one Nagios

– Includes NagiosQL and PNP, plus more!

• Comparable to GroundWork, Opsview, Op5, Centreon, . . . ??


Notes:

• Announced in mid to late 2009, available since early 2010 or so

• I haven’t looked closely at it

– I’m lazy

– And cheap

– And I like my vi’d config files


Nagios Fusion

• Distributed monitoring solution

– Centralized view of de-centralized Nagios servers

– Links to remote servers for “drill down”

• Aggregates tactical overview

• Simple configuration through the web

• Should scale well

– Another level of indirection . . .


Notes:

• http://www.nagios.com/products/nagiosfusion

• Commercial product

• Works with both Nagios XI and Nagios Core

• Coming soon . . .


Nagios Plugins


• All Nagios host and service checks are performed by external “plugins”

– Hand wave away any question about plugins run with the embedded Perl interpreter

• A separate nagiosplug development team

– Standard syntax and output

– Consistent coding standards and processing

• The nagiosplug distribution has helpers for your own plugins

– Such as Nagios::Plugin for Perl


Notes:

• http://nagiosplugins.org/

• Version 1.4 February 3, 2005; 1.4.16 June 27, 2012

• i.e. Pretty stable at this point


Theory of Plugins

• A plugin command will be invoked by Nagios as required, with arguments as specified in the command definition that was used

• Standard options include

– who to check: --hostname= or -H

– where to check: --ipaddress= or -I

– the port to check: --port= or -P

– critical error level: --critical= or -c

– warning level: --warning= or -w

– provide usage help: --help or -h

• Critical and warning levels are in units that make sense for the plugin being used

Notes:

• The “ROADMAP” file in the current nagiosplug source provides additional information
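The standard options listed on this slide map naturally onto a command-line parser. The following is a minimal Python sketch using `argparse`; the plugin name `check_example` and the default port are hypothetical, and only a subset of the options is shown. `argparse` supplies `-h`/`--help` on its own.

```python
import argparse


def build_parser():
    """Build a parser for a few of the conventional plugin options."""
    parser = argparse.ArgumentParser(prog="check_example")  # hypothetical name
    parser.add_argument("-H", "--hostname", required=True, help="host to check")
    # The default port below is an arbitrary value for illustration.
    parser.add_argument("-P", "--port", type=int, default=80, help="port to check")
    parser.add_argument("-w", "--warning", type=float, help="warning level")
    parser.add_argument("-c", "--critical", type=float, help="critical level")
    return parser
```

Keeping the conventional option letters means a command definition written for one plugin looks familiar when reused for another.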


Theory of Plugins (cont’d)

• Plugins return an informative message, and an exit code

– Limited to about 350 characters of message (pre-V3)

– The message is displayed in the web status interface

• Exit codes indicate current state

– 0 OK

– 1 WARNING

– 2 CRITICAL

– 3 UNKNOWN

• Nagios reacts based on exit code

– Invokes notifications, event handlers

• Optional “performance data” is for statistics collection


Notes:

• Version 3 allows long (4,000 character), multi-line output from plugins

– I think first line goes in web display

– All output available via macros

• Plugins may also return “performance data” by appending an or-bar (“|”) and “key=value” information to the message

• Performance data can be processed via appropriate settings in the nagios.cfg file

– So you can stuff your plugin results into a database, or a graph, etc.
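The exit-code and performance-data conventions above can be shown in a few lines. This Python sketch assumes a measurement where higher values are worse; the function names and the message wording are hypothetical, and only the exit codes and the “|” separator come from the slide and notes.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warning, critical):
    """Map a measurement to an exit code, assuming higher values are worse."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def format_output(label, value, warning, critical):
    """Return (exit_code, message line with key=value data after the '|')."""
    code = evaluate(value, warning, critical)
    message = f"{label.upper()} {STATE_NAMES[code]} - {label} is {value}"
    return code, f"{message}|{label}={value}"


def split_output(line):
    """Split a plugin output line into (message, perfdata) at the first '|'."""
    message, _, perfdata = line.partition("|")
    return message.strip(), perfdata.strip()
```

A real plugin would print the line and then exit with the code, for example `print(line)` followed by `sys.exit(code)`.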


Plugin Extra-Opts

• C plugins can consult a run-time config file

check_whatever --extra-opts=[section][@file]

• Config file is “ini” style

[section]

key=value

• File defaults to plugins.ini or nagios-plugins.ini

• Searched for in $NAGIOS_CONFIG_PATH or various /etc dirs

• Configure with --enable-extra-opts


Notes:

• http://nagiosplugins.org/extra-opts

• I don’t think you can do per-hostaddress settings within the file itself, though I wish there were a per-host option

– Though you could have different sections named for a macro used on the command line

• Extra-opts capability was added in 1.4.12

• Early on I didn’t find documentation on how to find the file, so I looked at the code (fixed now)

• The code to find the config file seems a little convoluted

• Ethan Galstad pointed out that this is handy for hiding userids and passwords and other secrets from the ps command
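To make the “ini” layout concrete, here is a small Python sketch that reads one section of such a file with the standard `configparser` module. It only illustrates the file format; the C plugins use their own parser, and the section name, keys, and values below are hypothetical.

```python
import configparser

# Hypothetical section and values, in the [section] / key=value layout.
SAMPLE_INI = """\
[check_example]
hostname=db.example.com
warning=5
critical=10
"""


def load_section(ini_text, section):
    """Return the key/value pairs from one section of ini-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser[section]) if parser.has_section(section) else {}
```

Because the values live in a file instead of on the command line, they do not show up in ps output, which is the benefit mentioned in the notes above.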


Plugin Development

• It’s very easy to write your own plugins

– A shell script that checks a file or process and returns an exit code is dead easy

• Wrappers around existing data

– Another level of indirection . . .

• For Perl: Nagios::Plugin module

• Performance/speed can be an issue as you monitor more services

• Try it yourself!


Notes:

• Nagios::Plugin comes with the nagios plugins distribution

– Configure with --enable-perl-modules

• Nagios Plugin API: http://nagios.sourceforge.net/docs/3_0/pluginapi.html

• I tried it myself: snagtools plugins collection, www.syonex.com/resources/software.html

Existing Plugins

• There are many, many plugins already available

• Typically divide into local and remote checks

– Local checks look at something on the Nagios server

– Remote checks use SNMP or connect to a remote service port to check status

• Mechanisms for executing “local” plugins on remote machines

– e.g. check_by_ssh, NRPE

– Religion: I check remote machine state via SNMP


Notes:

• More on NRPE later

• That may be my religion, but sometimes I am a heretic

• See exchange.nagios.org

– Lots of tools and plugins there

– Pointers to various places

• We’ll discuss plugin efficiency and overhead later


More on Configuration


Configuration Details

• Recall: Text files, 3 core files plus object definitions

• Recall: Read and digested when Nagios starts

• Recall: Can be a little complicated at first glance

– Likely only because there are so many possibilities

– But consistent and well documented

• Let’s look at it in some more depth


Notes:

• Version 3 allows pre-digesting configs before startup

– Speeds startup time in large environments

• Version 4 config parsing is much faster and better


Required Files

• nagios.cfg, resource.cfg and cgi.cfg

– nagios.cfg location is specified on nagios command line

– resource.cfg location is set in nagios.cfg

– nagios.cfg location is set in cgi.cfg

– cgi.cfg location is compiled into the CGIs

• Syntax for these: variable = value

• Variables are case-sensitive
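The `variable = value` syntax and the way one file points at the next can be illustrated with a short parser. This is a Python sketch under assumptions, not Nagios's own parser: it is more lenient about whitespace than Nagios may be, the function name is hypothetical, and the paths in the sample are made up. Variable names are compared exactly, which matches the case-sensitivity noted above.

```python
# Hypothetical nagios.cfg fragment; the paths are illustrative only.
SAMPLE = """
# main configuration
log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/objects/hosts.cfg
cfg_file=/etc/nagios/objects/services.cfg
resource_file=/etc/nagios/resource.cfg
"""


def parse_main_config(text):
    """Parse 'variable = value' lines into {variable: [values, ...]}.

    Blank lines and '#' comments are skipped. Values are kept in a list
    because some variables can appear more than once.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'variable = value' line
        settings.setdefault(name.strip(), []).append(value.strip())
    return settings
```

With the sample above, `parse_main_config(SAMPLE)["resource_file"]` yields the location of the resource file, which is the second link in the chain of file locations listed on the slide.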


nagios.cfg Settings

• Configures overall Nagios operation

• File and directory locations

• Timing intervals and timeouts

• Log rotation, and logging operations

• Service check scheduling options

• Administrator email and pager addresses

• And more . . .

Notes:

• The documentation has all the details

• Named nagios.cfg by convention; the startup script explicitly refers to the file

• Of course, going with the flow, and naming your file “nagios.cfg” will make your life easier

nagios.cfg Settings (cont’d)

• Worth a particular mention is check_result_reaper_frequency

• It tells Nagios how often (in seconds) to gather results from

service checks

– Default setting is every 10 seconds

• Child processes (service check plugins) hang around until they

are “reaped”

• If you’re doing a non-trivial number of service checks, setting

this lower will (typically)

– Reduce the number of processes waiting, taking up space

– Lower the local load average numbers, sometimes
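
As a sketch, lowering the reaper interval is a one-line change in nagios.cfg. The value 5 is only an illustration, not a recommendation.

```cfg
# nagios.cfg: reap check results every 5 seconds instead of the default 10
check_result_reaper_frequency=5
```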


Notes:

• Mostly obsolete in version 4 due to the worker process implementation

• In version 2, check_result_reaper_frequency was called

service_reaper_frequency

• Load average depends on how processes blocked on I/O are counted on

your particular system

– I think


resource.cfg Settings

• Not actually required, and you can have multiple resource files

• Can have restrictive permissions to hide secrets from the web

interface

• Defines up to 32 $USERx$ macros for substitution into

commands

– $USER1$ — path to the plugins

– $USER2$ — path to event handlers (if any)

– Various other substitutions
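
A resource file might look something like the following sketch. The paths and the $USER3$ value are illustrative assumptions.

```cfg
# resource.cfg
$USER1$=/usr/local/nagios/libexec
$USER2$=/usr/local/nagios/libexec/eventhandlers
# A secret kept out of the object config files (value is made up)
$USER3$=example-snmp-community
```

A command definition can then refer to $USER3$ without the secret appearing in the object configuration.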


Notes:

• And resource files don’t all have to be called resource.cfg

• I think it’s actually 256 $USERx$ macros now, but the documentation is

inconsistent (when last I looked)

• Database connectivity information won’t do you much good if you didn’t

build Nagios to use databases to store various bits of information


cgi.cfg Settings

• Used by the web interface

• File locations, including a pointer to nagios.cfg

• User access controls, permissions, guest userid

• How to check if Nagios is running

• Extended host and service information for identifying images

and status map coordinates

– Or references to config files that contain that information
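
A short cgi.cfg sketch covering the file pointer and access controls. The path and the nagiosadmin user name are assumptions taken from a typical install.

```cfg
# cgi.cfg
main_config_file=/usr/local/nagios/etc/nagios.cfg
use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_services=nagiosadmin
```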


Notes:

• The “How to check if Nagios is running” no longer exists/works as far as I

can tell


Object Configuration

• nagios.cfg specifies the locations of “object configuration files”

– With the cfg_file and cfg_dir variables

– Which can be repeated to refer to multiple files and

directories

– cfg_dir directories are recursive

• Hosts, services, contacts, etc. are defined in template-based

definitions in those files

• Template inheritance provides a reasonably effective

mechanism
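
For example, a nagios.cfg might mix individual files and directories like this. All file and directory names here are assumptions.

```cfg
# nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
# Every .cfg file under these directories is read, including subdirectories
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services
```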


Notes:

• cfg_dir recursion was added in version 2


Object Configuration (cont’d)

• Object definitions look like

define type {
    directive value
    directive value
    ...
}

• Definitions include a “type_name” directive and value

• Templates include a “name” directive and value, and the

directive “register” with a value of “0”

• To inherit from a template, a definition includes a “use” directive

with a template name as the value
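
A minimal sketch of a template and a host that uses it. The names are hypothetical, and the definitions are trimmed: a real host needs a few more required directives (contacts, notification period, and so on).

```cfg
# A template: it has "name", and "register 0" so it is not treated as a real host
define host{
    name                    generic-server      ; hypothetical template name
    check_command           check-host-alive    ; assumed to be defined elsewhere
    max_check_attempts      3
    notification_interval   60
    register                0
}

# A real host that inherits everything above
define host{
    use         generic-server
    host_name   web1            ; hypothetical
    address     192.0.2.10      ; documentation address
}
```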


Notes:

• Directive names are case-sensitive

• Comments with # in column 1 or semi-colon anywhere

• Newer documentation uses “directive” and “variable” interchangeably

• It sounds a little more complicated than it really is

• You can also inherit from a “registered” object, such as some other host

or service

– e.g. host1 is fully defined, host2 is “just like host1”

• See “Object Inheritance” in the docs

http://nagios.sourceforge.net/docs/3_0/objectinheritance.html


Object Templates

• The “use” directive causes a definition to inherit the directives

declared in a template definition

• You can only have one “use” directive in a given template

– Its location in the template is irrelevant, anything local to the

template overrides anything inherited from a “use” directive

• Value for “use” directive is comma-separated list

– You can have a “tree” of definition inheritance from a

common root

– Or multiple roots

• Reasonably powerful . . .


Notes:

• Version 3 added multiple template inheritance

– In earlier versions you could inherit from only one template

– Though you could have a template “chain”

• I think template inheritance must take the form of an acyclic directed

graph.

– There — my math degree proves useful once again


Object Template Inheritance

• Host devweb1 definition contains use 1, 4, 8

• First template (1 in this case) has highest priority
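
A sketch of the precedence rule with two hypothetical templates standing in for the numbered ones in the diagram. Definitions are trimmed to the directives that matter here.

```cfg
define host{
    name                    site-toronto        ; hypothetical template
    notification_interval   30
    register                0
}
define host{
    name                    class-webserver     ; hypothetical template
    notification_interval   120
    max_check_attempts      5
    register                0
}
define host{
    use         site-toronto,class-webserver
    host_name   devweb1
    ; notification_interval is 30: site-toronto is listed first, so it wins
    ; max_check_attempts is 5: site-toronto does not set it, so class-webserver supplies it
}
```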


Notes:

• Stolen from Nagios 3 documentation

• Multiple inheritance sources were added in version 3

• Very handy with multiple locations, variable overrides, etc.


Directives and Values

• Each object type has pre-defined directive names

– Generally consistent across different object types

• Append to an inherited value: directive +value

• Delete inherited value: directive null
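
A sketch of both forms, using hypothetical names and a made-up URL.

```cfg
define host{
    name        generic-server      ; hypothetical template
    hostgroups  all-servers
    notes_url   http://wiki.example.com/servers
    register    0
}
define host{
    use         generic-server
    host_name   web1
    hostgroups  +webservers     ; appended: all-servers,webservers
    notes_url   null            ; the inherited value is dropped
}
```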


Notes:

• Appending with + is called additive inheritance

• No subtractive inheritance, i.e. you can’t remove an item from a list unless you

re-set the list


Custom Directives

• Custom directives start with underscore

– And are case IN-sensitive

– e.g. _snmp_community, etc.

• Use in host, service and contact definitions

• Refer to as macros or environment variables

– e.g. _bloop in a host definition becomes

∗ macro $_HOSTBLOOP$

∗ environment variable NAGIOS__HOSTBLOOP
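
A sketch of a custom variable carried from a host definition into a command. The host, template, community string, and command name are all hypothetical.

```cfg
define host{
    use                 generic-server      ; hypothetical template
    host_name           switch1
    address             192.0.2.20
    _snmp_community     example-community   ; custom variable
}
define command{
    command_name    check_snmp_uptime       ; hypothetical command name
    command_line    $USER1$/check_snmp -H $HOSTADDRESS$ -C $_HOSTSNMP_COMMUNITY$ -o sysUpTime.0
}
```

Each host can then carry its own community string while sharing one command definition.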


Notes:

• The documentation calls these custom variables, not directives

• Note that the macros and environment variables are uppercase

• And similarly for SERVICE and CONTACT custom variables

• I don’t know if you can use custom directives in other objects, or how you

would refer to them

• More on macros and environment variables later


Implied Inheritance

• Nagios will sometimes assume a value from a related object

• Service objects will inherit

– contact_groups, notification_interval, notification_period

from the associated host

• Hostescalations and serviceescalations will similarly inherit as

well

– Except notification_period becomes escalation_period


Notes:

• Which is convenient and makes sense, and saves keeping the same

information consistent in multiple places


Object Types

• There are quite a few different object types that can be defined

• host, hostdependency, hostescalation

• hostgroup

• contact, contactgroup

• service, servicegroup, servicedependency, serviceescalation

• hostextinfo, serviceextinfo

• timeperiod

• command


Notes:

• I think “quite a few” equals 14

• More or less self explanatory

• The “extinfo” types provide “extended” information for hosts and services

for the web interface

• hostgroupescalation removed in 2.x — you can now use hostgroup_name

in hostescalation definitions

• 2.x added servicegroup primarily for CGI display purposes

• Servicegroup can be referred to by servicedependency and

serviceescalation definitions

• hostextinfo and serviceextinfo are now deprecated

– All directives are now part of host and service definitions


Object Definitions

• Object definitions have many possible directives

– You’ll default many, and inherit many from master templates

• The directives are fairly consistent across different object types

• The sample configuration files are well-documented

• The samples used to be in different files by object type

– Not necessary to split them up that way

– Many find that a “cfg_dir” full of .cfg files is very convenient

• Order is unimportant — multiple files are treated the same as a

single file


Notes:

• You should read/review the sample config files

• cfg_dir is recursive as of 2.x

– Which is handy


Timeperiod Definitions

• Timeperiods are used to define when to do service checks, and

when notifications can be sent

define timeperiod{
    timeperiod_name nonwork
    alias           Time to be Not Working
    sunday          00:00-24:00
    monday          00:00-09:00,18:00-24:00
}

• Most object types have a “type_name” directive, which is used

by other objects to refer to the object being defined, and for

some display purposes

• “alias” defines a more verbose description of the object


Notes:

• Apparently we work 24 hours a day from Tuesday to Saturday

• Most object types also have an “alias” directive

• http://nagios.sourceforge.net/docs/3_0/timeperiods.html


Timeperiod Definitions (cont’d)

• Version 3 added lots of timeperiod features

– Dates and date ranges, day of month

– Offset weekday e.g. 3rd Monday of a month or all months

– And more!

• Exclude one timeperiod from another with the exclude directive

• Very powerful, handy for scheduling or recurring windows,

holidays, vacations, etc.

• Non-weekday definitions are called “exceptions”

– Which seems confusing to me
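
A sketch of the version 3 features, following the forms shown in the object definitions documentation. The timeperiod names and the dates chosen are illustrative only.

```cfg
define timeperiod{
    timeperiod_name holidays                ; hypothetical
    alias           Company Holidays
    2013-12-25      00:00-24:00             ; a specific calendar date
    january 1       00:00-24:00             ; January 1st, every year
    monday 3        00:00-24:00             ; 3rd Monday of every month
}
define timeperiod{
    timeperiod_name workhours-no-holidays   ; hypothetical
    alias           Work Hours Except Holidays
    monday          09:00-18:00
    tuesday         09:00-18:00
    exclude         holidays
}
```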


Notes:

• Lots and lots of examples at

http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#timeperiod

• I think you can do just about anything

– Including making it completely incomprehensible

• Version 3 has some timeperiod exclusion bugs, fixed in version 4


Command Definitions

• All service and host checks are performed by commands

defined in command definitions

define command{
    command_name check_tcp
    command_line $USER1$/check_tcp
        -H $HOSTADDRESS$ -p $ARG1$
}

• Note the use of variable substitution to pass parameters to the

actual command

• A service definition that is checking for the gopher port to be

listening would use

check_command check_tcp!70
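
Putting the two halves together as a sketch, with the command_line on a single line as a real definition requires. The host and template names are hypothetical.

```cfg
define command{
    command_name    check_tcp
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}
define service{
    use                 generic-service     ; hypothetical template
    host_name           oldserver           ; hypothetical host
    service_description Gopher
    check_command       check_tcp!70        ; 70 becomes $ARG1$
}
```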


Notes:

• I wrapped the command_line to fit, but you can’t do that in a real definition

• There are a number of macros defined based on the values of directives

in definition types

– We’ll touch on macros later

• $HOSTADDRESS$ is set from the “address” directive in the appropriate

host definition

• Command line quoting is sometimes challenging, so try to avoid special

characters in your arguments

• Do you remember gopher?


contact Definitions

• A contact is a person that may be notified, or who has access to

the web interface

• host_notification_period and service_notification_period take a

timeperiod_name

• host_notification_options are d,u,r,f,s,n for down, unreachable,

recovery (up), flapping start/stop, scheduled downtime

start/stop, or none

• service_notification_options are w,u,c,r,f,s,n for warning,

unknown, critical, etc.

• The email, pager, host_notification_commands, and

service_notification_commands directives are “obvious”
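
A sketch of a contact definition. The person, the 24x7 timeperiod, and the notification command names are assumptions (the command names mirror those in the sample configuration).

```cfg
define contact{
    contact_name                    jsmith                  ; hypothetical
    alias                           Jane Smith
    email                           jsmith@example.com
    host_notification_period        24x7                    ; timeperiod assumed defined
    service_notification_period     24x7
    host_notification_options       d,r                     ; no "u": skip unreachable
    service_notification_options    w,u,c,r
    host_notification_commands      notify-host-by-email    ; assumed defined elsewhere
    service_notification_commands   notify-service-by-email
}
```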


Notes:

• I think by now you probably understand the definition syntax

• Notification periods define when you can notify the person

• I’ve left out the contact_name and alias directives

• Notification options are comma-separated lists of code letters

• You might not want to include unreachable by default

– If a key router, switch or firewall goes down, you’ll get a lot of noise

– As long as you’ve properly defined host parent/child relationships


contactgroup Definitions

• Think of a contactgroup as a work team that is responsible for

some hosts and/or services

– It provides a level of indirection or abstraction in the

configuration of areas of responsibility

• You might also define a “managers” contactgroup to be used in

escalations

• The members directive is a comma-separated list of

contact_names of people in this group
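
A sketch of a team group and a managers group for escalations. All names are hypothetical.

```cfg
define contactgroup{
    contactgroup_name   webteam
    alias               Web Operations Team
    members             jsmith,bjones
}
define contactgroup{
    contactgroup_name   managers
    alias               Managers
    members             pointyhair
}
```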


Notes:

• Recall that all problems in computer science can be solved by another

level of indirection


host Definitions

• A host definition specifies a host or device which provides

“services”

• A host has both a host_name and an address

– address can be an IP address or FQDN

– defaults to host_name in v4

– An IP address avoids alerts if DNS fails, but is harder to

maintain

• The check_command is used to see if a host is up

– Typically a “ping” test of some form

– Only used if service checks fail

• parents — a list of routers, gateways between here and there
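
A sketch of a host definition using the directives above. The template, parent, and contact group names are hypothetical.

```cfg
define host{
    use             generic-server      ; hypothetical template
    host_name       web1
    alias           Web Server 1
    address         192.0.2.10          ; IP address, so a DNS failure does not cause alerts
    parents         core-router         ; hypothetical upstream device
    check_command   check-host-alive    ; typically a ping-based check, assumed defined elsewhere
    contact_groups  webteam
}
```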


Notes:

• Remember to define the check_command (somewhere!) otherwise your

host checks will show as “pending”

– Depending on firewalls, sometimes pings won’t work and I use

check_ssh as the check_command

• The “parents” directive lets you describe your network topology, so that if

a network link goes down, you’ll get notified about the link, not that all the

unreachable hosts are down

• An unreachable host can cause a “route verification” to take place

– If I was a marketer, I would say “root cause analysis” here


Groups for Hosts

• Hosts are included in hostgroups via

– The hostgroups directive in the host definition

– The members directive in hostgroup definitions

• contact_groups directive is per-host, not per-hostgroup

– Consistent with use in service definitions

– Need to be a contact for all hosts in a hostgroup to have

access to the hostgroup


Notes:

• Contact groups used to be defined for a hostgroup

• I think this means that hostgroups and servicegroups are more or less

equivalent

– Just different names for grouping things you want to group


hostgroup Definitions

• A hostgroup defines an administrative grouping of hosts

• Used to organize output in the web interface, and to determine

access restrictions

– You can only access those hosts/services that you are

responsible for

• members lists the hosts that are members

• hostgroup_members includes other hostgroups in this one
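
A sketch showing both directives, with hypothetical host and group names.

```cfg
define hostgroup{
    hostgroup_name      webservers
    alias               Web Servers
    members             web1,web2
}
define hostgroup{
    hostgroup_name      production
    alias               All Production Hosts
    members             db1
    hostgroup_members   webservers      ; pulls in web1 and web2 as well
}
```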


Notes:

• Can have multiple members directives for convenience

– I think

• Contacts used to be attached to hosts only by the old contact_groups

directive in hostgroups


service Definitions

• In Nagios terms, a “service” could be an aspect of a running

system, like disk capacity, or memory utilization

– A “service” needn’t be offered externally to a device

• Nagios tests services based on

– max_check_attempts — how many times to check a service

before concluding it is actually down

– normal_check_interval — how many “time units” to wait

between regular service checks

– retry_check_interval – how many “time units” to wait before

checking a service that is not “OK”

• contact_groups — who to complain to in case of a problem
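
A sketch of a service definition using the directive names from this slide. The version 3 documentation lists the interval directives as check_interval and retry_interval; I believe the older names shown here are still accepted, but check your version. Host, command, and group names are assumptions.

```cfg
define service{
    host_name               web1
    service_description     HTTP
    check_command           check_http  ; assumed defined elsewhere
    max_check_attempts      3
    normal_check_interval   5           ; every 5 time units (minutes by default) when OK
    retry_check_interval    1           ; every 1 time unit while a problem is being confirmed
    check_period            24x7
    notification_interval   60
    notification_period     24x7
    contact_groups          webteam
}
```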


Notes:

• See the documentation on “Service Check Scheduling” at

http://nagios.sourceforge.net/docs/2_0/checkscheduling.html

– Nagios works hard to be efficient and effective at doing checks

– Not yet written/updated for Nagios 3.x


Service check_command Directives

• Each service definition must include a check_command

directive

• You can refer to a defined command

– Arguments can be provided, separated by !

• For example

check_command check_ntserv!w3svc

• Open question: where should you quote special characters?


Notes:

• You used to be able to provide a “raw” command to be executed,

surrounded by double quotes

– But I don’t think you can anymore

– Not often used — you’re better off to use the indirection provided

by command definitions


Notification Configuration

• Host and service definitions also define when notifications

should be sent using the following directives

– notification_interval — number of “time units” (default:

minutes) before re-notifying of a problem

– notification_period — the timeperiod during which

notifications may be sent

– notification_options — hosts: d,u,r,f,s,n; services: w,u,c,r,f,s

– notifications_enabled — 1 for yes, 0 for no

• The contact_groups for the host or service are notified, using

the rules for the individual contacts in each group


Notes:

• “Time units” are in terms of the “interval_length” defined in the nagios.cfg

file

– Which defaults to 60 seconds

– So the number given in an object definition for an “interval” is

usually the number of minutes

• Hosts: down, unreachable, recovery (up), flapping start/stop, scheduled

downtime start/stop, none

• Services: warning, unknown, critical, recovered, flapping start/stop,

scheduled downtime start/stop, none

• See the detailed rules in “Notifications”, at

http://nagios.sourceforge.net/docs/3_0/notifications.html


Notes and Links

• You can define

– notes notes_url action_url

for

– host hostgroup service servicegroup

objects

• Show up in reasonable places in the web interface

• Allow you to link to documentation or other things you can do to

the host


Notes:

• These are relatively recent - late version 2, or version 3?

• Or perhaps they were always in the hostextinfo and serviceextinfo objects

which I never used?


Dependency Definitions

• The hostdependency and servicedependency definitions allow

you to define relationships between hosts and services

• For example, you could declare that your WWW service

depends on your SQL service

– If SQL is down, don’t bother checking WWW, because we

already know it will fail

• Or your web host may depend on your nfs-server host

– Similar but different from a host’s “parents”

– parents defines network topology
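
The WWW-depends-on-SQL example might be written like this sketch. Host names and service descriptions are hypothetical.

```cfg
define servicedependency{
    host_name                       db1     ; the service being depended upon
    service_description             SQL
    dependent_host_name             web1    ; the service that depends on it
    dependent_service_description   WWW
    execution_failure_criteria      c,u     ; skip WWW checks if SQL is critical or unknown
    notification_failure_criteria   c,u     ; and suppress WWW notifications
}
```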


Notes:

• Leave dependent_host_name and dependent_hostgroup_name empty

(or null) for “same host”

• See “Host and Service Dependencies” at

http://nagios.sourceforge.net/docs/3_0/dependencies.html

• More on dependencies later


Escalation Definitions

• If something is broken, you may want/need to “escalate” the

problem if it’s not resolved quickly

• Define these if so: hostescalation hostgroupescalation

serviceescalation

• first_notification and last_notification — which notifications to

escalate

• notification_interval — how often to send them

• contact_groups — who to send them to

– Remember to include the “lower level” contactgroups

– Or use additive inheritance (with a + sign)
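
A sketch of a service escalation that adds managers from the third notification onward. Names are hypothetical.

```cfg
define serviceescalation{
    host_name               web1
    service_description     HTTP
    first_notification      3       ; start escalating at the 3rd notification
    last_notification       0       ; 0 means keep escalating until recovery
    notification_interval   30
    contact_groups          webteam,managers    ; lower-level group still included
}
```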


Notes:

• Notifications and escalations provide a very flexible mechanism

• For example, you could set up different contactgroups and contacts for

the same people, but with different ways to contact them, so that you

could

– first email problem reports to the contacts

– then page them

– then send SMS messages to their phones

– then call their home phone numbers

– and so on

• See “Notification Escalations” at

http://nagios.sourceforge.net/docs/2_0/escalations.html

• hostgroupescalation removed in 2.x — you can now use hostgroup_name

in hostescalation definitions


Definition Shortcuts

• General rule: anywhere you can list a host_name or

hostgroup_name you can

– use a comma-separated list of hosts/groups

– exclude with !

– use a wildcard host_name of “∗”, meaning “all hosts”

to have it apply (or not) to multiple hosts

• e.g. A service definition for the HTTP service might include

hostgroup_name webservers

to cause the service to be defined for all hosts in the

webservers hostgroup

• This can save a lot of repetition in your configs
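
A sketch of the shortcuts in service definitions. The template, host names, and check_ping thresholds are assumptions, and (as the general rule suggests) it is worth verifying the result on your own version.

```cfg
# HTTP on every host in the webservers hostgroup, except one
define service{
    use                 generic-service     ; hypothetical template
    hostgroup_name      webservers
    host_name           !web-staging        ; hypothetical host to leave out
    service_description HTTP
    check_command       check_http
}

# PING on all hosts
define service{
    use                 generic-service
    host_name           *
    service_description PING
    check_command       check_ping!100.0,20%!500.0,60%
}
```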


Notes:

• e.g. For a service, dependency, or escalation

• Note that this is a “general rule” and won’t necessarily apply in every

possible instance

• To match names with regular expressions, set use_regexp_matching=1 in nagios.cfg

• See “Time-Saving Tricks For Object Definitions” at

http://nagios.sourceforge.net/docs/3_0/objecttricks.html


Nagios Macros

• Nagios defines a number of macros for use in commands

– Some implicitly from definition directives, etc.

– Some explicitly, in the resource.cfg file

• These macros can be substituted into host and service check

commands, notifications, event handlers, etc.

– Different macros are available at different times

• An effective way of passing variable data outside of the Nagios

core


Notes:

• See “Understanding Macros and How They Work” at

http://nagios.sourceforge.net/docs/3_0/macros.html

http://nagios.sourceforge.net/docs/3_0/macrolist.html

for all the details

• Including a handy reference table of what’s available when, and what all

the macros mean


More on Macros

• Lots of macros — all sorts of variant information is available

• Most are added to environment e.g. NAGIOS_SERVICESTATE

– Including any custom variables

• “On-Demand Macros” allow you to refer to values from other

config settings e.g.

$SERVICESTATEID:novellserver:DS Database$

• “On-Demand Group Macros” get you a comma-separated list of

all values in a host, service or contact group e.g.

$HOSTSTATEID:hg1:,$
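
A sketch of macros in use, modelled loosely on the notification command in the sample configuration. The paths to printf and mail are assumptions.

```cfg
define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "Service: $SERVICEDESC$\nHost: $HOSTNAME$\nState: $SERVICESTATE$\nInfo: $SERVICEOUTPUT$\n" | /bin/mail -s "$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTEMAIL$
}
```

Nagios substitutes each macro before running the command, so the mail program only ever sees plain text.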


Notes:

• Can disable environment variables by setting the

enable_environment_macros variable to 0

– Avoids a bunch of overhead, or so they say

– But it removes a bunch of information that can be useful for plugins

and other commands

– Use the env command with plugins to add what you need

• On-demand macros are not added to the environment

• Environment variables, on-demand macros and more added in version 2

• Even more macros added in version 3


Theory and Practice


Theory of Operation

• Documentation contains “Theory of Operation” information

• It covers many of the details of how things actually work

– Status and reachability of network hosts

– Determination of network outages

– Service check scheduling, service state

– Notifications, timeperiods

– Plugins

• You should at least review this information, as it will help you

understand both what is happening, and what is possible


Notes:

• Used to be a separate section, now at the end of “The Basics”

• And review the “Advanced Topics” section as well

• And all the rest of the documentation while you’re at it


State of the Network

• Current state of a service or host: two things

– Status: OK, Warning, Up, Down, etc.

– State: Soft, Hard, Unreachable

• Soft state: a check failed, but still have retries to do

– Logged, and event handler run

• Hard state: When we’re sure

– Notification logic invoked


Notes:

• I was tempted to try a pun related to the state of the network address, but

I held back

• State Types: http://nagios.sourceforge.net/docs/3_0/statetypes.html

• That document doesn’t mention “unreachable” as a state, but I think it

likely is an actual state


Stalking and Volatility

• If enabled, stalking logs any changes in plugin output

– Even with no state change

– e.g. RAID check was “1 disk dead” and is now “2 disks dead”

– Logged for later review/analysis

• Volatile services

– Something that resets to OK after each check

– Need attention every time there is a problem

– Notification and event handler happen once per failure

– e.g. Alert on a port scan
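
A sketch of both features in service definitions. The hosts, the template, and the check_raid and check_dummy command names are hypothetical; the volatile service here is assumed to receive its results passively.

```cfg
# Volatile: every non-OK result is treated as a new problem
define service{
    use                     generic-service     ; hypothetical template
    host_name               fw1
    service_description     Port Scan Alert
    check_command           check_dummy!0       ; hypothetical placeholder command
    active_checks_enabled   0
    passive_checks_enabled  1
    is_volatile             1
    max_check_attempts      1
}

# Stalking: log output changes while in warning or critical
define service{
    use                     generic-service
    host_name               fileserver1
    service_description     RAID Status
    check_command           check_raid          ; hypothetical plugin
    stalking_options        w,c
}
```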


Notes:

• Most people likely won’t want to use stalking

• Enabled on host and service definitions

• http://nagios.sourceforge.net/docs/3_0/stalking.html

• I’m thinking you could likely get the same result as volatility by setting only

1 check, no recovery notifications

– But I could be wrong

• http://nagios.sourceforge.net/docs/3_0/volatileservices.html


Topology Matters

• Parents directive in host definitions defines topology

• Parents are typically routers, firewalls, switches, etc.

– i.e. How the packets get there from here

• Can have multiple parents with redundant network paths

• Notification_option “u” sends on UNREACHABLE state

• Buzzphrase: root cause analysis


Notes:

• Unless your network is tiny, flat or otherwise trivial

• Used in drawing network maps in a sensible way

• Parents is a comma-separated list of all parent hosts

• See the docs: “Determining Status and Reachability of Network Hosts”

http://nagios.sourceforge.net/docs/3_0/networkreachability.html


Depending on Others

• Host and service dependencies define operational requirements

– e.g. Web server can’t work unless file server is working

• execution_failure_criteria and

notification_failure_criteria determine what we do

if something we depend on fails, e.g.

– if file server down, don’t execute web check

– and don’t notify me about web problem

• Set inherits_parent so a dependency also inherits the dependencies of the host or service it depends on

© 2003-2013 John Sellens USENIX LISA 27, 2013 76

Notes:

• Service failure criteria are o (OK), w (warning), u (unknown), c (critical), p (pending), n (none); host dependencies use o (up), d (down), u (unreachable), p (pending), n (none)

• I think inherits_parent is perhaps misnamed – parents are topological, dependencies are different

• http://nagios.sourceforge.net/docs/3_0/dependencies.html
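How the failure criteria letters gate checks and notifications can be sketched as a small Python function. This is an illustration of the idea only, assuming service states and the single-letter codes from the documentation; dependency_fails is a made-up name, not part of Nagios.

```python
# Map service states to the single-letter codes used in failure criteria.
STATE_CODES = {"OK": "o", "WARNING": "w", "UNKNOWN": "u",
               "CRITICAL": "c", "PENDING": "p"}


def dependency_fails(master_state, failure_criteria):
    """Return True if the master's state matches the failure criteria.

    failure_criteria is a comma-separated string such as "w,u,c";
    "n" (none) means the dependency never fails.
    """
    criteria = {c.strip() for c in failure_criteria.split(",")}
    if "n" in criteria:
        return False
    return STATE_CODES[master_state] in criteria


# File server is CRITICAL: skip the web check and suppress its notifications.
print(dependency_fails("CRITICAL", "w,u,c"))   # True
print(dependency_fails("OK", "w,u,c"))         # False
print(dependency_fails("CRITICAL", "n"))       # False
```

The same test would be applied once with execution_failure_criteria (should the check run?) and once with notification_failure_criteria (should anyone be told?).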


Cached Checks

• Can cache and re-use host or service check results

• Used only for “On-Demand Checks”

– Checking that host is up if a service fails

– Checking topological reachability

– For “predictive dependency checks”

• i.e. Checking for “collateral damage”

• Lower overhead, good results

– You should enable and tune the cache

© 2003-2013 John Sellens USENIX LISA 27, 2013 77

Notes:

• “Predictive dependency checks” – in a network outage, schedule more

topological checks earlier, since we’re likely to need that information

• http://nagios.sourceforge.net/docs/3_0/cachedchecks.html

• Don’t blame me for the “cached checks” pun, because in Canada “cached

cheques” makes no sense at all


Event Handlers

• In a perfect world, nothing would ever go wrong

– In a semi-perfect world, problems would fix themselves

• Event handlers are one of Nagios’ ways of moving closer to

perfection

• An event handler is a command that is run in response to a

state change

– Canonical example: restart httpd if WWW service fails

– Open a trouble ticket on failure?

• Complications: runs as the nagios user, on the nagios server

• Global and specific host and service event handlers

© 2003-2013 John Sellens USENIX LISA 27, 2013 78

Notes:

• And incidentally, in a perfect world, tutorial notes would never contain any

typos

• A state change is (simplistically speaking) a failure or a recovery

– I’m ignoring “hard” vs “soft” states here

• As documented, sudo and ssh can be useful in event handling commands

for elevating permissions and access to remote services

– And recall that SSH key files can allow only specific commands

– I restart a Windows IIS service from Nagios via a script and ssh

• Host and service definitions can use the event_handler directive

• Event handlers can (and should) be passed all sorts of state and check

attempt information
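A minimal sketch of an event handler's decision logic, written in Python, assuming the handler is passed $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as its three arguments. The function names and the choice of acting on the third soft attempt are illustrative assumptions.

```python
import subprocess


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether to restart, given the state, state type (SOFT/HARD)
    and check attempt number that Nagios passes to the handler."""
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    # SOFT state: wait until a few retries have failed before acting.
    return state_type == "SOFT" and int(attempt) == soft_attempt


def handle(state, state_type, attempt, restart_cmd):
    """Run restart_cmd (a list of arguments) if a restart is warranted."""
    if should_restart(state, state_type, attempt):
        return subprocess.call(restart_cmd)
    return 0
```

A handler script might call handle() with its first three command-line arguments and a restart command wrapped in sudo or ssh; the exact command and its path depend on the site and are not assumed here.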


External Commands

• The Nagios server maintains a named pipe in the file system for

accepting various commands from other processes

• External commands are used most often by the web interface to

record information and modify Nagios’ behaviour

– But you can do lots of things from shell scripts . . .

• Some of the available functionality

– Add/delete host or service comments

– Schedule downtime, enable/disable notifications

– Reschedule host or service checks

– Submit passive service check results

– Restart or stop the Nagios server

© 2003-2013 John Sellens USENIX LISA 27, 2013 79

Notes:

• Written to the named pipe as a single line, with multiple fields

– Syntax is timestamp, command, then ;-separated arguments

• e.g. to get Nagios to restart, write this:

[1041175870] RESTART_PROGRAM;1041175870

• Documented (of course) at

http://nagios.sourceforge.net/docs/3_0/extcommands.html

• Exhaustive list of 157 available commands, with examples, at

http://www.nagios.org/developerinfo/externalcommands/
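The single-line syntax described above is easy to produce from a script. The following Python sketch formats a command line and writes it to the named pipe; the helper names are hypothetical, and the pipe location is whatever the command_file directive in nagios.cfg points to (no default is assumed).

```python
import time


def format_external_command(name, *args, now=None):
    """Build one external-command line: "[timestamp] NAME;arg1;arg2"."""
    timestamp = int(time.time()) if now is None else int(now)
    fields = ";".join([name] + [str(a) for a in args])
    return "[%d] %s\n" % (timestamp, fields)


def submit(command_file, line):
    """Write the line to the Nagios command pipe.

    command_file is the path set by the command_file directive in
    nagios.cfg; the caller must supply it.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Reproduces the restart example from the notes above.
print(format_external_command("RESTART_PROGRAM", 1041175870, now=1041175870), end="")
```

The script needs write permission on the pipe, which usually means running as the nagios user or a member of its command group.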


Passive Service Checks

• Nagios can accept service check results from other programs

– Since Nagios did not initiate the check, these are called

“passive service checks”

• These are useful for

– Asynchronous events (SNMP traps, say)

– Results from other existing programs

– Results from remote or secured systems

• You’ll recall that the NSCA addon uses these

© 2003-2013 John Sellens USENIX LISA 27, 2013 80

Notes:

• Submitted through the external command interface
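A passive result is just another external command, PROCESS_SERVICE_CHECK_RESULT, carrying the host name, service description, return code and plugin output. A small Python sketch of building that line follows; the host and service names are made up for illustration.

```python
import time

# Plugin return codes as understood by Nagios.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_service_result(host, service, state, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time()) if now is None else int(now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, RETURN_CODES[state], output)


# Hypothetical example: a trap handler reporting a link-down event.
line = passive_service_result("sw1", "SNMP Traps", "CRITICAL",
                              "linkDown on port 12", now=1041175870)
print(line, end="")
```

The resulting line is written to the command pipe in the same way as any other external command, and the service must accept passive checks.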


Distributed Monitoring

• Nagios supports distributed monitoring of a certain style

• Remote Nagios servers are essentially probe engines,

submitting their results to a central server with passive service

check results

• The configuration on the remote servers is a subset of the

central configuration

• The central server is configured to notice if the passive results

stop coming from the remote server

© 2003-2013 John Sellens USENIX LISA 27, 2013 81


Distributed Monitoring (cont’d)

• This seems like a fair amount of duplication of effort to me, but it

gets you all the status on one central console

• I tend to set up independent Nagios servers

– And use my check_nagios_status plugin to “screen-scrape”

the remote web interface

– Providing a central summary and click-through to the remote

server

• Tools like DNX and Mod-Gearman can spread the load

– But still retain one Nagios host doing the scheduling

• Your mileage may vary

© 2003-2013 John Sellens USENIX LISA 27, 2013 82

Notes:

• More on DNX on page 139

• More on Mod-Gearman on page 140

• My simple-minded mbdivert (page 141) distributes some checks

• The “central aggregation” approach is used by a number of more recent

tools, such as Nagios Fusion (page 30), Thruk (page 148), MNTOS (page

148), and Multisite (page 148)


Adaptive Monitoring

• Can change things during runtime via external commands

– e.g. scheduled changes, or from an event handler

• Can change

– Check commands and arguments

– Check interval, max attempts, timeperiod

– Event handler commands and arguments

• Likely just for very specific uses or situations

© 2003-2013 John Sellens USENIX LISA 27, 2013 83

Notes:

• A previous audience member gave an example of an active/passive

cluster that is behind a hardware load balancer

– Checking the hosts directly

– Need to move the checks if the service fails over

– No service address that moves between the hosts

• Added in version 2

• http://nagios.sourceforge.net/docs/3_0/adaptive.html


Obsession

• OCSP: Obsessive Compulsive Service Processor

• OCHP: Obsessive Compulsive Host Processor

• Commands that may be executed after every service or host

check

• Allows you to pass results to external applications

– e.g. Used to submit distributed monitoring results with

send_nsca

• Efficient? Commonly Used? Scalable?

© 2003-2013 John Sellens USENIX LISA 27, 2013 84


NEB – Nagios Event Broker

• Allows you to add code to the Nagios core

• Dynamically loaded module

• Has access to internal Nagios events and data

• Limited documentation — helloworld.c in source

• Starting to be used for interesting things

– Logging to database

– DNX – check distribution

© 2003-2013 John Sellens USENIX LISA 27, 2013 85

Notes:

• USENIX ;login: articles on NEB Modules by David Josephsen in October

and December 2008

http://www.usenix.org/publications/login/2008-10/index.html

http://www.usenix.org/publications/login/2008-12/index.html


S6 — Nagios: Advanced Topics Getting Larger

Getting Larger

© 2003-2013 John Sellens USENIX LISA 27, 2013 86


Scaling Up

• Nagios can handle a lot without much effort

• As you get larger, advanced features are more important

– Use parent/child and host/service dependencies

– More efficient for humans and machines

• You will need to be more rigorous in your configuration

– Consistency, completeness, tuning

• Version 3 adds scalability and tuning features

• Version 4 adds even more

© 2003-2013 John Sellens USENIX LISA 27, 2013 87


Check Execution

• A check is typically fork(), fork(), exec()

• Theory is that running the plugins uses lots of resources

• Distribute plugin execution for more capacity

– Distributed monitoring, multiple servers, DNX, etc.

• Perhaps embedded perl is a practical tool?

– Pros and cons – not all Perl will embed nicely

– Force/avoid ePN: # nagios: +epn

– In first 10 lines of Perl script . . .

– No longer available in version 4

© 2003-2013 John Sellens USENIX LISA 27, 2013 88

Notes:

• ePN is “embedded Perl Nagios”

• Explicit - if first 10 lines of a script contain

– # nagios: +epn

– # nagios: -epn

• Implicit use of ePN via configuration options

• Embedded perl overview and pros and cons at

http://nagios.sourceforge.net/docs/3_0/embeddedperl.html


More on Timeperiods

• Timeperiods can be quite specific

• The use and exclude directives make timeperiods very flexible

– e.g. define “holidays” and exclude from “workhours”

• Can describe on-call schedules, maintenance windows, etc.

• Avoid check overhead when you don’t care

• Not quite as useful for the one-person shop . . .

© 2003-2013 John Sellens USENIX LISA 27, 2013 89

Notes:

• All the cool timeperiod stuff was added in version 3

• “Time Periods”

http://nagios.sourceforge.net/docs/3_0/timeperiods.html

• “On-Call Rotations”

http://nagios.sourceforge.net/docs/3_0/oncallrotation.html


Large Installation Tweaks

• use_large_installation_tweaks option

• No summary macros in the environment to avoid overhead

– e.g. TOTALHOSTSUP, etc.

• Lazy, but more efficient memory freeing in children

• Checks are single, not double, fork()

© 2003-2013 John Sellens USENIX LISA 27, 2013 90

Notes:

• “Large Installation Tweaks”

http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html


Tuning for Performance

• Lots of tunable configuration parameters

• Keep performance graphs of Nagios

– MRTG, nagiostats, etc.

• Disable environment macros

– Use the env command with plugins to add what you need

• Use passive checks if you can

– Not my favorite idea . . .

• Avoid interpreted plugins, or offload checks

• Use Fast Startup Options – pre-cache configs

© 2003-2013 John Sellens USENIX LISA 27, 2013 91

Notes:

• Lots of good information in the documentation

• “Tuning Nagios For Maximum Performance”

http://nagios.sourceforge.net/docs/3_0/tuning.html

• “Graphing Performance Info With MRTG”

http://nagios.sourceforge.net/docs/3_0/mrtggraphs.html

• “Fast Startup Options”

http://nagios.sourceforge.net/docs/3_0/faststartup.html


S6 — Nagios: Advanced Topics Tips and Tricks

Tips and Tricks

© 2003-2013 John Sellens USENIX LISA 27, 2013 92


Tips and Tricks

• Use the parent/child topology

– Before Nagios 3, host checks were not parallelized

– Host checks of a down segment can block all other checks

• Be consistent and use templates and groups

– Make it easy to add another similar host

– Make it easy to add a service to a group of hosts

• Smarter plugins make life (configuration) easier

© 2003-2013 John Sellens USENIX LISA 27, 2013 93


Hostgroups Are Your Friends

• Hostgroups are really handy for grouping checks

• Use the +groupname syntax, e.g.

    hostgroups +webservers,dbservers

  which makes it easy to add on checks as you include templates

• With multiple Nagios servers use

    allow_empty_hostgroup_assignment=1

– You can define machine types as common hostgroups

– Even if you don’t have every type on every Nagios server

© 2003-2013 John Sellens USENIX LISA 27, 2013 94


Organize Your Config Files

• Put files in different directories

• One host per config file

• Generate configs from other information you already have

– Or use a script to generate from a list

• Take advantage of your naming convention

– Wildcards in host names based on FQDNs

© 2003-2013 John Sellens USENIX LISA 27, 2013 95
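Generating one config file per host from a list can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the inventory entries, the "generic-host" template name, the "hosts" directory and both helper functions.

```python
import os


def host_definition(name, address, template="generic-host", hostgroups=()):
    """Render one Nagios host definition as text."""
    lines = [
        "define host {",
        "    use        %s" % template,
        "    host_name  %s" % name,
        "    address    %s" % address,
    ]
    if hostgroups:
        lines.append("    hostgroups %s" % ",".join(hostgroups))
    lines.append("}")
    return "\n".join(lines) + "\n"


def config_path(conf_dir, name):
    """One host per config file, named after the host."""
    return os.path.join(conf_dir, name + ".cfg")


# Hypothetical inventory; in practice this might come from DNS or an asset list.
inventory = [
    ("web1.example.com", "192.0.2.10", ("webservers",)),
    ("db1.example.com", "192.0.2.20", ("dbservers",)),
]

for name, address, groups in inventory:
    print("# " + config_path("hosts", name))
    print(host_definition(name, address, hostgroups=groups))
```

Keeping the per-host output this small only works if the templates and hostgroups carry the real detail, which is the point of the surrounding slides.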


Simpler Configs

• Simple commands in Nagios configs

• Same config for every machine

• Set limits outside of Nagios configs

– Manually or automatically

– Move details/smarts outside the configs

© 2003-2013 John Sellens USENIX LISA 27, 2013 96

Notes:

• Well, maybe not every machine, but try to avoid per-machine configs


Separate Your Problem Space

• Multiple locations? Use multiple servers!

– Distributed monitoring

– Or separate systems, aggregated or summarized centrally

• Can you delegate to different internal groups?

– One system for networks, one for servers, . . .

– Scales software, and your time

© 2003-2013 John Sellens USENIX LISA 27, 2013 97


Another Level of Indirection and “Smarts”

• Wrap your plugins in smarter scripts

– How about a master checker that knows all and checks

everything on a host?

• Have your plugins determine what’s “normal” or not

– So you don’t have to pre-set thresholds, etc.

– Time of day, trends, past experience

– Based on other current state/activity

• Use inherited defaults in configs, override as needed

• Let the machines make config changes, instead of you

© 2003-2013 John Sellens USENIX LISA 27, 2013 98

Notes:

• There is mathematics that will tell you if something is unusual

– Cricket can use Holt-Winters Forecasting for aberrant behaviour detection

– Undoubtedly other techniques (standard deviation perhaps?)

– I have forgotten everything I once knew about math
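The standard deviation idea mentioned above can be sketched very simply in Python. This is not Holt-Winters, just a plain "more than k standard deviations from the mean" test over recent samples; the function name, the default k and the sample values are illustrative assumptions.

```python
import statistics


def is_unusual(history, value, k=3.0):
    """Return True if value is more than k standard deviations away
    from the mean of the historical samples."""
    if len(history) < 2:
        return False  # not enough data to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Perfectly flat history: anything different is unusual.
        return value != mean
    return abs(value - mean) > k * stdev


# Hypothetical recent load averages for one host.
history = [1.0, 1.2, 0.9, 1.1, 1.0]
print(is_unusual(history, 1.1))   # False
print(is_unusual(history, 5.0))   # True
```

A plugin built this way needs somewhere to keep its history between runs, and a fallback for hosts that have no history yet.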


Custom Object Variables for Limits

• Define a custom variable, use it in a check command

• In a global host template, set defaults

• Use other templates to set defaults for locations, hostgroups

• Set per-host values in host definition

Host template or host:

    _LOADWARN 5,3,2
    _LOADCRIT 7,5,4

Command definition:

    command_line $USER1$/check_load --warning=$_HOSTLOADWARN$ --critical=$_HOSTLOADCRIT$

© 2003-2013 John Sellens USENIX LISA 27, 2013 99

Notes:

• Useful with multiple inheritance:

    define host {
        name busymachines
        _LOADWARN 10,6,4
        _LOADCRIT 15,10,6
        register 0
    }
    define host {
        use busymachines,generic
        ...
    }

• Silly me, I only twigged to this after using Nagios 3.x for a long time, when

I was trying to solve a particular problem
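The way defaults and overrides combine under multiple inheritance can be modelled with a short Python sketch. It assumes the documented precedence (values on the object itself win, then the first template listed in use, then later ones); the data structure and the resolve helper are illustrative, not how Nagios stores objects.

```python
def resolve(name, objects):
    """Return the effective custom variables for a host or template.

    objects maps a name to a dict with optional "use" (list of template
    names, highest precedence first) and "vars" (custom variables).
    """
    obj = objects[name]
    effective = {}
    # Walk templates from lowest to highest precedence so that earlier
    # templates in the "use" list overwrite later ones.
    for template in reversed(obj.get("use", [])):
        effective.update(resolve(template, objects))
    # Values set directly on the object win over anything inherited.
    effective.update(obj.get("vars", {}))
    return effective


objects = {
    "generic": {"vars": {"_LOADWARN": "5,3,2", "_LOADCRIT": "7,5,4"}},
    "busymachines": {"vars": {"_LOADWARN": "10,6,4", "_LOADCRIT": "15,10,6"}},
    "db1": {"use": ["busymachines", "generic"]},                  # hypothetical host
    "web1": {"use": ["generic"], "vars": {"_LOADCRIT": "9,7,5"}},  # hypothetical host
}

print(resolve("db1", objects))
print(resolve("web1", objects))
```

Here db1 picks up the busymachines limits, while web1 keeps the generic warning level and overrides only the critical level.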


Custom Wraps

• Multi-stage plugin checks with a shell script

• Check multiple things

– Is at least one interface up?

– Is at least one redundant server up?

• Expect scripts for interactions

• Web form posting tools

© 2003-2013 John Sellens USENIX LISA 27, 2013 100


Custom Wraps (cont’d)

• “Pervasive wrapper”: redefine $USER1$

    $USER1$=/usr/local/mywrapper /usr/local/libexec/nagios

• Custom object variables in the environment

    _web_regexp SomeRegExp
    NAGIOS__HOSTWEB_REGEXP

• Environment macros mean your plugins can know everything

© 2003-2013 John Sellens USENIX LISA 27, 2013 101

Notes:

• i.e. You can make your configs suddenly very different

• enable_environment_macros=1
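A plugin or wrapper can pick up custom object variables from its environment once environment macros are enabled. The Python sketch below shows the naming rule implied by the slide (a host variable _web_regexp appears as NAGIOS__HOSTWEB_REGEXP); both helper functions are hypothetical names.

```python
import os


def custom_var_env_name(object_type, var_name):
    """Environment variable name for a custom object variable.

    object_type is "HOST", "SERVICE" or "CONTACT"; var_name is the name
    as written in the config, including its leading underscore.
    """
    bare = var_name[1:] if var_name.startswith("_") else var_name
    return "NAGIOS__%s%s" % (object_type.upper(), bare.upper())


def read_custom_var(object_type, var_name, default=None, environ=None):
    """Look up a custom variable in the environment passed by Nagios."""
    if environ is None:
        environ = os.environ
    return environ.get(custom_var_env_name(object_type, var_name), default)


print(custom_var_env_name("host", "_web_regexp"))   # NAGIOS__HOSTWEB_REGEXP
```

Passing environ explicitly makes the lookup easy to exercise outside Nagios, for example from a test or an interactive shell.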


Dynamic Behaviour Changes

• Want to disable load and disk I/O checks during backup

– But backup takes a variable amount of time

• checkbook sets various tag files and runs a command

    checkbook -t load -e run-backup

– Can set time limits, and recommended plugin result code

– Removes tag files when command finishes

• checkmate is a plugin wrapper that observes the tag files and manipulates the plugin return code

    checkmate -t load -- \
        check_load -w 1,2,3 -c 3,4,5

© 2003-2013 John Sellens USENIX LISA 27, 2013 102

Notes:

• Influenced by LISA 2000 paper on “eEMU enterprise Event Management

Utility” which provided ways for monitoring clients to advise the server

– “eEMU: A Practical Tool and Language for System Monitoring and

Event Management”, Jarra Voleynik, eEMUconcept Pty Ltd, LISA

2000,

http://www.usenix.org/event/lisa2000/voleynik.html

• Will be available at

http://www.syonex.com/resources/software/
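The tag-file idea can be illustrated with a rough Python sketch of a wrapper that runs a plugin and softens its result while a tag file exists. This is not the actual checkmate code; the tag file path, the override behaviour and both function names are assumptions made for the example.

```python
import os
import subprocess


def adjust_status(plugin_rc, tag_present, override_rc=0):
    """Return the status to report to Nagios.

    While the tag is present, any non-OK result is replaced by
    override_rc; otherwise the plugin's own return code is kept.
    """
    if tag_present and plugin_rc != 0:
        return override_rc
    return plugin_rc


def run_wrapped(tag_file, plugin_argv, override_rc=0):
    """Run the plugin, echo its output, and return the adjusted status."""
    result = subprocess.run(plugin_argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    print(result.stdout, end="")
    return adjust_status(result.returncode, os.path.exists(tag_file),
                         override_rc)
```

A real wrapper would also want the time limits mentioned on the slide, so that a stale tag file left behind by a crashed backup cannot mask problems forever.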


S6 — Nagios: Advanced Topics Abusing Nagios

Abusing Nagios

© 2003-2013 John Sellens USENIX LISA 27, 2013 103


Contact Convolution

• Note that the mapping of people to contacts need not be one-to-one

• Sometimes you want to be paged, sometimes mailed,

sometimes not

• Consider three contact groups:

– sysadmin, sysadmin-email, sysadmin-page

• Contactgroup directives include contactgroup_members

• Define generic contact templates with notification commands

• Define per-person contact templates with details

• Define 3 contacts for each, use-ing the templates, in the

contactgroups

© 2003-2013 John Sellens USENIX LISA 27, 2013 104

Notes:

• If this isn’t clear, please let me know, and I’ll send a sample file


What to Say: genoa

• GEneric NOtification Author/Arranger/Artist

• Uses template files and environment variables to format

notifications

• Choose a template based on hostname, problem type,

notification type, custom object variables

© 2003-2013 John Sellens USENIX LISA 27, 2013 105

Notes:

• Via Nagios Exchange

• Or http://www.syonex.com/resources/software/


genoa in Configs

define command {

command_name host-by-email

command_line genoa -s

}

define command {

command_name host-by-pager

command_line genoa -p -t

}

© 2003-2013 John Sellens USENIX LISA 27, 2013 106

Notes:

• Isn’t that simpler than the typical printf piped into mail?

• Uses all the environment variables set by

enable_environment_macros=1

• Could also of course use the env command to provide more environment

variables


Sample genoa Template

Subject: $NOTIFICATIONTYPE$ - $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$ - alert $NOTIFICATIONNUMBER$
To: $CONTACTEMAIL$

$NOTIFICATIONTYPE$: $SERVICEOUTPUT$

Service: $SERVICEDESC$
Host: $HOSTNAME$ / $HOSTALIAS$
State: $SERVICESTATE$ for $SERVICEDURATION$
Address: $HOSTADDRESS$
Date/Time: $SHORTDATETIME$

genoa template $GENOATEMPLATE$

© 2003-2013 John Sellens USENIX LISA 27, 2013 107

Notes:

• Rules for template searching based on

– HOSTNAME or HOSTADDRESS

– _GENOACLASS custom object variable

– For HOST or SERVICE problem

– NOTIFICATIONTYPE - problem, acknowledgement, etc

– Falls back on a default file

• Nagios sometimes sets variables that I didn’t expect, so trying to determine whether the notification is for a HOST problem or a SERVICE problem is more convoluted than I had expected.
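Filling in a template of this kind from the environment takes very little code. The Python sketch below is not genoa itself, only an illustration of the substitution step, assuming each $NAME$ macro is available as the environment variable NAGIOS_NAME; the render function and the sample values are made up.

```python
import re

# Matches $NAME$ style macros, including custom ones such as $_HOSTFOO$.
MACRO = re.compile(r"\$([A-Z_][A-Z0-9_]*)\$")


def render(template, environ):
    """Replace each $NAME$ with the value of NAGIOS_NAME from environ.

    Macros with no matching environment variable become empty strings.
    """
    def lookup(match):
        return environ.get("NAGIOS_" + match.group(1), "")
    return MACRO.sub(lookup, template)


# Hypothetical environment as Nagios might set it for a notification.
env = {
    "NAGIOS_HOSTNAME": "web1",
    "NAGIOS_SERVICEDESC": "HTTP",
    "NAGIOS_SERVICESTATE": "CRITICAL",
}
print(render("$HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$", env))
```

In a real notification command the environ argument would simply be os.environ, and the template would be chosen by the search rules listed above.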


Sample genoa Pager Templates

tdir/pager/SERVICE_PROBLEM

XX $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$
$SERVICEOUTPUT$ for $SERVICEDURATION$
$SHORTDATETIME$

tdir/pager/SERVICE_ACKNOWLEDGEMENT

ACK $HOSTNAME$ $SERVICEDESC$ $SERVICEACKAUTHOR$: $SERVICEACKCOMMENT$ : $SERVICESTATE$
$SERVICEOUTPUT$ for $SERVICEDURATION$
$SHORTDATETIME$

© 2003-2013 John Sellens USENIX LISA 27, 2013 108

Notes:

• I was surprised how expressive I could be with just the standard environment variables

• But one could imagine running templates through, say, m4


Give Me a Call: tellitto

• Implements multiple notification methods

• Keeps trying until one succeeds

– e.g. Try SMS service, modem, mail, etc.

• Puts phone/pager numbers in one place

• Pipe message into tellitto, or use genoa -t

© 2003-2013 John Sellens USENIX LISA 27, 2013 109

Notes:

• Via Nagios Exchange

• Or http://www.syonex.com/resources/software/


Sample tellitto Config

# Don’t put secrets in here
# You can use different orders per contact
# for better deliverability

bob pageuser bob
bob smsgateway 12125551234
bob clickatellgate 12125551234
bob mail -s tellitto-notification [email protected]

sally clickatell 12125557890
sally smsgateway 12125557890
sally pageuser sally
sally mail -s tellitto-notification [email protected]

© 2003-2013 John Sellens USENIX LISA 27, 2013 110

Notes:

• We have a command smsgateway which uses curl to send to our

provider

• Ditto for clickatellgate

• We have a command pageuser which tries to use HylaFAX’s sendpage to send via a modem
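The "keep trying until one succeeds" behaviour can be sketched in Python as a loop over a contact's methods in config order. This is an illustration of the idea, not tellitto's implementation; the function names, the 60 second timeout and the injectable runner are assumptions.

```python
import subprocess


def run_command(argv, message):
    """Pipe message into a command; True if it exits with status 0."""
    try:
        result = subprocess.run(argv, input=message.encode(), timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def notify(methods, message, runner=run_command):
    """Try each method in order; return the first that succeeds, else None."""
    for argv in methods:
        if runner(argv, message):
            return argv
    return None


# Methods for one contact, in the order given in the sample config above.
bob = [
    ["pageuser", "bob"],
    ["smsgateway", "12125551234"],
    ["clickatellgate", "12125551234"],
]
```

Calling notify(bob, "XX web1 HTTP CRITICAL") would walk the list until one command exits successfully; passing a different runner makes the ordering logic testable without sending anything.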


Cheap (Check) Tricks

• Check for an open localhost TCP port (e.g. 3366)

tcp.tcpConnTable.tcpConnEntry.tcpConnState

.127.0.0.1.3366.0.0.0.0.0

= listen(2)

– Which may not work on Windows . . .

• I put together a check_allstorage plugin

– Don’t need to set limits in nagios config

– Gets list of filesystems from device, cache in /tmp dir

– Estimates thresholds based on current usage

• And check_netapp_df for NetApp volumes

© 2003-2013 John Sellens USENIX LISA 27, 2013 111

Notes:

• I needed to check that the PureMessage Milters were actually running

and listening locally on remote mail servers

• I find check_allstorage very handy; we used nagmin, which I found a little cumbersome

– I can tweak the automatic thresholds by editing the /tmp file

• http://www.syonex.com/resources/


Getting There from Here

• I try to avoid the standard tools to run remote plugins

– Like NRPE, check_by_ssh, etc.

• My check_snmpexec does an SNMP query to run plugin

remotely

– Net-SNMP snmpd exec functionality

• I check Windows services with check_winsvc

– Uses snmptable to get

enterprises.lanmanager

.lanmgr-2.server.svSvcTable

– And looks for the desired services in the output

© 2003-2013 John Sellens USENIX LISA 27, 2013 112

Notes:

• Fits nicely with my SNMP religion

• And another level of indirection

• And avoids having more ports and services on your network


Windows Service Checks via SNMP

• I check Windows services with check_winsvc

– Uses snmptable to get

enterprises.lanmanager

.lanmgr-2.server.svSvcTable

– And looks for the desired services in the output

• check_winservices uses files to know what should be running

– Calls check_winsvc with per-host services

– Initializes per-host lists if missing

© 2003-2013 John Sellens USENIX LISA 27, 2013 113

Notes:

• check_winservices has global FORCE and IGNORE files, used to initialize per-host lists

• Currently no way to check that a service is not running

– My most common problem is services that stop, rather than services that show up

• I’m not very Windows inclined – there are Windows addons and WMI

tools

• But SNMP seems to get me most of what I currently need


Web Server Abuse

• There are lots of different transports

• Got a visible web server that can run PHP or CGI?

• Set up a “hidden” web page to run your check

– Use Auth or allow/deny rules to limit access

– Use check_http to look for a regular expression

– Get remote status over port 80

© 2003-2013 John Sellens USENIX LISA 27, 2013 114

Notes:

• We do a few variants on this to get status and state out of our public web

servers


Put Yourself in Someone Else’s Shoes

• Sometimes you don’t “own” a remote network

– Central networking group

– Service provider

• My theory: a tiny utility server solves many problems

– NRPE, SSH, plugins

• My idea: MonBOX Remote Monitoring Appliance

– Consider this a gratuitous plug

© 2003-2013 John Sellens USENIX LISA 27, 2013 115

Notes:

• Commercial product

• http://www.monbox.com/

• You can of course do something similar yourself

• mbdivert wrapper makes it easy to send some checks to a remote server


Ghost Hosts

• We had a bunch of SMTP servers, and seven locations

• We had DNS names like smtp.location.company.com

– Which are DNS A records with multiple addresses

– So they can be sendmail “smart hosts”

• How do we know that the smtp names still work?

• Define a “virtual” host called smtp-servers with address

127.0.0.1

• And a bunch of check_smtp service checks for the various

names

© 2003-2013 John Sellens USENIX LISA 27, 2013 116

Notes:

• At former employer


Service as a Host

• We had an outsourced help desk

– They watched Nagios, but only cared about “down” hosts

• How did we get them to notice a down link between up routers?

• Made up a hostname “link-tor-det” and the host check is check_hops

• So the link down looks like a host down


Nonsense with Negate

• The negate plugin inverts the result of a plugin

• No webserver (or telnet, or . . . ) allowed:

negate check_http -H hostname

• File doesn’t exist:

negate check_file_age -f filename

• Another level of indirection . . .


Hey! Wake Up!

• At FreshBooks, we have “hack offs”

• I have a remote control power bar

• I bought a 12 volt revolving light at the auto supply

• A few scripts watch the status file

– Look for unhandled problems

• Can be used with a klaxon horn as well . . .

• I now have this hooked into my Asterisk box at home . . .
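A status-file watcher of this kind can be sketched with awk. This is not the author's script; it assumes the Nagios 3 status.dat layout, where hoststatus and servicestatus blocks carry current_state, problem_has_been_acknowledged and scheduled_downtime_depth fields. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: count unhandled problems in a Nagios status file.
# "Unhandled" here means: state is not OK/UP, not acknowledged,
# and not in scheduled downtime. count_unhandled is a hypothetical name.

count_unhandled() {
    # $1: path to status.dat
    awk '
        $1 ~ /^(host|service)status$/ && $2 == "{" {
            inside = 1; state = ack = down = 0; next
        }
        inside && $1 == "}" {
            if (state != 0 && ack == 0 && down == 0) n++
            inside = 0; next
        }
        inside {
            eq = index($1, "=")
            if (eq == 0) next
            key = substr($1, 1, eq - 1); val = substr($1, eq + 1)
            if (key == "current_state") state = val + 0
            else if (key == "problem_has_been_acknowledged") ack = val + 0
            else if (key == "scheduled_downtime_depth") down = val + 0
        }
        END { print n + 0 }
    ' "$1"
}
```

A cron job could compare the count against zero and switch the power bar outlet on or off accordingly. The location of the status file comes from the status_file setting in nagios.cfg, and the command that drives the power bar depends entirely on the hardware.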


Remember . . .

• Anything you care about can be monitored

• It does not need to be a “service”

• Or even something on a computer or network device

• Simple shell plugins are powerful
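As a reminder of how little a shell plugin needs, here is a minimal sketch: one line of status text, optional performance data after the "|", and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function name and its arguments are hypothetical, and it only handles non-negative integers.

```sh
#!/bin/sh
# Sketch of a minimal threshold plugin.
# check_value LABEL VALUE WARN CRIT prints a status line with perfdata
# and returns the matching Nagios exit code.
# check_value is a hypothetical name for this example.

check_value() {
    label=$1 value=$2 warn=$3 crit=$4
    case "$value" in
        ''|*[!0-9]*)
            echo "UNKNOWN - $label: no numeric value"
            return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        status=CRITICAL rc=2
    elif [ "$value" -ge "$warn" ]; then
        status=WARNING rc=1
    else
        status=OK rc=0
    fi
    echo "$status - $label is $value | $label=$value;$warn;$crit"
    return "$rc"
}
```

Appending a line such as `check_value users "$(who | wc -l | tr -d ' ')" 20 50; exit $?` would turn this into a complete plugin. The value can just as easily come from a thermometer, a queue directory, or anything else a script can read.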


Plugin Pointers


Hardware Happiness

• A variety of ways to get hardware state

– Fans, temperature, power, etc.

• IPMI and ipmitool(1) from OpenIPMI-tools

– And a check_ipmi plugin, or roll your own

– e.g. ipmitool sdr or ipmitool sensor

• Various hardware vendors have specific tools

– e.g. Dell’s OpenManage and the omreport command

• Remember you can wrap anything and make a plugin . . .
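Wrapping ipmitool can be sketched as a filter over its sensor listing. The sketch assumes `ipmitool sdr` prints lines of the form "name | reading | status", with status codes such as ok, nc (non-critical), cr (critical) and nr (non-recoverable). That should be checked against the output on the hardware in question. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: turn "name | reading | status" sensor lines (as assumed from
# ipmitool sdr) into a Nagios result. check_sdr is a hypothetical name.

check_sdr() {
    # Reads sensor lines on stdin; exit code follows Nagios conventions.
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 3 {
            n++
            name = trim($1); st = trim($3)
            if (st == "cr" || st == "nr")
                crit = crit (crit == "" ? "" : ", ") name
            else if (st == "nc")
                warn = warn (warn == "" ? "" : ", ") name
        }
        END {
            if (n == 0)     { print "UNKNOWN - no sensor data read"; exit 3 }
            if (crit != "") { print "CRITICAL - " crit; exit 2 }
            if (warn != "") { print "WARNING - " warn; exit 1 }
            print "OK - " n " sensors checked"
        }
    '
}
```

A plugin built on this would end with `ipmitool sdr | check_sdr; exit $?`. The same pattern should work for vendor tools such as omreport, with a different parsing step.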


Notes:

• OpenIPMI-tools rpm in CentOS, etc.

• http://ipmitool.sourceforge.net

• http://openipmi.sourceforge.net

• http://www.qwirx.com/check_ipmi


Nagios Addons


Visualizing Nagios Data

• Core Nagios has some visualization built-in

– History graphs, availability, etc.

• There are many addons for more graphing detail

– Performance data graphing

• I’m not sure there are clear winners

– Yet


Notes:

• Many people subscribe to the theory that a picture is worth n words, where n is a positive integer, usually of several orders of magnitude


Nagios and RRDtool

• Nagios doesn’t have RRDtool support built-in

• The Apan project aims to make it simple

• Provides a wrapper to use around Nagios plugins

• Uses the plugin’s output as RRD data

• Could use a little more polish (especially in the docs), but looks interesting

• http://apan.sourceforge.net/


Notes:

• Apan version 0.3.0-sql released December 16, 2003

• Configuration is now in an SQL database, rather than text files

– Which I suppose could be a blessing or a curse

• My unproven theory is that the perfdata output from the plugins would be a good way to pass data to RRDtool

• But I could be completely wrong!


More Graphing: nagiostat and nagiosgraph

• Parses performance data

• Stores in RRDtool

• Graphs on the fly through CGI

• Called via the “service_perfdata_command” setting for each plugin result

• i.e. an extra fork/exec for every plugin result

• nagiostat and nagiosgraph appear very similar

• Nagios Grapher looks interesting, and active
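The perfdata-to-RRDtool step that these tools perform can be sketched as follows. This is not how nagiostat or nagiosgraph are implemented; it is a minimal illustration that assumes perfdata of the usual form label=value[UOM];warn;crit;min;max with unquoted labels, and a hypothetical one-RRD-per-metric file naming scheme. It prints `rrdtool update` commands rather than running them.

```sh
#!/bin/sh
# Sketch: split plugin perfdata into label/value pairs and build
# "rrdtool update" command lines from them.
# Function names and the RRD file naming scheme are hypothetical.

perfdata_values() {
    # $1: perfdata string, e.g. "time=0.012s;1;2;0 size=1234B;;;0"
    # Prints "label value" per metric with the unit of measure stripped.
    # Simplification: quoted labels containing spaces are not handled.
    printf '%s\n' "$1" | tr ' ' '\n' | awk -F'=' '
        NF == 2 {
            split($2, f, ";")
            v = f[1]
            sub(/[^0-9.]*$/, "", v)   # drop trailing unit such as s, %, B
            print $1, v
        }'
}

rrd_update_commands() {
    # $1: RRD directory, $2: host, $3: service, $4: perfdata string
    perfdata_values "$4" | while read -r label value; do
        printf 'rrdtool update %s/%s_%s_%s.rrd N:%s\n' \
            "$1" "$2" "$3" "$label" "$value"
    done
}
```

A perfdata command would pass the host name, service description and perfdata macros to a script like this, and the RRD files would need to be created beforehand with `rrdtool create`. Running one such script per result is exactly the extra fork/exec cost noted above.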


Notes:

• Not to be confused with “nagiostats” from Nagios 2.x which tells you about the Nagios process itself

• http://nagiostat.sourceforge.net/

• http://nagiosgraph.sourceforge.net/

• http://www.nagiosexchange.org/NagiosGrapher.84.0.html or

http://sourceforge.net/projects/nagiosgrapher/


More Graphing: PerfParse

• Similar in purpose to Nagiostat

• But can get performance data in a variety of ways

– Including periodically reading perfdata log files

• Inserts data into MySQL

• Graphs through a CGI that integrates into the Nagios interface

• I think this is likely the best approach of the three


Notes:

• http://perfparse.sourceforge.net/

• Version 0.106.1 released April 11, 2006

– Though the web site references are lagging a bit


Even More Graphing: PNP

• PNP (PNP is not PerfParse)

• Snazzy graph viewing features

– AJAX, zoom, calendar

– Show all graphs for a host

• Templates for different types of data/checks

• Bells and whistles!


Notes:

• Seems to have some “traction”

• I’m not clear on the history of the name

– Is this a fork?

• http://pnp4nagios.sourceforge.net/

• Version 0.4.9, May 15, 2008


Still More Graphing: n2rrd

• Nagios to RRD

• Uses “perfdata command” hooks

• A little more easily configurable

• Active development

• Can interface with Cacti and Drraw

• Worth a look


Notes:

• http://n2rrd.diglinks.com

• n2cacti derivative: http://sourceforge.net/projects/nagios2cacti


NagVis — Visualization

• Displays IT processes such as a mail system or a network infrastructure

• Sort of like an interactive host/service “map”

• Could have an overview showing state, and “drill down”

• Active development

• Uses NDOUtils


Notes:

• Haven’t tried it yet, but looks really neat

• http://nagvis.org/


Nagios Addons

• The “downloads” section of the Nagios web site includes references to a number of “addons” for Nagios

– Providing interesting additional functionality

• The extras are now listed on nagiosexchange.org

• Have a browse, and you’ll get some interesting ideas

• A few addons, in particular, are worth special mention


Notes:

• See my “Tools You Need” notes for more addon mentions

• http://www.nagios.org/download/addons/

• http://exchange.nagios.org/


NRPE and NSCA

• These addons were written to address the need to do remote service checks of various types

• NRPE – Nagios Remote Plugin Executor

– A client “check_nrpe” and server “nrpe” pair

– Lets a Nagios server connect to a remote daemon, which will run plugins on the remote machine, and return the results

– Moderate security features

• For Windows: try nrpe_nt

– A re-implementation for Windows servers


Notes:

• Both NRPE and NSCA were written by Ethan Galstad (the Nagios author), so you can sort of consider them as just outside the Nagios core

• Despite my religion, I will sometimes admit that not all problems can be solved with SNMP

– But don’t attempt to quote me on that!

• Was going to be split out at

http://sourceforge.net/projects/nrpe

• nrpe_nt for Windows at

http://www.miwi-dv.com/nrpent/

• Plugins for Windows NRPE: search for “Windows NRPE” at

http://exchange.nagios.org/


NRPE and NSCA Illustrated


Notes:

• I stole these from

http://www.nagios.org/images/addons/nrpe/nrpe.png

and

http://www.nagios.org/images/addons/nsca/nsca.png


NRPE and NSCA (cont’d)

• NRPE uses popen() which means a shell is involved on the remote host

– Leads to problems with special characters and quoting

• NSCA – Nagios Service Check Acceptor

– A client “send_nsca” and server “nsca” pair

– Lets a remote system run a local check and submit a “passive service result” to the Nagios server

– Can also be used to set up distributed monitoring, with

service results aggregated on a central server
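The NSCA side can be sketched from the client end: send_nsca reads one result per line on standard input, with host name, service description, return code and plugin output separated by tabs. The helper names below are hypothetical; they only build the line, so the sketch can be tried without a Nagios server.

```sh
#!/bin/sh
# Sketch: run a local check and format its result for send_nsca.
# nsca_line and submit_check are hypothetical names for this example.

nsca_line() {
    # $1 host, $2 service description, $3 return code, $4 plugin output
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

submit_check() {
    # $1 host, $2 service; remaining arguments: the local plugin command.
    host=$1 svc=$2
    shift 2
    out=$("$@" 2>&1) && rc=0 || rc=$?
    # Only the first line of plugin output is passed along.
    nsca_line "$host" "$svc" "$rc" "$(printf '%s\n' "$out" | head -n 1)"
}
```

The output would be piped into something like `send_nsca -H nagios.example.com -c /etc/send_nsca.cfg`, where both the server name and the config path are placeholders. The Nagios server needs a matching service definition that accepts passive checks.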


Notes:

• My slightly modified NRPE avoids some problems and tries to make some things easier

• http://www.syonex.com/resources/software.html


NRDP — Nagios Remote Data Processor

• Designed to replace NSCA

• Standard ports and protocols — HTTP(S) and XML

• Can be used for other purposes as well

– Send data, status or commands to a server

– Let server do something with it


Notes:

• http://exchange.nagios.org/directory/Addons/Passive-Checks/NRDP--2D-Nagios-Remote-Data-Processor/details


NDOUtils — Status and Events to DB

• Nagios Data Output Utilities

• Event broker and intermediate programs to stuff status and event data into a database

– So you can crunch the details later

– Into pretty reports and graphs

• Real time (via pipe) or periodic (via files)

• Can import old logs

• Still alpha/beta, and still “experimental”

• “likely play a central role in the new Nagios web interface”


Notes:

• Also by Ethan Galstad, so just outside the Nagios core

• Currently MySQL only, but PostgreSQL will likely be added at some point

– Or so the README says

– I think the Icinga equivalent may have PostgreSQL support now


NDOUtils — Simple Diagram


Notes:

• Stolen from http://www.nagios.org/images/addons/ndoutils/ndoutils.png


NDOUtils – building

• On FreeBSD, I had to do:

% env CPPFLAGS=-I/usr/local/include \
    LDFLAGS=-L/usr/local/lib \
    ./configure \
    --with-mysql-inc=/usr/local/include \
    --with-mysql-lib=/usr/local/lib/mysql

• If I added --with-pgsql-lib it failed to find mysql

• But it wouldn’t compile . . .

– No -I on gcc commands


DNX – Distributed Nagios Executor

• NEB module, intercepts check commands before execution

• Passes work off to a “worker node”

• Bypasses external command pipe/file

• Workers register with the server

– Can come and go as they please (or die)

• Assumption: no worker is preferred

– i.e. doesn’t address local vs remote


Notes:

• http://dnx.sourceforge.net/

• From the LDS Church

– They do > 10,000 checks every 5 minutes

– And expect “sharp increases” in requirements


Mod-Gearman – Distributed Nagios Workers

• Event broker module, Gearman master daemon

• Worker nodes register with master

– Run checks

– Can select hostgroups to work on – remote locations

– Distributed, load balancing

• Likely the current preferred approach


Notes:

• http://mod-gearman.org/

• From Sven Nierlein, author of Thruk


mbdivert - Divert Checks Elsewhere

• mbdivert – plugin wrapper

• Sends plugin checks elsewhere based on hostname/IP/regex

• Uses check_nrpe or check_by_ssh or . . .

• “Seamless” integration into existing Nagios configs

• Makes use of a slightly enhanced NRPE

• Geographic, administrative or per-network diversion/distribution

• Bypass firewall restrictions – divert SNMP over SSH


Notes:

• This is one of my little projects

• http://www.syonex.com/resources/software/

• Built to work with the MonBOX Remote Monitoring Appliance

– But is useful on its own

– e.g. We use it at FreshBooks to run some checks locally on remote isolated machines


Nagios Business Process Addons

• “Roll Up” host/service checks to represent “business processes”

• Adds new entries in web menu

• Sort of a process summary view

• Seems like a useful abstraction

• http://nagiosbp.sourceforge.net/


Merlin or Module for Effortless Redundancy and Loadbalancing In Nagios

• Effort to make distributed Nagios easy

– Alternative to NSCA

• Load balancing, redundancy, distributed

• Status database

• Active development, part of op5 products?


Notes:

• From the good folks at op5.com, who have Nagios-based products.

• Requires Nagios 4

– Or so the README in the git repository says

• http://www.op5.org/community/plugin-inventory/op5-projects/merlin


V-Shell — Alternative Front End for Nagios

• Standard web interface is thought to be “long in the tooth”

• Alternative web interface to Nagios Core

– From the Nagios Enterprises team

• PHP, CSS, valid XHTML

• Basic, functional rather than web 2.0 whizbang

• Work in progress


Notes:

• V-Shell released in early October 2010

• List of notable themes and web interfaces at

http://www.nagios.org/download/frontends

• Nagios V-Shell is referenced there and on

http://exchange.nagios.org


Adagios – Web Based Nagios Configuration

• Web configuration interface

• No database backend

– Stores configs in standard files

– Uses pynag Python library http://pynag.org/

• Experimental status view

• Seems like a convenient approach


Notes:

• http://adagios.org/


NINJA or Nagios Is Now Just Awesome

• Attempt to develop an alternative Nagios GUI

• PHP, scalability, better searching and filtering

• Multi-language, templates/skins

• Relies on Merlin, Nagios 4, livestatus

• Used in op5’s monitoring product


Notes:

• Also from the good folks at op5.com.

• http://www.op5.org/community/plugin-inventory/op5-projects/ninja


Naglite2 — Simple Status Screen

• At FreshBooks we use Naglite2 for a status screen

• Quick summary of currently un-ack’d problems

• Handy for NOC status wall screens

• By Laurie Denness, at Last.fm

• http://laurie.denness.net/blog/2010/03/naglite2-finally-released/

• http://github.com/lozzd/Naglite2


Multiple Nagios Aggregators

• MNTOS — Multi Nagios Tactical Overview System

– Simple summary of multiple Tactical Overview pages

• Thruk Monitoring Webinterface

– Terrific – Aggregates multiple backends

– Uses MKLivestatus for communication and commands

• Multisite — aggregator, multiple views

• Check_MK — all-in-one, check everything plugin


Notes:

• MNTOS from http://www.sorkmos.com/index.php?page=mntos

• http://www.thruk.org/index.php

• Thruk supports Nagios and “similar” tools

– From Sven Nierlein, author of Mod-Gearman

• MKLivestatus, Check_MK, Multisite from Mathias Kettner

• MKLivestatus NEB module from http://mathias-kettner.de/checkmk_livestatus.html

• Multisite from http://mathias-kettner.de/checkmk_multisite.html

• Check_MK from http://mathias-kettner.com/check_mk.html


Nagios-based Distributions

• FAN – Fully Automated Nagios

– CentOS, Nagios, Centreon, NagVis

– Full distribution, or addon RPMs

• OMD – The Open Monitoring Distribution

– Icinga, Shinken, NagVis, Thruk, etc.

– Multiple instances

– Install on Ubuntu, SLES, RedHat, etc.


Notes:

• http://www.fullyautomatednagios.org/

• http://omdistro.org/


Wrap Up


Summary

• We’ve tried to hit the key areas for more advanced Nagios use

• We didn’t cover everything

– The community is large

– The extensions are extensive

• Hopefully you’ve learned some of the more interesting aspects

• And can apply them in your own implementations


Where to Get Nagios Help

• The Nagios web site contains a lot of good information

• The documentation is very good

• A growing FAQ collection

• Links to addons and so on

• Mailing lists for both Nagios and the plugins: announce, users, developers, help, checkins

• network.nagios.org — master list of Nagios sites

– community.nagios.org – news and updates

– exchange.nagios.org – lots of plugins and addons

– etc. . . .


Notes:

• http://www.nagios.org/ of course

• community.nagios.org is “Where the Community Connects”


And Finally!

• Please take the time to fill out the tutorial evaluations

– The tutorial evaluations help USENIX offer the best possible tutorial programs

– Comments, suggestions, criticisms gratefully accepted

– All evaluations are carefully reviewed, by USENIX and by the presenter (me!)

• Feel free to contact me directly if you have any unanswered questions, either now, or later: [email protected]

• Questions? Comments?

• Thank you for attending!


Notes:

• Thank you very much for taking this tutorial, and I hope that it was (and will be) informative and useful for you.

• I would be very interested in your feedback, positive or negative, and suggestions for additional things to include in future versions of this tutorial, on the comment form, here at the conference, or later by email.