GSR - TP3
Monitoring with Nagios and MRTG using SNMP and RMON
José Neto - 2001113213
Flávio Guilherme - 2007183058
DEI-FCTUC [email protected] - [email protected]
DEI-FCTUC
Machine Configuration
For this assignment we used three virtual machines: two running Fedora 13 and one running Windows XP.
One of the Linux servers was configured as the monitoring station for some services and equipment in Laboratory G5.1 and on the DEI network. On this server we configured MRTG first and then Nagios.
A Windows server and a Linux server (Fedora 13) were also configured to support SNMP. The requested scenario also included the laboratory's 3Com switch and the DEI outbound gateway, both of which already support SNMP.
Finally, we used a DEI service that sends SMS messages from emails.
Monitoring Tools: MRTG
MRTG was set up exactly as described in the course manual.
After running those commands, SNMP had to be installed on the router.
yum -y install net-snmp net-snmp-utils
To enable the SNMP service on the router gsrtp3.dei.uc.pt, the following command was used.
service snmpd start
Next, an HTML index page had to be created that aggregates the graphs MRTG produces for the router's interfaces. Building a configuration file for MRTG:
/usr/local/mrtg-2/bin/cfgmaker --global 'WorkDir: /var/www/html/mrtg' --global 'Options[_]: bits,growright' --output=/etc/mrtg.cfg [email protected]
Building an index page that aggregates the graphs produced by MRTG:
/usr/local/mrtg-2/bin/indexmaker --output=/var/www/html/mrtg/index.html /etc/mrtg.cfg
Access to the MRTG pages was required to prompt for a login and password, so we had to configure this as well.
The following commands were used.
make install-webconf
/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios/nagios.conf
yum install dbmmanage
mkdir users
htdbm -c /var/www/users/.dbmpasswd mrtgadmin
The following lines were then added to the httpd.conf file:
<Location /mrtg>
AuthType basic
AuthName "mrtg"
AuthBasicProvider dbm
AuthDBMType SDBM
AuthDBMUserFile /var/www/users/.dbmpasswd
Require valid-user
</Location>
Finally, we opened the browser and logged in with the password chosen in the previous step, which showed the seven interfaces available on gtDEI.dei.uc.pt. Since only interfaces 3 and 5 are of interest here, we removed the HTML for the remaining interfaces.
Monitoring Tools: Nagios
Nagios was set up exactly as described in the quickstart guide hosted at the following address:
http://nagios.sourceforge.net/docs/3_0/quickstart-fedora.html
The commands we considered most important are listed below.
yum install php gcc glibc glibc-common gd gd-devel openssl-devel
useradd -m nagios
passwd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
mkdir ~/downloads
cd ~/downloads/
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.3.tar.gz
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
tar xzf nagios-3.2.3.tar.gz
cd nagios-3.2.3
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
Next, the Nagios administrator's contact details had to be defined.
gedit /usr/local/nagios/etc/objects/contacts.cfg
In the next step, as with MRTG, a login and password had to be set up, in this case for the Nagios configuration in Apache.
make install-webconf
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
service httpd restart
Installation of the additional Nagios plugins that were requested.
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
chkconfig --add nagios
chkconfig nagios on
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios start
Monitoring Tools: MIB Browser
Although not a monitoring tool in itself, it was an essential instrument for carrying out this work.
As the previous figure shows, it is an extremely useful tool for exploring the various MIBs used in this work. We discuss them in more detail later, but we found its interface more intuitive than snmpwalk. It can also emulate trap generation, as shown in the following figure:
Machine Configuration: Linux
We started by installing SNMP:
yum install net-snmp
We then edited the snmpd.conf file and changed the following parameters:
#traps
# warn below 10% free
includeAllDisks 10%
# warn when less than 1 GB is free
disk / 1000000
# grant access to localhost and to the internal network
# sec.name source community
com2sec local localhost public
com2sec mynetwork 172.16.168.0/24 public
# two groups were needed: one for localhost and one for the network
# groupName securityModel securityName
group MyRWGroup v1 local
group MyRWGroup v2c local
group MyRWGroup usm local
group MyROGroup v1 mynetwork
group MyROGroup v2c mynetwork
group MyROGroup usm mynetwork
# grant access to everything when the connection comes from the local network
# group context sec.model sec.level prefix read write notif
access MyROGroup "" any noauth exact all none none
access MyRWGroup "" any noauth exact all all none
# create an "all" view that the groups are then given access to
## incl/excl subtree mask
view all included .1 80
#view rwview included system.sysLocation
syslocation Numa vm, provavelmente ali ao lado (a correr Fedora)
syscontact david <[email protected]>
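The com2sec lines above tie the "public" community to a source: localhost, or any address inside 172.16.168.0/24. As a rough illustration of what "inside the network" means, the sketch below tests whether an address falls within a CIDR block. The function names ip_to_int and in_network are hypothetical helpers, not part of Net-SNMP, and the sample addresses are simply the ones that appear elsewhere in this report.

```shell
# Sketch only: check whether an IPv4 address lies inside a CIDR network,
# mirroring the "source" field of a com2sec line. Helper names are made up.

ip_to_int() (
    # Convert dotted-quad text (a.b.c.d) to a 32-bit integer.
    # Runs in a subshell so the IFS change does not leak out.
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

in_network() (
    # $1 = address to test, $2 = network written as a.b.c.d/len
    net=${2%/*}
    len=${2#*/}
    # Build the netmask: the top "len" bits set, the rest cleared.
    mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Addresses taken from this report: the Linux VM and the 3Com switch.
for ip in 172.16.168.130 10.254.0.200; do
    if in_network "$ip" 172.16.168.0/24; then
        echo "$ip: matches mynetwork"
    else
        echo "$ip: outside mynetwork"
    fi
done
```

A query arriving from an address outside that block would not be mapped to the mynetwork security name, so the monitoring station has to sit inside 172.16.168.0/24 (or on localhost) for the checks in this report to work.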
Since we were going to monitor the HTTP/HTTPS, IMAP/IMAPS, POP3/POP3S and SNMP services, we configured this machine as described in SSC assignment 1 and GSR assignment 2, so we do not go into detail here.
Machine Configuration: Windows
To monitor a Windows machine over SNMP, the Simple Network Management Protocol (SNMP) component must be installed, which enables the SNMP Service.
While configuring Nagios we noticed that some values were not available in the standard MIB (SNMPv2-MIB), so we downloaded SNMP Informant, which gives access to information that was previously unavailable, as shown in the following figure:
Specific Configuration: snmptrapd
The only configuration snmptrapd needs is to be told which program handles the traps the machine receives, in this case snmptt. This is done by adding the following line to the configuration file /etc/snmp/snmptrapd.conf:
traphandle default /usr/sbin/snmptthandler
To run snmptrapd as a service, the -On option had to be added in the file /etc/rc.d/init.d/snmptrapd:
OPTIONS="-Lsd -p /var/run/snmptrapd.pid -On"
Specific Configuration: snmptt
We started by installing snmptt, downloading version 1.3. The files come pre-built, and the three executables (snmptt, snmptthandler and snmpttconvertmib) were copied to /usr/sbin.
To use snmptt we had to install several Perl libraries. Because we installed them through CPAN (perl -MCPAN), we no longer know exactly which ones they were; until all of them are installed, the error output indicates which ones are still missing.
To run these files as a service we used the following command:
cp snmptt-init.d /etc/rc.d/init.d/snmptt
We also placed the snmptt.ini file in /etc/snmp and edited the following line:
# Set to either 'standalone' or 'daemon'
# standalone: snmptt called from snmptrapd.conf
# daemon: snmptrapd.conf calls snmptthandler
# Ignored by Windows. See documentation
mode = daemon
We then used snmpttconvertmib to extract all the traps defined in the MIBs we intended to use:
snmpttconvertmib --in=/usr/share/snmp/mibs/SNMPv2-MIB.txt --out=/etc/snmp/snmptt.conf
snmpttconvertmib --in=/usr/share/snmp/mibs/UCD-SNMP-MIB.txt --out=/etc/snmp/snmptt.conf
The result was a file listing the traps that could occur, in the following format:
MIB: SNMPv2-MIB (file:/usr/share/snmp/mibs/SNMPv2-MIB.txt) converted on Mon Dec 19 17:03:00 2011 using snmpttconvertmib v1.3
#
#
#
EVENT coldStart .1.3.6.1.6.3.1.1.5.1 "Status Events" Normal
FORMAT A coldStart trap signifies that the SNMP entity, $*
SDESC
A coldStart trap signifies that the SNMP entity,
supporting a notification originator application, is
reinitializing itself and that its configuration may
have been altered.
Variables:
EDESC
EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 2 "Problem at $H!! coldStart"
We added the EXEC line ourselves. It is a command that makes Nagios register a problem (through the nagios.cmd file) whenever a trap is generated on one of the monitored servers. In this case we submit it with level 2 (Critical), together with the host name and the name of the trap.
As in this excerpt, a separate action was defined for each trap.
define service{
host_name 3com
service_description TRAP
is_volatile 1
check_command check-host-alive
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
active_checks_enabled 0
passive_checks_enabled 1
check_period 24x7
notification_interval 1
notification_period 24x7
notification_options u,c,w
notifications_enabled 1
contact_groups admins
}
The example above shows the Nagios configuration for the 3Com switch.
Although the assignment only asked us to monitor traps from the 3Com switch, it was not easy to make the switch generate traps, so we chose to replicate the configuration for the Windows machine (the results are presented in the testing section further on).
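The EXEC line and the passive TRAP service work together: the handler writes a passive check result into the Nagios command file, and the service (active checks off, passive checks on, volatile) picks it up. The sketch below shows roughly what such a submission looks like. The function name submit_passive_result is hypothetical, and the default path is the usual location for a source install under /usr/local/nagios, assumed here rather than taken from the report; the line layout follows the standard PROCESS_SERVICE_CHECK_RESULT external command.

```shell
# Sketch only: append one passive service check result to a Nagios command file.
# Line format of the external command:
#   [epoch] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<code>;<output>

# Assumed default location of the command file; override CMD_FILE for testing.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

submit_passive_result() {
    # $1 = host_name as defined in Nagios
    # $2 = service_description (here: TRAP)
    # $3 = return code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
    # $4 = free-text plugin output
    now=$(date +%s)
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$1" "$2" "$3" "$4" >> "$CMD_FILE"
}

# Example call, writing to a scratch file instead of the real command pipe:
#   CMD_FILE=/tmp/nagios-demo.cmd
#   submit_passive_result 3com TRAP 2 "Problem at 3com!! coldStart"
```

Because the service is volatile and has max_check_attempts set to 1, every submitted result is treated as a new hard state, so each trap can trigger its own notification.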
Specific Configuration: Nagios
Templates:
Most of the Nagios configuration inherits from one of these templates.
###############################################################################
###############################################################################
#
# CONTACT TEMPLATES
#
###############################################################################
###############################################################################
# Generic contact definition template - This is NOT a real contact, just a template!
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-email ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
define contact{
name generic-sms ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
host_notification_options d,u,r ; send notifications for down, unreachable, and recovery host states
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-sms ; send host notifications via SMS
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
###############################################################################
###############################################################################
#
# HOST TEMPLATES
#
###############################################################################
###############################################################################
# Generic host definition template - This is NOT a real host, just a template!
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
# Linux host definition template - This is NOT a real host, just a template!
define host{
name linux-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 2 ; Schedule host check retries at 2 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period 24x7 ; Send notification out at any time - day or night
notification_interval 1 ; Resend notifications every minute
notification_options d,r ; Only send notifications for specific host states
contact_groups admins,sms ; Notifications get sent to the admins and sms groups
hostgroups linux-servers
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
# Windows host definition template - This is NOT a real host, just a template!
define host{
name windows-server ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, Windows servers are monitored round the clock
check_interval 5 ; Actively check the server every 5 minutes
retry_interval 2 ; Schedule host check retries at 2 minute intervals
max_check_attempts 10 ; Check each server 10 times (max)
check_command check-host-alive ; Default command to check if servers are "alive"
notification_period 24x7 ; Send notification out at any time - day or night
notification_interval 1 ; Resend notifications every minute
notification_options d,r ; Only send notifications for specific host states
contact_groups admins,sms ; Notifications get sent to the admins and sms groups
hostgroups windows-servers ; Host groups that Windows servers should be a member of
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
# We define a generic printer template that can be used for most printers we monitor
define host{
name generic-printer ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, printers are monitored round the clock
check_interval 5 ; Actively check the printer every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each printer 10 times (max)
check_command check-host-alive ; Default command to check if printers are "alive"
notification_period workhours ; Printers are only used during the workday
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
# Define a template for switches that we can reuse
define host{
name generic-switch ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, switches are monitored round the clock
check_interval 5 ; Switches are checked every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each switch 10 times (max)
check_command check-host-alive ; Default command to check if routers are "alive"
notification_period 24x7 ; Send notifications at any time
notification_interval 5 ; Resend notifications every 5 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins,sms ; Notifications get sent to the admins and sms groups
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
###############################################################################
###############################################################################
#
# SERVICE TEMPLATES
#
###############################################################################
###############################################################################
# Generic service definition template - This is NOT a real service, just a template!
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
# Local service definition template - This is NOT a real service, just a template!
define service{
name local-service ; The name of this service template
use generic-service ; Inherit default values from the generic-service definition
max_check_attempts 4 ; Re-check the service up to 4 times in order to determine its final (hard) state
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
Gateways:
For monitoring the 3Com switch and gtdei.dei.uc.pt we used the following configuration:
###########################################################################
###########################################################################
#
# HOST DEFINITIONS
#
###########################################################################
###########################################################################
# Define the switch that we'll be monitoring
define host{
use generic-switch ; Inherit default values from a template
host_name gtdei.dei.uc.pt ; The name we're giving to this switch
alias gtdei.dei.uc.pt Switch ; A longer name associated with the switch
address 193.137.203.254 ; IP address of the switch
hostgroups switcheandrouters ; Host groups this switch is associated with
}
define host{
use generic-switch ; Inherit default values from a template
host_name 3com ; The name we're giving to this switch
alias switch sala de redes Switch ; A longer name associated with the switch
address 10.254.0.200 ; IP address of the switch
hostgroups switcheandrouters ; Host groups this switch is associated with
}
###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################
# Create a new hostgroup for switches
define hostgroup{
hostgroup_name switcheandrouters ; The name of the hostgroup
alias Network Switches and routers ; Long name of the group
}
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
# Create a service to PING to switch
define service{
use generic-service ; Inherit values from a template
host_name gtdei.dei.uc.pt ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
define service{
use generic-service ; Inherit values from a template
host_name 3com ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
# Monitor uptime via SNMP
define service{
use generic-service ; Inherit values from a template
host_name gtdei.dei.uc.pt
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}
define service{
use generic-service ; Inherit values from a template
host_name 3com
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}
# Monitor Port 3 status via SNMP
define service{
use generic-service ; Inherit values from a template
host_name gtdei.dei.uc.pt
service_description Port 3 Link Status
check_command check_snmp!-C public -o ifOperStatus.5 -r 1 -m RFC1213-MIB
}
# Monitor Port 5 status via SNMP
define service{
use generic-service ; Inherit values from a template
host_name gtdei.dei.uc.pt
service_description Port 5 Link Status
check_command check_snmp!-C public -o ifOperStatus.7 -r 1 -m RFC1213-MIB
}
# Monitor Port 198 and 199 status via SNMP (3com)
define service{
use generic-service ; Inherit values from a template
host_name 3com
service_description Port 198 Link Status
check_command check_snmp!-C public -o ifOperStatus.198 -r 1 -m RFC1213-MIB
}
define service{
use generic-service ; Inherit values from a template
host_name 3com
service_description Port 199 Link Status
check_command check_snmp!-C public -o ifOperStatus.199 -r 1 -m RFC1213-MIB
}
# Monitor bandwidth via MRTG logs
define service{
use generic-service ; Inherit values from a template
host_name gtdei.dei.uc.pt
service_description Port 3 Bandwidth Usage
check_command check_local_mrtgtraf!/var/www/mrtg/gtdei.dei.uc.pt_3.log!AVG!1000000,1000000!5000000,5000000!10
}
define service{
use generic-service ; Inherit values from a template
host_name gtdei.dei.uc.pt
service_description Port 5 Bandwidth Usage
check_command check_local_mrtgtraf!/var/www/mrtg/gtdei.dei.uc.pt_5.log!AVG!1000000,1000000!5000000,5000000!10
}
define service{
host_name 3com
service_description TRAP
is_volatile 1
check_command check-host-alive
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
active_checks_enabled 0
passive_checks_enabled 1
check_period 24x7
notification_interval 1
notification_period 24x7
notification_options u,c,w
notifications_enabled 1
contact_groups admins
}
Note that the bandwidth checks use the commands specific to reading MRTG logs, whereas all the other checks use SNMP.
Note also that although the ports on the gateway are named 3 and 5, their corresponding interface indexes on the device are 5 and 7.
We also included two switch ports (199 and 198), since we were using only virtual machines in this work.
Finally, the trap service is attached to the 3com host.
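The link status services query ifOperStatus and pass -r 1 to check_snmp, so the check is OK only when the returned value matches "1" (up in RFC1213-MIB, where 2 is down and 3 is testing). The sketch below mimics that decision for a value that has already been fetched. The function name link_check is hypothetical, and it uses an exact match rather than the regular expression match that check_snmp performs.

```shell
# Sketch only: map an ifOperStatus value to a Nagios state, in the spirit
# of "check_snmp ... -o ifOperStatus.N -r 1". Not the plugin's own logic.

link_check() {
    # $1 = ifOperStatus as returned by SNMP, numeric or with its MIB label
    case "$1" in
        1|"up(1)")
            echo "OK - link up"
            return 0 ;;
        2|"down(2)")
            echo "CRITICAL - link down"
            return 2 ;;
        *)
            # testing(3) or anything unexpected is also treated as a problem
            echo "CRITICAL - link state $1"
            return 2 ;;
    esac
}

# Example: gateway port 3 is interface index 5, so the value passed in
# would come from ifOperStatus.5.
link_check 1
```

Exit codes 0 and 2 are the standard plugin return codes for OK and CRITICAL, which is what Nagios reads to set the service state.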
Windows
For the Windows machine we used the following configuration:
###############################################################################
###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################
###############################################################################
# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation
define host{
use windows-server ; Inherit default values from a template
host_name winserver ; The name we're giving to this host
alias My Windows Server ; A longer name associated with the host
address 172.16.168.128 ; IP address of the host
}
###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################
# Define a hostgroup for Windows machines
# All hosts that use the windows-server template will automatically be a member of this group
define hostgroup{
hostgroup_name windows-servers ; The name of the hostgroup
alias Windows Servers ; Long name of the group
}
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
# Create services for monitoring the Windows machine via SNMP
# Change the host_name to match the name of the host you defined above
define service{
use generic-service
host_name winserver
service_description C: disk space
check_command check_snmp!-C public -o .1.3.6.1.2.1.25.2.3.1.6.2 -w 404844 -c 704844 -u 'x4096_for_bytes'
}
define service{
use generic-service
host_name winserver
service_description CPU load
check_command check_snmp!-C public -o .1.3.6.1.2.1.25.3.3.1.2.1 -u '%' -w 50 -c 70
}
define service{
use generic-service
host_name winserver
service_description Memory Used bytes
check_command check_snmp!-C public -o .1.3.6.1.4.1.9600.1.1.2.4.0 -u 'Bytes'
}
define service{
use generic-service
host_name winserver
service_description Memory Free bytes
check_command check_snmp!-C public -o .1.3.6.1.4.1.9600.1.1.2.1.0 -u 'Bytes'
}
define service{
host_name winserver
service_description TRAP
is_volatile 1
check_command check-host-alive
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
active_checks_enabled 0
passive_checks_enabled 1
check_period 24x7
notification_interval 1
notification_period 24x7
notification_options u,c,w
notifications_enabled 1
contact_groups admins
}
Although Nagios supports NSClient++ for checking the status of services, the assignment required the use of SNMP. Since we did not find a value in the default MIB that faithfully reflected the RAM in use, we installed an external MIB (****.9600.****) that provided more accurate values.
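The "C: disk space" service reads hrStorageUsed (.1.3.6.1.2.1.25.2.3.1.6.2), which counts allocation units rather than bytes, so its -w and -c thresholds are in units as well. The sketch below converts such a count into MiB to make the thresholds easier to read. The function name units_to_mib is hypothetical, and the 4096-byte unit size is an assumption taken from the 'x4096_for_bytes' label in the service definition; the real value is reported by hrStorageAllocationUnits.

```shell
# Sketch only: convert a count of storage allocation units into MiB.
# The 4096-byte default is assumed from the 'x4096_for_bytes' unit label.

units_to_mib() {
    # $1 = number of allocation units
    # $2 = bytes per allocation unit (optional, defaults to 4096)
    unit=${2:-4096}
    echo $(( $1 * unit / 1048576 ))
}

units_to_mib 404844   # warning threshold from the service: prints 1581
units_to_mib 704844   # critical threshold from the service: prints 2753
```

Under that assumption, the service warns at roughly 1.5 GiB used and goes critical at roughly 2.7 GiB used.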
Linux
The configuration of the Linux machine is very similar; only the MIB addresses (OIDs) of the various features differ (we used UCD-SNMP-MIB). Note that, besides this monitoring, we also used several commands to check that services are running (a process explained further on).
###############################################################################
###############################################################################
#
# HOST DEFINITIONS
#
###############################################################################
###############################################################################
# Define a host for the Linux machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation
define host{
use linux-server ; Inherit default values from a template
host_name lserver ; The name we're giving to this host
alias My Linux Server ; A longer name associated with the host
address 172.16.168.130 ; IP address of the host
}
###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################
# Define a hostgroup for Linux machines
# All hosts that use the linux-server template will automatically be a member of this group
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
define service{
use generic-service
host_name lserver
service_description disk space used on / percentage
check_command check_snmp!-C public -o .1.3.6.1.4.1.2021.9.1.9.1 -u '%'
}
define service{
use generic-service
host_name lserver
service_description disk space used on / kB
check_command check_snmp!-C public -o .1.3.6.1.4.1.2021.9.1.8.1 -u 'kB'
}
define service{
use generic-service
host_name lserver
service_description disk space free on / kB
check_command check_snmp!-C public -o .1.3.6.1.4.1.2021.9.1.7.1 -u 'kB'
}
define service{
use generic-service
host_name lserver
service_description CPU load
check_command check_snmp!-C public -o .1.3.6.1.4.1.2021.11.10.0 -u '%' -w 50 -c 70
}
define service{
use generic-service
host_name lserver
service_description Total Free Memory kbytes
check_command check_snmp!-C public -o .1.3.6.1.4.1.2021.4.11.0 -u 'kBytes'
}
define service{
use generic-service
host_name lserver
service_description Available Memory kbytes
check_command check_snmp!-C public -o .1.3.6.1.4.1.2021.4.6.0 -u 'kBytes'
}
define service{
use generic-service
host_name lserver
service_description POP sem SSL
check_command check_pop
}
define service{
use generic-service
host_name lserver
service_description POP via SSL
check_command check_pop!-S -p 995
}
define service{
use generic-service
host_name lserver
service_description IMAP sem SSL
check_command check_imap
}
define service{
use generic-service
host_name lserver
service_description IMAP via SSL
check_command check_imap!-S -p 993
}
define service{
use generic-service
host_name lserver
service_description SSH
check_command check_ssh
}
define service{
use generic-service
host_name lserver
service_description SMTP
check_command check_smtp
}
define service{
use generic-service
host_name lserver
service_description HTTP
check_command check_http
}
define service{
use generic-service
host_name lserver
service_description HTTPS
check_command check_http!-S
}
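The CPU load service above passes -w 50 -c 70 to check_snmp. As a rough illustration of how such thresholds map a polled value to a Nagios state, the sketch below assumes the usual plugin convention in which a bare threshold N raises the alert only when the value is above N, and that readings are non-negative integers. The function name classify is hypothetical and is not part of Nagios or its plugins.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): map an integer reading to a
# state name, given warning and critical thresholds such as -w 50 -c 70.
# Assumption: a bare threshold N alerts only when value > N.
classify() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Readings of 30, 65 and 90 percent against -w 50 -c 70
classify 30 50 70
classify 65 50 70
classify 90 50 70
```

Services defined without -w and -c, such as the disk and memory checks, only report the value and do not change state based on it.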
The service status checks were first tested on the console, using the Nagios plugins:
HTTP:
./check_http -H 172.16.168.130 -S
SMTP:
./check_smtp -H 172.16.168.130
IMAP:
./check_imap -H 172.16.168.130
IMAPS:
./check_imap -H 172.16.168.130 -S -p993
POP3:
./check_pop -H 172.16.168.130
POP3S:
./check_pop -H 172.16.168.130 -S -p995
SSH:
./check_ssh -H 172.16.168.130
These commands were then tested and translated into the service definitions that check the respective services.
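When testing plugins on the console, the state is reported through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, anything else treated here as UNKNOWN). The following sketch shows one way to make that visible. The wrapper name state_of is hypothetical, and the example calls use stand-in commands rather than the real plugins so that it runs anywhere.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command, discard its output, and print the
# Nagios state name that corresponds to its exit code.
state_of() {
    "$@" >/dev/null 2>&1
    case $? in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Stand-in commands that simply exit with a given code
state_of sh -c 'exit 0'
state_of sh -c 'exit 2'
```

From the plugin directory, the same wrapper could be used as state_of ./check_ssh -H 172.16.168.130. A plugin that is missing or not executable is reported as UNKNOWN, since the shell then returns a code above 2.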
Contacts
For the two types of contact used, two groups were created: one for SMS messages and another for regular e-mail.
###############################################################################
###############################################################################
#
# CONTACTS
#
###############################################################################
###############################################################################
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email [email protected] ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
define contact{
contact_name nagiossms
use generic-sms
alias SMS Warnings
email [email protected]
}
###############################################################################
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################
###############################################################################
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
define contactgroup{
contactgroup_name sms
alias SMS-email
members nagiossms
}
Commands
Although we initially had some custom plugins, we ended up using only the default ones, so it is not necessary to include the complete configuration file here. The only change we made was the creation of a command to send SMS messages:
define command{
command_name notify-host-by-sms
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -r [email protected] -s "918551620" $CONTACTEMAIL$
}
Essentially, this command sends an SMS with the host state from the e-mail address of one of us, with the destination phone number given in the message subject (the -s option). The contact e-mail was defined in the contacts section.
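To preview the text that the command above produces, the same printf "%b" pattern can be run by hand with ordinary shell variables standing in for the Nagios macros. This is only a sketch: all values below are made up for illustration, the function name render_body is hypothetical, and the output goes to the terminal instead of being piped to /bin/mail.

```shell
#!/bin/sh
# Stand-in values for the Nagios macros. Nagios substitutes the real ones
# at notification time; these are invented for the preview.
NOTIFICATIONTYPE="PROBLEM"
HOST_NAME="lserver"
HOSTSTATE="DOWN"
HOSTADDRESS="172.16.168.130"
HOSTOUTPUT="CRITICAL - Host Unreachable"
LONGDATETIME="(date placeholder)"

# Same layout as the command_line; %b expands the \n escapes into newlines.
render_body() {
    printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE\nHost: $HOST_NAME\nState: $HOSTSTATE\nAddress: $HOSTADDRESS\nInfo: $HOSTOUTPUT\n\nDate/Time: $LONGDATETIME\n"
}

render_body
```

Because SMS messages are short, a preview like this is a convenient way to judge whether the body fits before wiring the command into a contact definition.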
Tests
These two figures show that the first requirements of the assignment were met: the pages are reached at http://gsrtp3.dei.uc.pt/nagios and http://gsrtp3.dei.uc.pt/mrtg, and a previously configured access password is requested.
These three images show the map and the hosts being monitored. The third goes into more detail on all the services of each host.
This image shows an overview of all monitored services.
This image shows the 3Com switch in detail. Since no trap had been received, the trap service is shown in grey.
Detail of the router gtdei.dei.uc.pt. The critical state on port 5 is due to the threshold defined in the configuration files, which we then kept as a way of testing the service alerts.
This figure shows the state of the Linux server and all its services. The Status Information column confirms that all commands are being executed correctly.
This figure shows the Windows server with one service in “warning”, set deliberately through the command's thresholds for testing purposes.
The MIB browser allows traps to be sent through an extremely intuitive interface; in this case we sent one of type coldStart.
In this image we see snmptrapd processing the trap and forwarding it to snmptt, which recognises it as one of the traps we handle. This generates an event of type EXTERNAL COMMAND, as explained earlier.
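As an illustration of what such an EXTERNAL COMMAND event can look like, the sketch below formats a passive service check result in the Nagios external command syntax. It assumes the trap is submitted as a passive service check, which is a common way of feeding traps into Nagios but is not confirmed by this section. The function name, the host name wserver and the service description TRAP are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build one external command line of the form
#   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
# $1 = host name, $2 = service description, $3 = return code, $4 = output
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Return code 2 marks the service as CRITICAL (names are made up)
format_passive_result "wserver" "TRAP" 2 "coldStart trap received"
```

In a real setup the line would be appended to the Nagios command file, whose location depends on the installation; here it is only printed.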
Next, Nagios checks the service and, because of the “Critical” state, sends an alert by e-mail to the admin.
Finally, we see the coldStart trap displayed on the Nagios screen (a feature supposedly intended for the 3Com switch, but tested from the Windows machine because there we could send traps on demand).
Here we have an example of the notifications page. Note that services send alerts only by e-mail, while the state of the hosts is also sent by SMS.
And finally, here are 2 examples of received alerts: on the left by SMS and on the right by e-mail.