© THALES NEDERLAND B.V.
1 Nagios at THALES NEDERLAND B.V.
Monitoring at Thales Hengelo using Nagios
Pieter van Emmerik
10/20/2010
Introduction
• Introduction
  – Thales
  – ISD
• Security / why monitoring
• Nagios?
  – What is it?
  – What does it do?
  – How does it do it?
  – How do we tell it what to do?
  – Security
Thales Nederland at a glance
– Established in 1922
– Sales Thales Nederland B.V. ~ € 500 million, of which > 75% on export
– Order backlog € 1.2 billion
– ~ 2,000 employees in 5 locations
– The largest defence company in the Netherlands
– Main centres of excellence:
  • Naval System Integration
  • Naval Sensors & C2
  • Vehicle Communication Systems
Locations (from the map):
  • Hengelo: Head Office, Naval, Air Systems, Security Solutions & Services
  • Huizen: Land & Joint Systems
  • Enschede: Tx Cell
  • Delft: DECIS, Air Systems
  • Eindhoven: Land & Joint Systems
ISD – Information Systems Department
• 60 employees
• Data center
  – Over 400 servers
  – More than 750 applications
• About 3000 workstations
• Cost as result of unavailable infrastructure: 1 to 2 M € per day
Security
Compliance
– ABDO (MIVD: General Requirements for Defence Contracts)
– Export Control
– Group IT policies
– Law
Fundamental Principles of Security:
• Availability – can I get to my data
• Integrity – are my data accurate and reliable
• Confidentiality – is the necessary level of secrecy assured
Monitoring
Objective: Ensuring the Availability and Integrity of the IT infrastructure
• Availability/Integrity
  – Notice current problems
  – Predict problems / trending
• Automation
  – Handle routine management actions
  – Lower management hours/costs
  – Increase quality of service
• Decrease number of incidents
• Pro-active versus re-active
Tools used for Monitoring
Tools to use:
• Nagios
  – Events/Alarms
• MRTG – Multi Router Traffic Grapher
  – Long term
  – Trends
  – Extensive modifications to monitor just about anything
Why Nagios?
• Nagios
  – Open Source
    • Free as in Beer => easily available
    • Free as in Speech => source available
  – Unix philosophy – it does one thing very well
  – Open architecture – adaptable / extendable
• Other solutions
  – Closed
  – Commercial – not easily available to test
  – Monolithic – not adaptable without help of supplier
• NAGIOS is the chosen and appointed tool within Thales Group
History
• 1999: NetSaint
• 2002: Nagios
• 2005: First test at Thales Hengelo
• 2006:
  – Nagios 2.0
  – Nagios integrated in Network Security Toolkit
• 2007:
  – Nagios Enterprises
  – Operational at Thales Hengelo
• 2008:
  – Nagios 3.0
  – Infoworld BOSSIE Award
• 2009:
  – Nagios Support Portal
  – Fork: Icinga
NAGIOS acronym
• NAGIOS (Ethan Galstad)
  – Nagios
  – Ain't
  – Gonna
  – Insist
  – On
  – Sainthood
• NAGIOS (Sam Tilders)
  – Notices
  – Any
  – Glitch
  – In
  – Our
  – System
What is Nagios?
• Host and service based network monitor
• Designed to run under Linux, works under most UNIX variants
• Open Source Software
• Has Reporting
• Very smart scheduler
What does Nagios do?
• Alert you before the client does
• Provide information on what caused the outage
  – Also known as "Root Cause Analysis"
• Allow quick recovery
• Provide forensics
• Provide accountability
Tactical Overview
Service Groups
Service Group - Summary
Service Status Details
Service State Host
Problems - Service
Availability
Event Log
Firefox add-on: Nagios Checker
https://addons.mozilla.org/de/firefox/addon/3607/
http://code.google.com/p/nagioschecker/
Nagios – How does it work
• Schedule checks and log the results using:
  – Plugins
  – NRPE – Nagios Remote Plugin Executor
  – NSCA – Nagios Service Check Acceptor
  – NDOUtils – Nagios Data Objects Utilities
• Generate alerts in case of Errors or Warnings
Plugins
[Diagram: on the local host (Nagios server), the Nagios process (core logic) runs plugins. One plugin checks a local resource or process; another checks a remote resource or process on a remote host.]
plugin
• Checks resource or process
• Returns string: SERVICE STATUS: Information text
• STATUS:
  – 0 OK
  – 1 Warning
  – 2 Critical
  – 3 Unknown
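The plugin contract above (one status line plus an exit code from 0 to 3) can be sketched in a few lines of Python. This is only an illustration, not one of the plugins used at Thales: the `DISK` label, the function names and the default thresholds (borrowed from the `check_disk -w 10% -c 5%` example later in the deck) are assumptions.

```python
"""Minimal sketch of a Nagios-style plugin that reports free disk space."""
import shutil

# Exit codes as listed on the slide.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(free_pct, warn_pct, crit_pct):
    """Map a free-space percentage to a status code (less free space is worse)."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=10.0, crit_pct=5.0):
    """Return (status code, one-line message) for the given path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        return UNKNOWN, f"DISK UNKNOWN: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN: {path} reports zero size"
    free_pct = 100.0 * usage.free / usage.total
    status = classify(free_pct, warn_pct, crit_pct)
    return status, f"DISK {LABELS[status]}: {free_pct:.1f}% free on {path}"


def main(argv):
    """Print the status line and return the exit code for the caller to use."""
    code, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return code
```

To run it as a real plugin, the script would end by calling `sys.exit(main(sys.argv))`, so that Nagios sees the status through the process exit code.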
NRPE – Nagios Remote Plugin Executor
[Diagram: on the monitoring host (Nagios server), nagios runs the check_nrpe plugin, which connects over ssl to NRPE on the remote host. NRPE runs the local plugins there, and each plugin checks a local resource or process on the remote host.]
NSCA - Nagios Service Check Acceptor
[Diagram: on the remote host, 3rd party software passes its results to the NSCA client, which sends them over ssl to the NSCA daemon on the monitoring host (Nagios server). The daemon writes them to the External Command File, which nagios reads.]
NDO - Nagios Data Objects Utilities
[Diagram: the Nagios daemon with the loaded NDOMOD module sends data to the NDO2DB daemon, which stores it in a MySQL database. The LOG2NDO utility feeds the Nagios logfile into the same database.]
Nagios – How do we tell it what to do
• Configure using Object definitions in cfg files
  – Hosts – 410
  – Hostgroups – 32
  – Services – 508 definitions
  – Servicegroups – 24
  – Contacts
  – Contactgroups
  – Commands – Plugins
• Maintenance
  – Take time to design your configs
Object definition
define object-type {
    # Comment line
    parameter value ; Comment
    parameter value
}
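To make the syntax concrete, here is a rough Python sketch that reads `define ... { ... }` blocks into dictionaries. It is not how Nagios itself parses its configuration; `parse_objects` is a hypothetical helper, and it deliberately ignores edge cases such as a `}` inside a value or comment.

```python
import re

# One 'define <type> { ... }' block; the body runs up to the first closing brace.
DEFINE_RE = re.compile(r"define\s+([\w-]+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Parse object definitions into a list of (object type, parameters) pairs."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        params = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()    # drop a trailing '; comment'
            if not line or line.startswith("#"):   # skip blanks and comment lines
                continue
            parts = line.split(None, 1)            # parameter name, then its value
            params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, params))
    return objects


SAMPLE = """
define host {
    # Comment line
    use        windows-server ; template to inherit from
    host_name  pdmapp
}
"""

parsed = parse_objects(SAMPLE)
```

Each parameter ends up as a plain string, which is enough for quick checks such as counting definitions per object type.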
Host object
define host {
    use                   windows-server
    host_name             pdmapp
    alias                 pdmapp
    parents               swm2f
    max_check_attempts    3
    check_command         check-host-alive
    contact_groups        windows-admins
    notification_options  d,u,r
    notification_period   24x7
    checks_enabled        1
    check_period          24x7
}
Host template
define host {
    name                   windows-server
    use                    generic-host
    check_period           24x7
    max_check_attempts     10
    check_command          check-host-alive
    notification_period    workhours
    notification_interval  240
    notification_options   d,u,r
    contact_groups         windows-admins
    register               0
}
Host Object hierarchy
[Diagram: the template generic-host is inherited by the templates windows-server, linux-server and switch, which are in turn used by the hosts host1, host2 and host3.]
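The effect of the `use` chain can be sketched in Python: values set on the host override those of its template, which override those of generic-host. The sketch assumes a single template per object and no cycles. The slides do not show the contents of generic-host, so the one shown here is hypothetical, and `resolve` is an illustrative helper rather than a Nagios function.

```python
# 'name' and 'register' describe the template itself and are not passed on.
NOT_INHERITED = {"name", "register"}


def resolve(key, objects):
    """Merge an object with its chain of 'use' templates; the child's values win."""
    obj = objects[key]
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        inherited = resolve(parent, objects)
        merged.update({k: v for k, v in inherited.items() if k not in NOT_INHERITED})
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


objects = {
    # Hypothetical contents; the slides do not show generic-host.
    "generic-host": {"name": "generic-host", "notifications_enabled": "1",
                     "register": "0"},
    # From the "Host template" slide (abridged).
    "windows-server": {"name": "windows-server", "use": "generic-host",
                       "max_check_attempts": "10",
                       "notification_period": "workhours",
                       "notification_interval": "240", "register": "0"},
    # From the "Host object" slide (abridged).
    "pdmapp": {"use": "windows-server", "host_name": "pdmapp",
               "max_check_attempts": "3", "notification_period": "24x7"},
}

resolved = resolve("pdmapp", objects)
```

For pdmapp this yields max_check_attempts 3 and notification_period 24x7 from the host itself, while notification_interval 240 comes from windows-server.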
Hostgroup
define hostgroup {
    hostgroup_name  linux-nrpe-servers
    alias           linux nrpe servers
    members         host1,host2
}
define host {
    host_name   member-host
    hostgroups  linux-nrpe-servers
    ...
}
Service template linux-basic-service
define service {
    use                    generic-service
    name                   linux-basic-service
    hostgroup_name         linux-nrpe-servers
    service_description    load
    is_volatile            0
    check_period           24x7
    max_check_attempts     3
    normal_check_interval  3
    retry_check_interval   1
    notification_interval  1440
    notification_period    workhours
    notification_options   w,c,r
    register               0
}
Service definition
define service {
    use                  linux-basic-service
    service_description  load
    contact_groups       linux-admins
    check_command        check_nrpe!check_load
}
define service {
    use                  linux-basic-service
    service_description  automount
    contact_groups       linux-admins
    check_command        check_nrpe!check_automount
}
...
Relations between objects
The service in services.cfg refers to: servicegroup CMA, hostgroup cma-clients, timeperiod workhours, contactgroup cma-admins.

services.cfg
define service {
    use                  cma-client-service
    servicegroups        CMA
    hostgroup_name       cma-clients
    service_description  CMA_mux
    notification_period  workhours
    contact_groups       cma-admins,cma-admins-sms
    check_command        check_tcp!7410
}

hosts.cfg
define hostgroup {
    hostgroup_name  cma-clients
    alias           linux nrpe servers
    members         host1,host2
}

servicegroups.cfg
define servicegroup {
    servicegroup_name  CMA
    alias              CMA
    members            hostx,FLEXLM_mti
}

contactgroups.cfg
define contactgroup {
    contactgroup_name  cma-admins
    alias              cma-admins
    members            usr1,usr2
}

timeperiods.cfg
define timeperiod {
    timeperiod_name  workhours
    alias            Thales workhours, 07:00-18:00
    monday           07:00-18:00
    tuesday          07:00-18:00
    wednesday        07:00-18:00
    thursday         07:00-18:00
    friday           07:00-18:00
}
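As an illustration of what the workhours timeperiod means for notifications, the following Python sketch tests whether a moment falls inside it. The names `WORKHOURS` and `in_timeperiod` are hypothetical, and treating 18:00 itself as already outside the window is an assumption.

```python
from datetime import datetime, time

# Monday (0) to Friday (4), 07:00-18:00, as in the 'workhours' timeperiod above.
WORKHOURS = {day: (time(7, 0), time(18, 0)) for day in range(5)}


def in_timeperiod(moment, period=WORKHOURS):
    """Return True when the moment lies inside the period's window for that weekday."""
    window = period.get(moment.weekday())
    if window is None:          # no entry for this weekday (Saturday, Sunday)
        return False
    start, end = window
    return start <= moment.time() < end
```

A problem detected on a Saturday would therefore not be sent to a contact group limited to workhours until the period opens again on Monday.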
Notifications: Relations between objects
The service in services.cfg refers to: timeperiod workhours, contactgroup cma-admins.

services.cfg
define service {
    use                    cma-client-service
    servicegroups          CMA
    hostgroup_name         cma-clients
    service_description    CMA_mux
    notification_interval  1440
    notification_period    workhours
    contact_groups         cma-admins,cma-admins-sms
    check_command          check_tcp!7410
}

contactgroups.cfg
define contactgroup {
    contactgroup_name  cma-admins
    alias              cma-admins
    members            usr1,usr2
}

timeperiods.cfg
define timeperiod {
    timeperiod_name  workhours
    alias            Thales workhours, 07:00-18:00
    monday           07:00-18:00
    tuesday          07:00-18:00
    wednesday        07:00-18:00
    thursday         07:00-18:00
    friday           07:00-18:00
}

contacts.cfg
define contact {
    contact_name                   usr1
    alias                          Pieter van Emmerik
    service_notification_period    24x7
    service_notification_options   c,w,r
    service_notification_commands  notify-by-email
    host_notification_period       24x7
    host_notification_options      d,r,u
    host_notification_commands     host-notify-by-email
    email                          [email protected]
    pager                          31638825830
}
define command {
    command_name  host-notify-by-email
    command_line  /usr/bin/printf "%b" "$NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nInfo: $HOSTOUTPUT$" | /bin/mail -s "Host $HOSTSTATE$ - $HOSTNAME$!" $CONTACTEMAIL$
}
Notification Filters
• Main config file: enable_notifications=1
• Service and Host Filters:
  – 1 scheduled downtime
  – 2 flapping
  – 3 host- or service-specific notification options
  – 4 time period
  – 5 notification_interval
  – 6 Problem has been acknowledged
• Contact Filters:
  – 1 notification options
  – 2 time period
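The filter chain can be read as a series of early exits, sketched below in Python. This is a simplified model that follows the order on the slide, not the actual Nagios implementation (which, for example, treats flapping and recovery notifications specially). `Event`, `blocking_filter` and `contact_allows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    state_flag: str                      # "w", "u", "c" or "r", as in notification_options
    in_downtime: bool = False
    flapping: bool = False
    in_period: bool = True               # inside the service's notification_period
    minutes_since_last: Optional[float] = None   # None means no notification sent yet
    acknowledged: bool = False


def blocking_filter(event, options, interval_minutes, enable_notifications=True):
    """Return the name of the first filter that blocks the notification, or None."""
    if not enable_notifications:
        return "enable_notifications"
    if event.in_downtime:
        return "scheduled downtime"
    if event.flapping:
        return "flapping"
    if event.state_flag not in options:
        return "notification options"
    if not event.in_period:
        return "time period"
    if (event.minutes_since_last is not None
            and event.minutes_since_last < interval_minutes):
        return "notification_interval"
    if event.acknowledged:
        return "acknowledged"
    return None


def contact_allows(event, contact_options, contact_in_period):
    """Contact filters: the contact's own notification options and time period."""
    return event.state_flag in contact_options and contact_in_period
```

A notification reaches a contact only when `blocking_filter` returns None and `contact_allows` is true for that contact, which is why a mismatch in either place leads to silently missing alerts.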
Host notification_options [d,u,r,f,s]
• d = send notifications on a DOWN state,
• u = send notifications on an UNREACHABLE state,
• r = send notifications on recoveries (OK state),
• f = send notifications when the host starts and stops flapping, and
• s = send notifications when scheduled downtime starts and ends.
Service notification_options [w,u,c,r,f,s]
• w = send notifications on a WARNING state,
• u = send notifications on an UNKNOWN state,
• c = send notifications on a CRITICAL state,
• r = send notifications on recoveries (OK state),
• f = send notifications when the service starts and stops flapping, and
• s = send notifications when scheduled downtime starts and ends.
Prevent Alerts
• Schedule downtime for planned maintenance
• Acknowledge alarms
• Solve problems in a timely fashion
• Modify check boundaries which are too strict
Application monitoring
• Analyse
  – Collect information
  – Draw a map
• Identify critical components
  – How tested (realistic)
  – Boundary values (OK-Warning-Critical)
• Plan of attack
  – Failure scenarios
  – Who can solve What problem How
• Find/Create plug-ins
• Create Nagios Objects
• Document
Application: Dimensions
Checks attached to the components in the diagram:
check_tcp!671
check_tcp!9001
check_tcp!1521
check_http_content!http://xxx!80!Search
    check_http -H $HOSTADDRESS$ -u $ARG1$ (http://xxx) -p $ARG2$ (80) -s $ARG3$ (Search)
check_flexlm2!glic02!7585
check_nrpe!check_part_san_fs0
    check_disk -w 10% -c 5% -p /mnt/san/fs0
check_oratns!bdim02
    check_oracle --tns $ARG1$ (Needs Oracle!)
Application: CMA Teamcenter
Checks attached to the components in the diagram:
check_tcp!7410
check_tcp!7411
check_tcp!7412
check_tcp!7413
check_tcp!7414
check_tcp!1531
check_http_content!...
check_nrpe_arg1!check_process!dspserv.exe
check_nrpe_arg1!check_process!FrameMaker.exe
check_nrpe_arg3!check_maxprocess!WINWORD.EXE!3!5
check_cmaserv!8021
    check_tcp -H $HOSTADDRESS$ -p $ARG1$ -s sendstr -e expectstr -q quitstr
check_flexlm2!pfsf254!8576
Application: PamaSR (Teamcenter)
[Diagram: Nagios reaches the application hosts through NRPE.]
check_nrpe_arg1!check_service!'"Apache Tomcat"'
check_nrpe_arg1!check_oracle_db!oltcp102
Find/Create Plug-ins
• Find Plug-ins
  – http://nagiosplugins.org/
  – http://www.monitoringexchange.org/
  – http://exchange.nagios.org/
• Search some more or visit a forum for help
  – http://wiki.nagios.org/index.php/Forums
    • English - http://nagios.meulie.net/
    • German - http://www.nagios-portal.de
• Create your own plug-in and donate it to the community on monitoringexchange.org
  – http://nagiosplug.sourceforge.net/developer-guidelines.html
  – http://nagios.sourceforge.net/docs/3_0/pluginapi.html
Security Considerations
• Use a Dedicated Monitoring Box.
• Don't Run Nagios As Root.
• Lock Down The Check Result Directory.
• Lock Down The External Command File.
• Require Authentication In The CGIs.
• Implement Enhanced CGI Security Measures.
• Use Full Paths In Command Definitions.
• Hide Sensitive Information With $USERn$ Macros.
• Strip Dangerous Characters From Macros.
• Secure Access to Remote Agents.
• Secure Communication Channels.
Links
Nagios Home Page
http://www.nagios.org
Monitoring Exchange (Third-party products)
http://www.monitoringexchange.org/
Complete Guide to Nagios
http://www.nagiosbook.org
Nagios, 2nd Edition, System and Network Monitoring
http://www.nostarch.com/nagios_2e.htm
http://nagiosplugins.org/
http://support.nagios.com/knowledgebase/officialdocs
Escalation
define service {
    host_name              linux01
    service_description    Database
    notification_period    24x7
    notification_interval  120
    ...
    contact_groups         admins
}
define serviceescalation {
    host_name              linux01
    service_description    Database
    first_notification     4
    last_notification      10
    notification_interval  60
    contact_groups         admins,second-level
}
define serviceescalation {
    host_name              linux01
    service_description    Database
    first_notification     8
    last_notification      12
    notification_interval  90
    contact_groups         third-level
}
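The two escalation ranges overlap for notifications 8 to 10. A small Python sketch of how this plays out is given below; it assumes the documented Nagios behaviour that all matching escalations are notified and that the smallest of their intervals applies. The dictionary layout and the function `notification_plan` are hypothetical.

```python
# Values taken from the definitions above.
SERVICE = {"contact_groups": ["admins"], "notification_interval": 120}
ESCALATIONS = [
    {"first": 4, "last": 10, "notification_interval": 60,
     "contact_groups": ["admins", "second-level"]},
    {"first": 8, "last": 12, "notification_interval": 90,
     "contact_groups": ["third-level"]},
]


def notification_plan(number, service=SERVICE, escalations=ESCALATIONS):
    """Return (contact groups, interval in minutes) for the Nth notification."""
    matching = [e for e in escalations
                # last_notification 0 means the escalation never ends
                if e["first"] <= number and (e["last"] == 0 or number <= e["last"])]
    if not matching:
        # No escalation applies: fall back to the service's own settings.
        return set(service["contact_groups"]), service["notification_interval"]
    groups = set()
    for escalation in matching:
        groups.update(escalation["contact_groups"])
    return groups, min(e["notification_interval"] for e in matching)
```

With these values, notifications 1 to 3 go to admins every 120 minutes, 4 to 7 add second-level at 60 minutes, 8 to 10 reach all three groups, 11 and 12 reach only third-level at 90 minutes, and from 13 onwards the service's own settings apply again.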
The End
• For questions on Nagios or if you need free Email or SSL certificates find me at the CAcert booth
www.cacert.org
wiki.cacert.org
www.cacert.nl