Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


© 2016 All Rights Reserved. CONFIDENTIAL #GALAXZ16

#GALAXZ16

Own IT Through Proactive Monitoring

Quis custodiet ipsos custodes?

Who will monitor the monitors themselves?

@jstanley232

Jason Stanley
Enterprise Monitoring Engineer @Secure_24

jstanley734@gmail.com
Github.com/jstanley23
Zenoss Community Forums/IRC: jstanley


Secure-24 has 15 years of experience delivering managed IT operations, application hosting and cloud services to enterprises worldwide. We manage SAP, Oracle, Hyperion, JD Edwards, and other mission-critical applications across all industries and for businesses of every size. Our industry-leading client satisfaction rates result from lowering IT operational costs and our relentless focus on superior service and support.


At Secure-24, Zenoss is the primary monitoring tool for infrastructure, client devices, and applications.


We replaced these monitoring platforms with Zenoss:

• Oracle Enterprise Manager

• SolarWinds

• Nimsoft

• Nagios

• Tidal


Primary Zenoss environment

• Zenoss 4.2.5 RPS 538

• 100+ ZenPacks

• 9k+ devices

• 1.7m+ data points

• Dedicated servers
  › 3 dedicated hubs
  › 16 dedicated multi-tenant collectors
  › 9 customer-dedicated collectors


Monitoring from within

Zenoss provides a lot of built-in self-monitoring, and additional ZenPacks extend it.

• Zenoss daemons
  › Processes
  › Heartbeats
• Zenoss Toolbox scans
• Tracebacks and exceptions
• ZenPacks
  › ZenPacks.zenoss.MySqlMonitor
  › ZenPacks.zenoss.RabbitMQ
  › ZenPacks.zenoss.Memcached


Daemon monitoring

Built-in Methods

• Process monitoring
  › Most daemon processes are already added
  › Polls every 3 minutes
  › Monitors CPU, memory, and process count
• /Status/Heartbeat
  › Takes longer to generate an event than process monitoring
  › Can signify issues with the daemon or the hub
• Note:
  › Verify that new daemons are added to process monitoring
  › Heartbeats cover the same instance only
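Zenoss raises /Status/Heartbeat events itself; the minimal Python sketch below only models the two ideas on this slide so they are easier to reason about. It is not Zenoss code. It assumes you already have each daemon's last heartbeat time plus lists of running and monitored daemon names (all hypothetical inputs), and the 90-second timeout is a placeholder.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_daemons(last_seen: Dict[str, datetime], now: datetime,
                  timeout: timedelta = timedelta(seconds=90)) -> List[str]:
    """Return daemons whose last heartbeat is older than `timeout`.

    `last_seen` (daemon name -> last heartbeat time) and the default
    timeout are hypothetical placeholders, not Zenoss settings.
    """
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > timeout)


def unmonitored_daemons(running: List[str], monitored: List[str]) -> List[str]:
    """Return daemons that are running but missing from process monitoring."""
    return sorted(set(running) - set(monitored))
```

Because a check like this runs inside the instance it watches, it cannot report that the instance itself has stopped.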


Zenoss ZenPacks

• ZenPacks.zenoss.MySqlMonitor *
  › Critical to monitor up/down
  › Primary internal use is graphs and trending
• ZenPacks.zenoss.RabbitMQ *
  › Critical to monitor up/down
  › Primary internal use is graphs and trending
• ZenPacks.zenoss.Memcached
  › Can be monitored internally for up/down
  › Can cause a negative user experience if down

* Should also be monitored externally
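For the external up/down check on MySQL and RabbitMQ, a plain TCP connect from a host outside Zenoss is often enough. This is a minimal sketch, not the check Secure-24 uses: the host name is hypothetical, and 3306 and 5672 are simply the standard MySQL and RabbitMQ AMQP ports, so substitute whatever ports your install actually listens on.

```python
import socket

# Assumed ports: MySQL's standard 3306 and RabbitMQ's AMQP 5672. Your Zenoss
# install may differ. The host name is hypothetical.
CHECKS = {
    "MySQL": ("zenoss-master.example.com", 3306),
    "RabbitMQ": ("zenoss-master.example.com", 5672),
}


def port_is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        # Refused connections, timeouts and DNS failures all raise OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from your remote monitoring host by looping over `CHECKS` and alerting on any `False`; a successful connect only proves the port is listening, not that the service is healthy.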


Zenoss Toolbox Scans and Exception Events

https://github.com/zenoss/zenoss.toolbox

Set up the scans in crontab so you can set them and forget them.

All toolbox scans now create events!

Warning: Do not run zencatalogscan -f until zenrelationscan and findposkeyerror come back clean.
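One way to honor this warning in an unattended cron job is a small wrapper that runs the fix mode only when the prerequisite scans pass. This is a hedged Python sketch: it assumes the toolbox commands are on the zenoss user's PATH under the names used on this slide and that each scan exits with status 0 when it comes back clean. Check both against the toolbox version you have installed.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def run_toolbox_scans(runner: Runner = run_command) -> List[str]:
    """Run the prerequisite scans, then `zencatalogscan -f` only if both pass.

    Assumes exit status 0 means "clean". Returns the commands run, in order.
    """
    ran: List[str] = []
    clean = True
    for argv in (["zenrelationscan"], ["findposkeyerror"]):
        ran.append(" ".join(argv))
        if runner(argv) != 0:
            clean = False  # keep going so both scans still report
    if clean:
        ran.append("zencatalogscan -f")
        runner(["zencatalogscan", "-f"])
    return ran
```

The `runner` parameter exists so the ordering logic can be exercised without touching a live Zenoss server.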

Exceptions and tracebacks

Modelers, datasources, and templates can error out.

Check your events for sneaky errors:
  › Message: traceback
  › Message: exception

TALES exceptions come in under the hub's full name as a single event.
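If you export events for review, the same "traceback" and "exception" filter is easy to script. This is a minimal sketch under an assumed data shape: each event is a dict with a "message" key (a hypothetical format, for example rows you have already pulled from the event console), and matching is case-insensitive.

```python
from typing import Dict, Iterable, List

# Substrings from the slide, matched case-insensitively against the message.
ERROR_MARKERS = ("traceback", "exception")


def sneaky_errors(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return events whose message mentions a traceback or an exception.

    The event shape (a dict with a "message" key) is hypothetical.
    """
    return [event for event in events
            if any(marker in event.get("message", "").lower()
                   for marker in ERROR_MARKERS)]
```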


Event Monitoring

Event flow is one of the most important aspects of Zenoss. Without events, you will not be alerted to any issues in your environments.

For this reason, we put special emphasis on monitoring it.


Monitoring from afar

We monitor Zenoss event flow from a remote location, so that if Zenoss goes down, we still get alerted.

• Zenoss web server
• RabbitMQ
  › rawevents
  › zenevents
  › signal
• zeneventserver
• Synthetic event checks
  › zeneventd: event processing and transforms
  › zeneventserver: changing event state


Web (HTTP) checks

Both zenwebserver and zeneventserver can be monitored with a simple HTTP check.

• zenwebserver
  › HTTP check on port 8080 to the Dashboard URL, /zport/dmd/Dashboard, with a regex
• zeneventserver
  › HTTP check on port 8084 to hit the zeneventserver API: /zeneventserver/api/1.0/events
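A minimal Python sketch of these two checks, using only the standard library. The ports and paths come from this slide; the host name is hypothetical. What the Dashboard URL returns to an unauthenticated request (the dashboard or a login page) depends on your setup, so both the regex you pass and the expected status code are assumptions to adjust.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical host; ports and paths are the ones on this slide.
ZENOSS_HOST = "zenoss.example.com"
DASHBOARD_URL = f"http://{ZENOSS_HOST}:8080/zport/dmd/Dashboard"
ZEP_API_URL = f"http://{ZENOSS_HOST}:8084/zeneventserver/api/1.0/events"


def http_check(url: str, pattern: Optional[str] = None,
               expect_status: int = 200, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with `expect_status` and, when a regex
    `pattern` is given, the response body matches it."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and HTTPError (raised for non-2xx responses) subclass
        # OSError, as do refused connections and socket timeouts.
        return False
    if status != expect_status:
        return False
    return pattern is None or re.search(pattern, body) is not None
```

For example, `http_check(DASHBOARD_URL, r"Dashboard")` and `http_check(ZEP_API_URL)`, run from the remote monitoring host.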


RabbitMQ

Monitoring the RabbitMQ queues is very important: if something happens to RabbitMQ, event processing in Zenoss is compromised.

For this reason, we monitor the queues remotely and alert on anything above a certain threshold.*

* Set this threshold according to your environment.

We consider three queues the most important:
  › rawevents: where raw events from the collectors are sent
  › zenevents: after events are processed by zeneventd, they are sent here for zeneventserver
  › signal: events that match any trigger and need to be processed by a notification are sent here for zenactiond to process
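The threshold logic itself is small; a hedged Python sketch follows. The threshold values are hypothetical placeholders. `queue_depth` assumes the RabbitMQ management plugin is enabled and reads the `messages` field from its `/api/queues/<vhost>/<queue>` endpoint; the base URL, credentials, and exact queue names are assumptions (Zenoss 4 typically names them with a `zenoss.queues.zep.` prefix in the `/zenoss` vhost, which you can confirm with `rabbitmqctl list_queues -p /zenoss`).

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

# Hypothetical thresholds; tune them to your environment's normal depth.
THRESHOLDS = {"rawevents": 1000, "zenevents": 1000, "signal": 500}


def queues_over_threshold(depths: Dict[str, Optional[int]],
                          thresholds: Dict[str, int]) -> List[str]:
    """Return queues above their threshold. A queue with no reading
    (missing or None) is also flagged, since it could not be checked."""
    return sorted(queue for queue, limit in thresholds.items()
                  if depths.get(queue) is None or depths[queue] > limit)


def queue_depth(api: str, vhost: str, queue: str,
                user: str, password: str) -> int:
    """Read a queue's message count from the RabbitMQ management HTTP API.

    `api` is the management plugin's base URL (hypothetical host/port).
    """
    url = (f"{api}/api/queues/{urllib.parse.quote(vhost, safe='')}/"
           f"{urllib.parse.quote(queue, safe='')}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as resp:
        return int(json.load(resp)["messages"])
```

Treating an unreadable queue as an alert is a deliberate choice: a check that silently passes when it cannot see the queue defeats the purpose of monitoring remotely.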


Synthetic Checks

• Pre-existing event check
  › Checks the functionality of zeneventserver by acknowledging, then un-acknowledging, a pre-existing event *
  › Verifies that the following are up and running: ZenDS, zeneventserver, zenwebserver
  › Uses only a single event; if the event is closed, a new one must be created
  › A script can create the event for you and provide the event ID to use
• New event check
  › Checks the Zenoss event process by opening a new event, finding it, verifying that the transform modified it, closing it, and verifying that it was closed
  › Verifies that the following are up and running: ZenDS, zenwebserver, zeneventd, zeneventserver
  › Creates a new event every time
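The control flow of both synthetic checks can be sketched in Python as follows. This is not the MonitoringZenoss script: `EventClient` and its three methods are hypothetical stand-ins for whatever calls you make to Zenoss (for example, through its JSON API), and the transform is assumed to add a known marker to the event summary. Events pass through the queues asynchronously, so each step polls briefly instead of reading once.

```python
import time
from typing import Callable, Optional, Protocol


class EventClient(Protocol):
    """Hypothetical wrapper around the API you use to reach Zenoss."""

    def create_event(self, summary: str) -> str: ...
    def find_event(self, evid: str) -> Optional[dict]: ...
    def set_state(self, evid: str, state: str) -> None: ...


def wait_for(client: EventClient, evid: str, predicate: Callable[[dict], bool],
             attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll until the event exists and satisfies `predicate`."""
    for _ in range(attempts):
        event = client.find_event(evid)
        if event is not None and predicate(event):
            return True
        time.sleep(delay)
    return False


def preexisting_event_check(client: EventClient, evid: str,
                            delay: float = 1.0) -> bool:
    """Acknowledge, then un-acknowledge, one existing event."""
    for state in ("Acknowledged", "New"):
        client.set_state(evid, state)
        if not wait_for(client, evid, lambda e: e.get("state") == state,
                        delay=delay):
            return False
    return True


def new_event_check(client: EventClient, summary: str, marker: str,
                    delay: float = 1.0) -> bool:
    """Open an event, confirm the transform ran, close it, confirm closed."""
    evid = client.create_event(summary)
    transformed = wait_for(client, evid,
                           lambda e: marker in e.get("summary", ""),
                           delay=delay)
    client.set_state(evid, "Closed")  # close even if the transform check failed
    closed = wait_for(client, evid, lambda e: e.get("state") == "Closed",
                      delay=delay)
    return transformed and closed
```

A failure in either function tells you which part of the pipeline to look at first: state changes exercise zeneventserver, while the transform marker also requires zeneventd to have processed the event.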


Take Aways

The script we use for monitoring, along with documentation on how to use it, can be found on the community wiki or on GitHub:

http://wiki.zenoss.org/Monitoring_Zenoss

https://github.com/jstanley23/MonitoringZenoss


Question me this
