© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Beyond the ESM Administrator’s Guide Nathan Tisdale, Advanced Support Engineer

Beyond the ESM Administrator Guide

Page 1: Beyond the ESM Administrator Guide

Beyond the ESM Administrator’s Guide Nathan Tisdale, Advanced Support Engineer

Page 2: Beyond the ESM Administrator Guide

Introduction

Nathan Tisdale Advanced Support Engineer • 3+ years in ArcSight Technical Support • Train new support engineers • Assist in Premier Investigations and technical escalations • Advocate bug prioritization on behalf of customers • Believe in empowering ArcSight Admins

Page 3: Beyond the ESM Administrator Guide

Agenda

A troubleshooting perspective: Data flow • Oracle vs CORR-Engine

Basic log analysis • Logs, whiner messages, memory

Advanced log analysis • Exceptions, Thread Dumps, Logfu

Live monitoring • Advanced Management Console

Page 4: Beyond the ESM Administrator Guide

Audience

Is this presentation for you? • ArcSight Administrator • Responsible for ensuring continuous event flow through ESM • Enough experience to be curious about Thread Dumps

This presentation is similar to • SN62: Gain Rock Star Status: ArcSight ESM Manager Administrator • Refocused to provide insight on how to identify bottlenecks with current managers

Participation • Q & A • Hallway chats are welcome

Page 5: Beyond the ESM Administrator Guide

Data flow

Page 6: Beyond the ESM Administrator Guide

Simple ESM deployment

[Diagram components: ArcSight SmartAgent (x3), ArcSight Manager, ArcSight Database, ArcSight Web, ArcSight Console]

Page 7: Beyond the ESM Administrator Guide

Events insertion versus events retrieval

[Diagram: event insertions and event retrievals, both against the Database]

Event insertion flow (events are being inserted into the database)
• SeededJsse Listener threads
  – Bytes read from socket and converted to Java SecurityEvent objects
• Start-of-flow threads
  (Normalization)
• Pre-persistor threads
• Post-persistor threads (rules engine)
• XCPUDMPC threads (Data Monitors)

Event retrievals (different resources retrieving event data from the database)
• Active channel queries
• Report queries
• Trend queries

Page 8: Beyond the ESM Administrator Guide

Symptoms of performance issues

Event data retrieval
• Channels slow to load
• Channels don’t finish loading
  – Channels show Loading Event ID
• Reports failing or not running
  – ORA-01555 or user cancelled operation
• Reports based on trends are empty
• Trends failing or not running
• Trends getting disabled

Page 9: Beyond the ESM Administrator Guide

Symptoms of performance issues

Event data insertion
• Connectors caching continuously
• Connector status shifting between up and down frequently
• Manager logs show one of the following
  – It appears the database is hung
  – Rejected threads
• Delayed events (maybe not)

Page 10: Beyond the ESM Administrator Guide

Processing stages

Threat Level Handler

Annotation Initializer

Event Asset Resolver

Event Category Adder Event Verifier

Geo Info Adder

Data monitors

Rules engine

Security Event Persistor Event Forwarder

Page 11: Beyond the ESM Administrator Guide

Symptom

Making sense of Agent State

Queues filling, caused by
• Database performance
• Disk I/O
• Slow rules engine processing
• Slow Data Monitor processing

Symptom
• Events cache
• STM eps < P-A eps (Sent To Manager EPS lower than Post-Aggregation EPS)

Page 12: Beyond the ESM Administrator Guide

Basic log analysis

Page 13: Beyond the ESM Administrator Guide

Page 14: Beyond the ESM Administrator Guide

Application logs

Manager
• <ARCSIGHT_HOME>/logs/default/*.log*
  – SERVER.LOG
  – SERVER.STD.LOG
  – SERVER.STATUS.LOG
  – SERVER.REPORT.LOG
  – SERVER.SQL.LOG
  – SERVER.LICENSE.LOG
  – PARTITIONMANAGER.LOG
  – PARTITIONARCHIVER.LOG
  – PARTITIONCOMPRESSER.LOG
  – PARTITIONSTATSUPDATER.LOG

Oracle
• <ORACLE_HOME>
  – /admin/arcsight/bdump/ALERT_<LISTENER>.LOG
  – /network/log/LISTENER.LOG
  – /network/log/SQLNET.LOG

CORR-Engine
• /opt/arcsight/logger/current/arcsight/logger/logs/*
• /opt/arcsight/logger/data/mysql/*.log*
• /opt/arcsight/logger/data/pgsql/serverlog*

Page 15: Beyond the ESM Administrator Guide

Log rotation

• Log files are always limited in size
  – 10MB default
• Automatic log file rotation
  – 10 files are kept, plus the current file
• Can extend logging
  – <ARCSIGHT_HOME>/config/server.properties
  – Copy settings from <ARCSIGHT_HOME>/config/server.defaults.properties [DO NOT EDIT THIS FILE]

# The maximum size of the log file before it
# will be rolled over. The size is specified
# in MB (MegaByte).
log.channel.file.property.maxsize=10MB
# The maximum number of backup files to create
# for rolling over.
log.channel.file.property.maxbackupindex=10
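Before raising either setting, it helps to know how much disk one rotating log can occupy. The arithmetic follows from the defaults on this slide: the backups plus the current file, each up to the maximum size. A minimal Python sketch; the function name is hypothetical, and only the two default values come from the slide.

```python
def max_log_footprint_mb(maxsize_mb: int = 10, maxbackupindex: int = 10) -> int:
    """Upper bound on disk used by one rotating log.

    maxbackupindex rolled-over files are kept, plus the current file,
    and each file can grow to maxsize_mb before it is rolled over.
    """
    return maxsize_mb * (maxbackupindex + 1)


if __name__ == "__main__":
    # Defaults from the slide: 10 MB per file, 10 backups plus the current file.
    print(max_log_footprint_mb())
    # A hypothetical extended setting: 20 MB per file, 20 backups.
    print(max_log_footprint_mb(20, 20))
```

With the defaults, one log family is bounded at 110 MB. Multiply by the number of log families you extend to size the logs volume.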

Page 16: Beyond the ESM Administrator Guide

Key manager logs

SERVER.STD.LOG
• Initialization messages
• General progress messages
• Event batch insert times
• Garbage collector information
• Critical warnings
• Uncaught exceptions
• Watchdog messages
  – Wrapper manages life cycle of manager processes

Log rotation configured via wrapper
• <ARCSIGHT_HOME>/config/server.wrapper.conf
• Copy settings from server.defaults.wrapper.conf [do not edit the defaults file]

SERVER.LOG • Basic application log • Exceptions with detailed traces

SERVER.STATUS.LOG • Information from Mbeans

– Agent throughputs and status – Active Lists statistics – Rule and Data Monitor resource consumption

• Also see manage.jsp

Page 17: Beyond the ESM Administrator Guide

Other manager logs

SERVER.SQL.LOG • Needs to be enabled • Useful for Oracle DBA

SERVER.CHANNEL.LOG • Active Channel queries

SERVER.PULSE.LOG • Updated every 10 seconds

SERVER.LICENSE.LOG • License compliance per 24hrs

– Approaching or exceeded limit(s)

SERVER.REPORT.LOG • Logs report being run • More info in SERVER.LOG

– grep for [logReportInfo]

PARTITION*.LOG • Oracle partition management

– not present with CORR-Engine

Page 18: Beyond the ESM Administrator Guide

Other data

Manager • Thread dump • Heap dump

Operating system • System logs • Performance data

Oracle • Database sessions • RDA • AWR • lsinventory

CORR-Engine • Session Waits • Core Dump

Page 19: Beyond the ESM Administrator Guide

Data to collect

Thread Dumps
• Generate five Thread Dumps during the slowness

Logs
• Manager logs
• Oracle-based Manager
  – Alert Log
  – DB Sessions
• CORR-E based Manager
  – Session Waits
  – mysql.log

System tables
• If reproduction is to be performed

Agent logs
• If manager is not identified as the bottleneck
  – Agent stability
  – Network connectivity
  – Network latency
• Save time and collect them when generating Thread Dumps (TDs)

99% of the time, the bottleneck is found on the manager

Page 20: Beyond the ESM Administrator Guide

Collecting logs

ArcSight Sendlogs
• Wizard interface allows user to easily gather:
  – Manager logs
  – Agent logs
  – Web logs
  – Console logs
  – Oracle Alert log
  – Thread Dumps
  – Session Waits
  – Output from SQL
• Run from manager or console
  – ./arcsight sendlogs

Page 21: Beyond the ESM Administrator Guide

BASH your logs

Demo script available
• Get Status of Services
  – /sbin/service arcsight_services status
• Generate Thread Dumps
  – ./arcsight managerthreaddump
• Generate Session Wait
  – ./arcsight arcdt session-waits -sp spool
• Generate threaddumps.html
  – ./arcsight threaddumps <path_to_server.std.log>
• Collect Database Logs
  – Oracle
  – CORR-E
• Place all data in tarball or zip file for upload
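The last step, placing the collected data in a tarball, can be sketched in Python as follows. This is not the demo script from the talk. The function name, the default `*.log*` pattern (borrowed from the Manager log path on the Application logs slide), and the output file name are assumptions for illustration.

```python
import tarfile
from pathlib import Path


def bundle_logs(log_dir: str, out_path: str, pattern: str = "*.log*") -> list:
    """Pack every regular file in log_dir matching pattern into a gzip tarball.

    Returns the names of the files that were added, in sorted order.
    """
    # Build the file list first so the tarball itself is never picked up.
    files = sorted(p for p in Path(log_dir).glob(pattern) if p.is_file())
    with tarfile.open(out_path, "w:gz") as tar:
        for path in files:
            # Store flat file names rather than absolute paths.
            tar.add(str(path), arcname=path.name)
    return [p.name for p in files]


# Example call (paths are placeholders, adjust to your installation):
# bundle_logs("/opt/arcsight/manager/logs/default", "/tmp/manager_logs.tar.gz")
```

Collect the thread dumps and session waits first so their output is already on disk when the bundle is built.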

Page 22: Beyond the ESM Administrator Guide

Whiner messages

Why
• Subsystem failures
  – Database connection problem
  – Event insertion times high
  – SSL certificate expiration
• Database space shortage
  – Running out of space
  – Usually event space
  – Sometimes system table space
• Partition manager failures
  – Get your DBA!

Where
• stdout, server.std.log, server.log
• Email
• Console pop-up
• Internal event

System alerts via email

Page 23: Beyond the ESM Administrator Guide

Memory utilization

Page 24: Beyond the ESM Administrator Guide

Memory usage

• Manager allocates memory in Java heap
• Java heap is garbage collected
• Server only allocates memory
• Java VM reclaims unused memory automatically
• Manager doesn’t know how much garbage is in the heap
• Reported memory usage includes garbage

server.std.log:
2006/02/22 23:22:51 | Memory Status: 765.6 MB Used, 1,014.0 MB Max
2006/02/22 23:22:52 | [Full GC
2006/02/22 23:22:58 | 797362K->471587K(1038336K), 5.9847261 secs]

Page 25: Beyond the ESM Administrator Guide

Memory: Two Types of Garbage Collection (GC)

Java heap is divided into generations
• Minor GC
  – Only collects young generation
  – May expand to entire heap, and become a major collection
• Major GC or Full GC
  – Collects both young generation and tenured generation

[Diagram: heap divided into Tenured and Young generations]

[GC 929899K->838966K(1036928K), 0.0353791 secs]

[Full GC 932135K->542955K(1036928K), 3.9721866 secs]

Page 26: Beyond the ESM Administrator Guide

GC pause

Stop the world GC
• When GC is happening, everything else is stopped

Pause Time
• Minor GC pause (“[GC …]”)
  – Should be under 1 sec
• Major GC pause (“[Full GC ….]”)
  – Actual time depends on hardware
  – Estimate: ~1 sec every 200 MB heap

[Full GC 932135K->542955K(1036928K), 3.9721866 secs]
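The pause guidance above can be checked mechanically against GC lines in server.std.log. The sketch below parses the two line shapes shown on these slides and flags a minor GC of 1 second or more, or a Full GC slower than the "1 sec per 200 MB" estimate. It assumes the estimate applies to the total heap figure in parentheses, which the slide does not spell out. All function names are hypothetical.

```python
import re

# Matches "[GC 929899K->838966K(1036928K), 0.0353791 secs]" and the Full GC variant.
GC_RE = re.compile(r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def parse_gc(line):
    """Return a dict describing one GC message, or None if the line has none."""
    m = GC_RE.search(line)
    if m is None:
        return None
    kind, before, after, heap, secs = m.groups()
    return {
        "kind": kind,
        "before_kb": int(before),
        "after_kb": int(after),
        "heap_kb": int(heap),
        "secs": float(secs),
    }


def expected_full_gc_secs(heap_kb, mb_per_sec=200.0):
    """Rule of thumb from the slide: about 1 second per 200 MB of heap."""
    return heap_kb / 1024.0 / mb_per_sec


def is_slow(gc):
    """Minor GC should stay under 1 sec; Full GC is compared to the estimate."""
    if gc["kind"] == "GC":
        return gc["secs"] >= 1.0
    return gc["secs"] > expected_full_gc_secs(gc["heap_kb"])


if __name__ == "__main__":
    for sample in (
        "[GC 929899K->838966K(1036928K), 0.0353791 secs]",
        "[Full GC 932135K->542955K(1036928K), 3.9721866 secs]",
    ):
        gc = parse_gc(sample)
        print(gc["kind"], gc["secs"], "slow" if is_slow(gc) else "ok")
```

Since actual pause time depends on hardware, treat a "slow" result as a prompt to look closer rather than as a verdict.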

Page 27: Beyond the ESM Administrator Guide

Real memory usage – the working set

• Real memory usage is captured in “Full GC” messages in server.std.log
• Working set is defined as the memory that is in actual use and doesn’t have any garbage
• Working set of the Manager is the heap usage immediately after a “Full GC”; in the sample below it is 471587K (about 460 MB)

2006/02/22 23:22:51 | Memory Status: 765.6 MB Used, 1,014.0 MB Max
2006/02/22 23:22:52 | [Full GC
2006/02/22 23:22:58 | 797362K->471587K(1038336K), 5.9847261 secs]

Page 28: Beyond the ESM Administrator Guide

How to choose heap size?

Recommendation for heap size is 2 x working set

Too small
• Frequent full GC
• Bad performance
• Manager could die on OutOfMemoryError

Too large
• Peak performance is good, but…
• Full GC takes long time to finish
• Manager could get killed by Wrapper for being hung for a long time

Adjust heap size through Management Console, or by running ‘managersetup’
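The "2 x working set" rule can be applied directly to server.std.log. The sketch below extracts the post-collection size from each "Full GC" message, including the case from the earlier slides where the wrapper splits the message across two timestamped lines. It then doubles the largest value seen. Taking the maximum rather than an average is a conservative choice of this sketch, not something the slides prescribe. Function names are hypothetical.

```python
import re

# "[Full GC" followed by "before->after(total)", optionally with a wrapper
# timestamp prefix ("2006/02/22 23:22:58 | ") in between when the line is split.
FULL_GC_RE = re.compile(r"\[Full GC\s+(?:\S+ \S+ \| )?(\d+)K->(\d+)K\((\d+)K\)")


def working_sets_kb(log_text):
    """Heap usage right after each Full GC, in KB (the working set)."""
    return [int(m.group(2)) for m in FULL_GC_RE.finditer(log_text)]


def recommended_heap_mb(log_text):
    """Twice the largest observed working set, in MB, or None without data."""
    sets = working_sets_kb(log_text)
    if not sets:
        return None
    return 2 * max(sets) / 1024.0


SAMPLE = """2006/02/22 23:22:51 | Memory Status: 765.6 MB Used, 1,014.0 MB Max
2006/02/22 23:22:52 | [Full GC
2006/02/22 23:22:58 | 797362K->471587K(1038336K), 5.9847261 secs]"""

if __name__ == "__main__":
    print(working_sets_kb(SAMPLE))
    print(round(recommended_heap_mb(SAMPLE)))
```

For the sample from the slides the working set is 471587K, so the rule suggests a heap of roughly 921 MB.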

Page 29: Beyond the ESM Administrator Guide

Out of memory

• Server will restart on out of memory errors
• Check Logfu
• Check “CapsManager” in server.status.log for overall memory utilization by Data Monitors, channels, Active Lists, etc.

If you see a spike
• Multiple memory intensive tasks at the same time?
• Increase heap size

Memory leak
• Memory usage keeps growing
• Increasing heap size only delays the problem
• Memory leak is hard to track down
• Contact support

Page 30: Beyond the ESM Administrator Guide

Log analysis

Page 31: Beyond the ESM Administrator Guide

Exceptions

Details of application errors
• Java construct encapsulates some failures
  – Coding errors
  – Transient bugs
• A full stack trace is included
  – Shows where in the code the error occurred
• Not all exceptions are equal
  – Misclassified or not significant impact
  – Sometimes related to content

Page 32: Beyond the ESM Administrator Guide

Where is the cache?

SERVER.STATUS.LOG • Agent Statuses

AgentStatuses="[|||Name|ID|Reported|Agent Time|Received by Agent Count|Received by Agent EPS|Post-Filter Count|Post-Filter EPS|Post-Aggregation Count|Post-Aggregation EPS|Estimated Cache Size|Sent To Manager Count|Sent To Manager EPS|Failed Connection Attempts, archiver|3lSvf5BEBABCBfCSYubv3rw==|05/08 11:23:54|05/08 11:23:54|0|0.0|0|0.0|0|0.0|0|0|0.0|0, Syslog|3t5MEiRcBABDmQv3t56sCJQ==|05/08 11:23:31|05/08 11:23:31|4000|62.5|3643|56.921875|3643|56.921875|120,000|3650|57.03125|807761 Total|-|-|-|35,434|582.8|33,328|549.8|33,086|546.0|120,100|88,898|1,476.2|1,335,569]“
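To answer "where is the cache?" without reading the raw line by eye, the AgentStatuses value can be split into one record per connector. The sketch below assumes the layout visible in the sample: records separated by a comma and a space, fields separated by `|`, and thousands separators inside numbers. It uses a shortened copy of the sample containing only the two connector rows. Rows whose field count does not match the header are skipped, so differently shaped real-world lines may need adjustments. Function names are hypothetical.

```python
def to_number(text):
    """Convert '120,000' or '57.03125' to a float."""
    return float(text.replace(",", ""))


def parse_agent_statuses(value):
    """Split an AgentStatuses value into a list of {column: field} dicts."""
    body = value.strip().strip('"').strip("[]")
    header, *rows = body.split(", ")
    columns = [c for c in header.split("|") if c]
    records = []
    for row in rows:
        fields = row.split("|")
        if len(fields) == len(columns):  # skip rows that do not fit the header
            records.append(dict(zip(columns, fields)))
    return records


def caching_connectors(records):
    """Names of connectors with a non-empty cache or STM EPS below P-A EPS."""
    flagged = []
    for rec in records:
        cache = to_number(rec["Estimated Cache Size"])
        stm = to_number(rec["Sent To Manager EPS"])
        post_agg = to_number(rec["Post-Aggregation EPS"])
        if cache > 0 or stm < post_agg:
            flagged.append(rec["Name"])
    return flagged


SAMPLE = (
    "[|||Name|ID|Reported|Agent Time|Received by Agent Count|Received by Agent EPS|"
    "Post-Filter Count|Post-Filter EPS|Post-Aggregation Count|Post-Aggregation EPS|"
    "Estimated Cache Size|Sent To Manager Count|Sent To Manager EPS|"
    "Failed Connection Attempts, "
    "archiver|3lSvf5BEBABCBfCSYubv3rw==|05/08 11:23:54|05/08 11:23:54|"
    "0|0.0|0|0.0|0|0.0|0|0|0.0|0, "
    "Syslog|3t5MEiRcBABDmQv3t56sCJQ==|05/08 11:23:31|05/08 11:23:31|"
    "4000|62.5|3643|56.921875|3643|56.921875|120,000|3650|57.03125|807761]"
)

if __name__ == "__main__":
    print(caching_connectors(parse_agent_statuses(SAMPLE)))
```

In the sample, the Syslog connector reports an estimated cache of 120,000 events, so it is the one to investigate even though its Sent To Manager EPS is currently slightly above its Post-Aggregation EPS.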

Page 33: Beyond the ESM Administrator Guide

Delayed events

SERVER.LOG
• [default.com.arcsight.util.TimedRingBuffer][increment] Throwing out increment X, increment time = X, acceptable range X - X (discarded=X)

Active channel
• Gaps between Manager Receipt Time and End Time and Agent Receipt Time
  – Device Receipt Time
  – Connector Receipt Time (a.k.a. Agent Receipt Time)
  – Manager Receipt Time
  – Start Time
  – End Time

Page 34: Beyond the ESM Administrator Guide

Database connectivity

SERVER.STD.LOG
• Connectivity issues
  – Look for SUBSYSTEM STATUS CHANGED
• Persistence rate: a batch should take less than 100 ms

INFO | jvm 2 | 2009/05/07 20:41:58 | (02-Pre-SecurityEventPersistor330) Persisted 100 events in 32 ms.
INFO | jvm 1 | 2009/05/08 11:20:53 | (02-Pre-SecurityEventPersistor1) Persisted 100 events in 3,698 ms
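The 100 ms guideline is easy to check across a whole server.std.log. The sketch below pulls the thread name and elapsed time out of each "Persisted N events in M ms" message and returns the batches at or above the threshold. It assumes each message sits on one line, as in the wrapper log, and the function name is hypothetical.

```python
import re

# "(thread-name) Persisted 100 events in 3,698 ms"
PERSIST_RE = re.compile(r"\(([^)]+)\) Persisted ([\d,]+) events in ([\d,]+) ms")


def slow_batches(lines, threshold_ms=100):
    """Return (thread, elapsed_ms) for every batch at or above threshold_ms."""
    slow = []
    for line in lines:
        m = PERSIST_RE.search(line)
        if m is None:
            continue
        elapsed_ms = int(m.group(3).replace(",", ""))
        if elapsed_ms >= threshold_ms:
            slow.append((m.group(1), elapsed_ms))
    return slow


SAMPLE_LINES = [
    "INFO | jvm 2 | 2009/05/07 20:41:58 | (02-Pre-SecurityEventPersistor330) "
    "Persisted 100 events in 32 ms.",
    "INFO | jvm 1 | 2009/05/08 11:20:53 | (02-Pre-SecurityEventPersistor1) "
    "Persisted 100 events in 3,698 ms",
]

if __name__ == "__main__":
    for thread, ms in slow_batches(SAMPLE_LINES):
        print(thread, ms)
```

A single slow batch is rarely meaningful. A sustained run of them points at database or disk performance.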

Page 35: Beyond the ESM Administrator Guide

Manager busy

SERVER.STD.LOG • Manager stops accepting events

• INFO | jvm 1 | 2005/04/04 00:42:26 | WARNING: '1' agent requests REJECTED because the limit of '64' agent threads was exceeded.
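Counting these warnings shows how often, and how badly, the Manager was too busy to accept events. The sketch below assumes the message wording and quoting shown on this slide and sums the rejected requests. The function name is hypothetical.

```python
import re

REJECTED_RE = re.compile(
    r"'(\d+)' agent requests REJECTED because the limit of '(\d+)' agent threads"
)


def rejected_summary(lines):
    """Return (total rejected requests, highest thread limit seen)."""
    total = 0
    limit = 0
    for line in lines:
        m = REJECTED_RE.search(line)
        if m is not None:
            total += int(m.group(1))
            limit = max(limit, int(m.group(2)))
    return total, limit


SAMPLE_LINES = [
    "INFO | jvm 1 | 2005/04/04 00:42:26 | WARNING: '1' agent requests REJECTED "
    "because the limit of '64' agent threads was exceeded.",
]

if __name__ == "__main__":
    print(rejected_summary(SAMPLE_LINES))
```

Rejections are a symptom of slow processing further down the flow, so take Thread Dumps while they are occurring.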

Page 36: Beyond the ESM Administrator Guide

Thread Dumps

Page 37: Beyond the ESM Administrator Guide

Insertion issues require Thread Dumps

[Diagram: event insertions and event retrievals, both against the Database]

Event insertion flow (events are being inserted into the database)
• SeededJsse Listener threads
  – Bytes read from socket and converted to Java SecurityEvent objects
• Start-of-flow threads
  (Normalization)
• Pre-persistor threads
• Post-persistor threads (rules engine)
• XCPUDMPC threads (Data Monitors)

Event retrievals (different resources retrieving event data from the database)
• Active channel queries
• Report queries
• Trend queries

Page 38: Beyond the ESM Administrator Guide

Don’t restart or reboot before collecting!

A Java snapshot

Why Thread Dumps
• Stack trace for each thread in the VM
• Many different threads
• Bottleneck area usually identifiable
  – Session Waits or DB Sessions needed to correlate database activity

Generating Thread Dumps
• Manage.jsp | NGServer | generateThreadDumps Invoke
• <ARCSIGHT_HOME>/bin/arcsight managerthreaddump

Formatting Thread Dumps
• <ARCSIGHT_HOME>/bin/arcsight threaddumps > threaddumps.html

Page 39: Beyond the ESM Administrator Guide

Servlet engine

SeededJsseListener • Read bytes from network sockets • Convert read bytes to Java Objects “Security Event Batch” • Place event batches into queue for Flow 1

Page 40: Beyond the ESM Administrator Guide

Flow 1: Start

Start-of-flow • Vulnerability Scanner Reports • Place event batches into queue for Flow 2

Page 41: Beyond the ESM Administrator Guide

Flow 2: Pre-persistor

Pre-SecurityEventPersistor • Remove event batch from queue • Initialize and normalize event fields • Write to database • Put event batch into queue for Flow 3

Page 42: Beyond the ESM Administrator Guide

Flow 3: Post-persistor

Post-SecurityEventPersistor • Remove event batch from queue • Evaluate events against rules • Generate Correlation events • Put event batch into queue for Dashboards

Page 43: Beyond the ESM Administrator Guide

Content: Dashboards

XCPUDMPC-Thread • Remove event batch from queue • Evaluate events against Data Monitors • Generate Correlation events • Put event batch into queue for garbage collection
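The chain of queues described on the last five slides explains why a Thread Dump usually points at one stage: work piles up in front of the slowest stage while the stages after it sit idle. The toy model below illustrates only that idea. It is not how the Manager is implemented. The queues here are unbounded, whereas a real Manager eventually pushes back on connectors, which then cache. All stage capacities and the function name are hypothetical numbers chosen for the example.

```python
def simulate(arrival_eps, stage_capacity_eps, seconds):
    """Feed arrival_eps events per second through stages in order.

    stage_capacity_eps maps stage name -> events the stage can process per
    second. Returns the backlog waiting in front of each stage at the end.
    """
    backlog = {name: 0 for name in stage_capacity_eps}
    for _ in range(seconds):
        incoming = arrival_eps
        for name, capacity in stage_capacity_eps.items():
            available = backlog[name] + incoming
            processed = min(available, capacity)
            backlog[name] = available - processed
            incoming = processed  # only processed events reach the next stage
    return backlog


if __name__ == "__main__":
    stages = {
        "Start-of-flow": 2000,
        "Pre-persistor": 600,   # e.g. slow database inserts
        "Post-persistor": 800,
        "Data Monitors": 900,
    }
    print(simulate(1000, stages, 10))
```

In this run only the Pre-persistor queue grows, which mirrors the troubleshooting approach of the talk: find the stage where batches are waiting, then correlate with Session Waits or DB Sessions.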

Page 44: Beyond the ESM Administrator Guide

Logfu

Page 45: Beyond the ESM Administrator Guide

Logfu is not an officially supported tool

Why Logfu?

Discerning patterns
• Examines server.log, server.std.log, and server.status.log
• Syntax
  – arcsight logfu -m -noplot
    • -m is for “manager”
    • -noplot skips plotting on graph
• Outputs logfu.html to logs/default/Logfu_<date>/
• Interesting data points
  – “Famous Last Words”: why did it die
  – “Exception Groups”: quickly identify repeating exceptions
  – “Memory”: identify growth in memory consumption
  – “Event Insertion”: is the database/disk able to keep up

Page 46: Beyond the ESM Administrator Guide

Memory patterns

Page 47: Beyond the ESM Administrator Guide

Event throughput patterns

Page 48: Beyond the ESM Administrator Guide

Shutdown patterns

Page 49: Beyond the ESM Administrator Guide

Event insertion patterns

Page 50: Beyond the ESM Administrator Guide

Plot time per batch to identify network lag

Use with Connectors too

Page 51: Beyond the ESM Administrator Guide

Advanced Management Interface a.k.a. manage.jsp

Page 52: Beyond the ESM Administrator Guide

https://<HOST_NAME>:8443/arcsight/web/manage.jsp

Status on demand

Interesting Mbeans
• Agent State Tracker
  – Specific and overall EPS for connectors
• SessionManager
  – How many users are logged in
• SubsystemStatus Tracker
  – Whiner
• ActiveList Monitoring
  – Memory consumption
• Channels
  – How many
  – Validating the SQL

Page 53: Beyond the ESM Administrator Guide

Groups and filters

Page 54: Beyond the ESM Administrator Guide

Mbean: RulesEngine

Page 55: Beyond the ESM Administrator Guide

Mbean: AgentStateTracker

Page 56: Beyond the ESM Administrator Guide

Thank you

Page 57: Beyond the ESM Administrator Guide

Security for the new reality