DESCRIPTION
ESM Administration Document
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Beyond the ESM Administrator’s Guide
Nathan Tisdale, Advanced Support Engineer
Introduction
Nathan Tisdale, Advanced Support Engineer
• 3+ years in ArcSight Technical Support
• Train new support engineers
• Assist in Premier Investigations and technical escalations
• Advocate bug prioritization on behalf of customers
• Believe in empowering ArcSight Admins
Agenda
A troubleshooting perspective: Data flow
• Oracle vs CORR-Engine
Basic log analysis
• Logs, whiner messages, memory
Advanced log analysis
• Exceptions, Thread Dumps, Logfu
Live monitoring
• Advanced Management Console
Audience
Is this presentation for you?
• ArcSight Administrator
• Responsible for ensuring continuous event flow through ESM
• Enough experience to be curious about Thread Dumps
This presentation is similar to
• SN62: Gain Rock Star Status: ArcSight ESM Manager Administrator
• Refocused to provide insight on how to identify bottlenecks with current managers
Participation
• Q & A
• Hallway chats are welcome
Data flow
Simple ESM deployment
[Diagram: three ArcSight SmartAgents, the ArcSight Manager, the ArcSight Database, ArcSight Web, and the ArcSight Console]
Events insertion versus events retrieval
Event insertions: event insertion flow, events are being inserted into the database
• SeededJsse Listener threads
  – Bytes read from socket and converted to Java SecurityEvent objects
• Start-of-flow threads (normalization)
• Pre-persistor threads
• Post-persistor threads (rules engine)
• XCPUDMPC threads (Data Monitors)
Event retrievals: different resources retrieving event data from the database
• Active channel queries
• Report queries
• Trend queries
Symptoms of performance issues
Event data retrieval
• Channels slow to load
• Channels don’t finish loading
  – Channels show Loading Event ID
• Reports failing or not running
  – ORA-01555 or user cancelled operation
• Reports based on trends are empty
• Trends failing or not running
• Trends getting disabled
Symptoms of performance issues
Event data insertion
• Connectors caching continuously
• Connector status shifting between up and down frequently
• Manager logs show one of the following
  – It appears the database is hung
  – Rejected threads
• Delayed events (maybe not)
Processing stages
Threat Level Handler
Annotation Initializer
Event Asset Resolver
Event Category Adder
Event Verifier
Geo Info Adder
Data monitors
Rules engine
Security Event Persistor
Event Forwarder
Symptom
Making sense of Agent State
Queues filling caused by
• Database performance
• Disk I/O
• Slow rules engine processing
• Slow Data Monitor processing
Symptom
• Events cache
• STM eps < P-A eps (Sent To Manager EPS is lower than Post-Aggregation EPS)
Basic log analysis
Application logs
Manager
• <ARCSIGHT_HOME>/logs/default/*.log*
  – SERVER.LOG
  – SERVER.STD.LOG
  – SERVER.STATUS.LOG
  – SERVER.REPORT.LOG
  – SERVER.SQL.LOG
  – SERVER.LICENSE.LOG
  – PARTITIONMANAGER.LOG
  – PARTITIONARCHIVER.LOG
  – PARTITIONCOMPRESSER.LOG
  – PARTITIONSTATSUPDATER.LOG
Oracle
• <ORACLE_HOME>
  – /admin/arcsight/bdump/ALERT_<LISTENER>.LOG
  – /network/log/LISTENER.LOG
  – /network/log/SQLNET.LOG
CORR-Engine
• /opt/arcsight/logger/current/arcsight/logger/logs/*
• /opt/arcsight/logger/data/mysql/*.log*
• /opt/arcsight/logger/data/pgsql/serverlog*
Log rotation
• Log files are always limited in size
  – 10MB default
• Automatic log file rotation
  – 10 files are kept, plus the current file
• Can extend logging
  – <ARCSIGHT_HOME>/config/server.properties
  – Copy settings from <ARCSIGHT_HOME>/config/server.defaults.properties [DO NOT EDIT THIS FILE]
# The maximum size of the log file before it
# will be rolled over. The size is specified
# in MB (MegaByte).
log.channel.file.property.maxsize=10MB
# The maximum number of backup files to create
# for rolling over.
log.channel.file.property.maxbackupindex=10
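As a rough worked example of what these two settings mean for disk use, the sketch below computes the approximate ceiling for a single rotated log: the backups plus the current file, each up to the maximum size. The helper name is made up for illustration and is not part of ArcSight; the second call uses hypothetical extended values.

```python
def max_log_footprint_mb(maxsize_mb: int = 10, maxbackupindex: int = 10) -> int:
    """Approximate upper bound on disk used by one rotated log.

    Counts the backup files plus the current file, each at most maxsize_mb.
    """
    return maxsize_mb * (maxbackupindex + 1)


if __name__ == "__main__":
    print(max_log_footprint_mb())        # slide defaults: 110
    print(max_log_footprint_mb(20, 30))  # hypothetical extended settings: 620
```

With the defaults, each log therefore tops out at about 110 MB, which is worth multiplying by the number of manager logs before extending retention.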
Key manager logs
SERVER.STD.LOG
• Initialization messages
• General progress messages
• Event batch insert times
• Garbage collector information
• Critical warnings
• Uncaught exceptions
• Watchdog messages
  – Wrapper manages life cycle of manager processes
Log rotation configured via wrapper
• <ARCSIGHT_HOME>/config/server.wrapper.conf
• Copy settings from server.defaults.wrapper.conf [do not edit the defaults file]
SERVER.LOG
• Basic application log
• Exceptions with detailed traces
SERVER.STATUS.LOG
• Information from Mbeans
  – Agent throughputs and status
  – Active Lists statistics
  – Rule and Data Monitor resource consumption
• Also see manage.jsp
Other manager logs
SERVER.SQL.LOG
• Needs to be enabled
• Useful for Oracle DBA
SERVER.CHANNEL.LOG
• Active Channel queries
SERVER.PULSE.LOG
• Updated every 10 seconds
SERVER.LICENSE.LOG
• License compliance per 24hrs
  – Approaching or exceeded limit(s)
SERVER.REPORT.LOG
• Logs report being run
• More info in SERVER.LOG
  – grep for [logReportInfo]
PARTITION*.LOG
• Oracle partition management
  – Not present with CORR-Engine
Other data
Manager
• Thread dump
• Heap dump
Operating system
• System logs
• Performance data
Oracle
• Database sessions
• RDA
• AWR
• lsinventory
CORR-Engine
• Session Waits
• Core Dump
Data to collect
Thread Dumps
• Generate five Thread Dumps during the slowness
Logs
• Manager logs
• Oracle-based Manager
  – Alert Log
  – DB Sessions
• CORR-E based Manager
  – Session Waits
  – mysql.log
System tables
• If a reproduction is to be performed
Agent logs
• If manager is not identified as the bottleneck
  – Agent stability
  – Network connectivity
  – Network latency
• Save time and collect when generating TDs
99% of the time, the bottleneck is found on the manager
Collecting logs
ArcSight Sendlogs
• Wizard interface allows user to easily gather:
  – Manager logs
  – Agent logs
  – Web logs
  – Console logs
  – Oracle Alert log
  – Thread Dumps
  – Session Waits
  – Output from SQL
• Run from manager or console
  – ./arcsight sendlogs
BASH your logs
Demo script available
• Get Status of Services
  – /sbin/service arcsight_services status
• Generate Thread Dumps
  – ./arcsight managerthreaddump
• Generate Session Wait
  – ./arcsight arcdt session-waits -sp spool
• Generate threaddumps.html
  – ./arcsight threaddumps <path_to_server.std.log>
• Collect Database Logs
  – Oracle
  – CORR-E
• Place all data in tarball or zip file for upload
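The final packaging step can be sketched as follows. This is not the demo script from the talk, only a minimal illustration using Python's standard tarfile module; the function name and the idea of skipping missing paths are assumptions made for the example.

```python
import tarfile
from pathlib import Path


def bundle_for_upload(paths, archive_path):
    """Pack the given files or directories into a gzip-compressed tarball.

    Paths that do not exist are skipped, so a partial collection still
    produces an archive. Returns the paths that were actually added.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store each entry under its base name to keep the archive flat.
                tar.add(str(p), arcname=p.name)
                added.append(str(p))
    return added
```

A call such as bundle_for_upload([std_log, threaddumps_html, session_waits], "case_data.tar.gz") would gather the collected outputs into one file; the variable names here are placeholders for wherever the earlier steps wrote their results.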
Whiner messages
Why
• Subsystem failures
  – Database connection problem
  – Event insertion times high
  – SSL certificate expiration
• Database space shortage
  – Running out of space
  – Usually event space
  – Sometimes system table space
• Partition manager failures
  – Get your DBA!
Where
• stdout, server.std.log, server.log
• Email
• Console pop-up
• Internal event
System alerts via email
Memory utilization
Memory usage
• Manager allocates memory in Java heap
• Java heap is garbage collected
• Server only allocates memory
• Java VM reclaims unused memory automatically
• Manager doesn’t know how much garbage is in the heap
• Reported memory usage includes garbage
server.std.log
2006/02/22 23:22:51 | Memory Status: 765.6 MB Used, 1,014.0 MB Max
2006/02/22 23:22:52 | [Full GC
2006/02/22 23:22:58 | 797362K->471587K(1038336K), 5.9847261 secs]
Memory: Two Types of Garbage Collection (GC)
Java heap is divided into generations: Young and Tenured
Minor GC
• Only collects young generation
• May expand to entire heap, and become a major collection
Major GC or Full GC
• Collects both young generation and tenured generation
[GC 929899K->838966K(1036928K), 0.0353791 secs]
[Full GC 932135K->542955K(1036928K), 3.9721866 secs]
GC pause
Stop the world GC
• When GC is happening, everything else is stopped
Pause Time
• Minor GC pause (“[GC …]”)
  – Should be under 1 sec
• Major GC pause (“[Full GC ….]”)
  – Actual time depends on hardware
  – Estimate: ~1 sec every 200 MB heap
[Full GC 932135K->542955K(1036928K), 3.9721866 secs]
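To compare a logged pause against the rule of thumb above, a small parser helps. The sketch below assumes the bracketed single-line format shown on the slides; in a real server.std.log the wrapper adds a prefix and may split the message across lines, so the pattern would need adjusting. All function names are illustrative.

```python
import re

# Matches the bracketed lines shown on the slides, for example
# "[Full GC 932135K->542955K(1036928K), 3.9721866 secs]".
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC)\s+(?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<heap>\d+)K\),\s+(?P<secs>[\d.]+) secs\]"
)


def parse_gc(line):
    """Return a dict describing one GC line, or None if the line does not match."""
    m = GC_LINE.search(line)
    if not m:
        return None
    return {
        "full": m["kind"] == "Full GC",
        "before_kb": int(m["before"]),
        "after_kb": int(m["after"]),
        "heap_kb": int(m["heap"]),
        "pause_secs": float(m["secs"]),
    }


def expected_full_gc_secs(heap_kb, mb_per_sec=200):
    """Rule of thumb from the slide: about 1 second of pause per 200 MB of heap."""
    return (heap_kb / 1024) / mb_per_sec


if __name__ == "__main__":
    gc = parse_gc("[Full GC 932135K->542955K(1036928K), 3.9721866 secs]")
    print(gc["pause_secs"], "observed vs", expected_full_gc_secs(gc["heap_kb"]), "estimated")
```

For the sample line, the heap is about 1,013 MB, so the estimate is roughly 5 seconds and the observed pause of about 4 seconds is within expectations.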
Real memory usage – the working set
Real memory usage is captured in “Full GC” messages in server.std.log
Working set is defined as the memory that is in actual use and doesn’t have any garbage.
Working set of the Manager is the figure reported immediately after a “Full GC” (471587K in the sample below)
2006/02/22 23:22:51 | Memory Status: 765.6 MB Used, 1,014.0 MB Max
2006/02/22 23:22:52 | [Full GC
2006/02/22 23:22:58 | 797362K->471587K(1038336K), 5.9847261 secs]
How to choose heap size?
Recommendation for heap size is 2 x working set
Too small
• Frequent full GC
• Bad performance
• Manager could die on OutOfMemoryError
Too large
• Peak performance is good, but…
• Full GC takes long time to finish
• Manager could get killed by Wrapper for being hung for a long time
Adjust heap size through Management Console, or by running ‘managersetup’
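As a worked example of the 2 x rule, the sketch below applies it to the post-Full-GC figure from the sample log line used in this deck (797362K->471587K). The function name is illustrative, and the result is a starting point rather than a tuned value.

```python
def recommended_heap_mb(working_set_kb: int, factor: float = 2.0) -> float:
    """Heap size suggested by the rule of thumb: factor x working set, in MB."""
    return factor * working_set_kb / 1024


if __name__ == "__main__":
    # Heap in use right after the Full GC in the sample line.
    working_set_kb = 471587
    print(f"working set ~ {working_set_kb / 1024:.1f} MB")
    print(f"suggested heap ~ {recommended_heap_mb(working_set_kb):.0f} MB")
```

Here the working set is about 460 MB, so a heap of roughly 921 MB would follow from the recommendation.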
Out of memory
Server will restart on out of memory errors
• Check Logfu
• Check “CapsManager” in server.status.log for overall memory utilization by Data Monitors, channels, Active Lists, etc.
If you see a spike
• Multiple memory intensive tasks at the same time?
• Increase heap size
Memory leak
• Memory usage keeps growing
• Increasing heap size only delays the problem
• Memory leak is hard to track down
• Contact support
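One simple way to tell "keeps growing" apart from a one-off spike is to look at heap in use after successive Full GCs. The heuristic below is an assumption made for illustration, not an ArcSight or Logfu feature: it flags a series in which every sample is higher than the one before.

```python
def looks_like_leak(post_full_gc_kb, min_samples=4):
    """Heuristic: True if heap in use after Full GC rises at every sample.

    This is only a hint. A steadily growing working set can also come from
    legitimate growth in content, so confirm before treating it as a leak.
    """
    if len(post_full_gc_kb) < min_samples:
        return False
    return all(b > a for a, b in zip(post_full_gc_kb, post_full_gc_kb[1:]))
```

A series such as 400000, 450000, 500000, 560000 (in KB) would be flagged, while a series that drops back at any point would not.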
Log analysis
Exceptions
Details of application errors
• Java construct encapsulates some failures
  – Coding errors
  – Transient bugs
• A full stack trace is included
  – Shows where in the code the error occurred
• Not all exceptions are equal
  – Misclassified or not significant impact
  – Sometimes related to content
Where is the cache?
SERVER.STATUS.LOG
• Agent Statuses
AgentStatuses="[|||Name|ID|Reported|Agent Time|Received by Agent Count|Received by Agent EPS|Post-Filter Count|Post-Filter EPS|Post-Aggregation Count|Post-Aggregation EPS|Estimated Cache Size|Sent To Manager Count|Sent To Manager EPS|Failed Connection Attempts, archiver|3lSvf5BEBABCBfCSYubv3rw==|05/08 11:23:54|05/08 11:23:54|0|0.0|0|0.0|0|0.0|0|0|0.0|0, Syslog|3t5MEiRcBABDmQv3t56sCJQ==|05/08 11:23:31|05/08 11:23:31|4000|62.5|3643|56.921875|3643|56.921875|120,000|3650|57.03125|807761 Total|-|-|-|35,434|582.8|33,328|549.8|33,086|546.0|120,100|88,898|1,476.2|1,335,569]"
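A single connector row from this entry can be checked for the caching symptom with a few lines of Python. The sketch assumes a row has already been cut out of the AgentStatuses value and that it carries the 14 pipe-delimited fields named in the header; the function names are made up for this example.

```python
FIELDS = [
    "Name", "ID", "Reported", "Agent Time",
    "Received by Agent Count", "Received by Agent EPS",
    "Post-Filter Count", "Post-Filter EPS",
    "Post-Aggregation Count", "Post-Aggregation EPS",
    "Estimated Cache Size", "Sent To Manager Count",
    "Sent To Manager EPS", "Failed Connection Attempts",
]


def parse_agent_row(row):
    """Split one pipe-delimited connector row into a dict keyed by header name."""
    values = row.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def num(text):
    """Convert values such as '120,000' or '56.921875' to float."""
    return float(text.replace(",", ""))


def is_falling_behind(agent):
    """True if the connector reports a cache, or sends slower than it aggregates."""
    return (num(agent["Estimated Cache Size"]) > 0
            or num(agent["Sent To Manager EPS"]) < num(agent["Post-Aggregation EPS"]))
```

Applied to the rows above, the Syslog connector is flagged because its Estimated Cache Size is 120,000, while the archiver row, with no cache and equal rates, is not.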
Delayed events
SERVER.LOG
• [default.com.arcsight.util.TimedRingBuffer][increment] Throwing out increment X, increment time = X, acceptable range X - X (discarded=X)
Active channel
• Gaps between Manager Receipt Time and End Time and Agent Receipt Time
  – Device Receipt Time
  – Connector Receipt Time (a.k.a. Agent Receipt Time)
  – Manager Receipt Time
  – Start Time
  – End Time
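The gaps mentioned above are simple timestamp differences. The sketch below computes them for one event, assuming the three times have already been read from the channel as datetime values; the function and key names are illustrative, and the timestamps in the example are invented.

```python
from datetime import datetime


def receipt_gaps(end_time, agent_receipt_time, manager_receipt_time):
    """Return how far Manager Receipt Time lags End Time and Agent Receipt Time."""
    return {
        "manager_minus_end": manager_receipt_time - end_time,
        "manager_minus_agent": manager_receipt_time - agent_receipt_time,
    }


if __name__ == "__main__":
    # Hypothetical event: the connector saw it 5 seconds after it ended,
    # but the manager only received it 10 minutes later.
    gaps = receipt_gaps(
        end_time=datetime(2013, 5, 8, 11, 0, 0),
        agent_receipt_time=datetime(2013, 5, 8, 11, 0, 5),
        manager_receipt_time=datetime(2013, 5, 8, 11, 10, 5),
    )
    for name, delta in gaps.items():
        print(name, delta)
```

In this invented case nearly all of the delay sits between the connector and the manager, which would point at caching or manager-side back pressure rather than the device.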
Database connectivity
SERVER.STD.LOG
• Connectivity Issues
  – SERVER.STD.LOG: SUBSYSTEM STATUS CHANGED
• Persistence Rate – should take less than 100ms
  – INFO | jvm 2 | 2009/05/07 20:41:58 | (02-Pre-SecurityEventPersistor330) Persisted 100 events in 32 ms.
  – INFO | jvm 1 | 2009/05/08 11:20:53 | (02-Pre-SecurityEventPersistor1) Persisted 100 events in 3,698 ms
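Slow batches can be picked out of server.std.log with a short filter. This is a minimal sketch that assumes each "Persisted N events in M ms" message sits on one line, as in the two samples above; the function name and the choice to report batches at or above the threshold are assumptions.

```python
import re

PERSISTED = re.compile(r"Persisted (?P<count>[\d,]+) events\s+in (?P<ms>[\d,]+) ms")


def slow_batches(lines, threshold_ms=100):
    """Yield (events, milliseconds) for each batch that took threshold_ms or longer."""
    for line in lines:
        m = PERSISTED.search(line)
        if not m:
            continue
        ms = int(m["ms"].replace(",", ""))
        if ms >= threshold_ms:
            yield int(m["count"].replace(",", "")), ms
```

Run against the two sample lines, only the 3,698 ms batch is reported; a steady stream of such hits suggests the database or its disks cannot keep up.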
Manager busy
SERVER.STD.LOG
• Manager stops accepting events
  – INFO | jvm 1 | 2005/04/04 00:42:26 | WARNING: '1' agent requests REJECTED because the limit of '64' agent threads was exceeded.
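To see how often the manager is turning connectors away, the rejection warnings can be totalled. The sketch below matches the message text exactly as it appears in the sample line; the function name is illustrative.

```python
import re

REJECTED = re.compile(
    r"'(?P<rejected>\d+)' agent requests REJECTED because the limit of "
    r"'(?P<limit>\d+)' agent threads was exceeded"
)


def count_rejections(lines):
    """Return (total rejected requests, set of thread limits seen) across the lines."""
    total, limits = 0, set()
    for line in lines:
        m = REJECTED.search(line)
        if m:
            total += int(m["rejected"])
            limits.add(int(m["limit"]))
    return total, limits
```

For the sample line this returns a total of 1 and a limit of 64; a total that keeps climbing during an incident is a cue to collect Thread Dumps before restarting anything.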
Thread Dumps
Insertion issues require Thread Dumps
[Diagram: the event insertion flow (SeededJsse Listener, Start-of-flow, Pre-persistor, Post-persistor, and XCPUDMPC threads) feeding the database, as shown under “Events insertion versus events retrieval”]
Don’t restart or reboot before collecting!
A Java snapshot
Why Thread Dumps
• Stack trace for each thread in the VM
• Many different threads
• Bottleneck area usually identifiable
  – Session Waits or DB Sessions needed to correlate database activity
Generating Thread Dumps
• Manage.jsp | NGServer | generateThreadDumps Invoke
• <ARCSIGHT_HOME>/bin/arcsight managerthreaddump
Formatting Thread Dumps
• <ARCSIGHT_HOME>/bin/arcsight threaddumps > threaddumps.html
Servlet engine
SeededJsseListener
• Read bytes from network sockets
• Convert read bytes to Java Objects “Security Event Batch”
• Place event batches into queue for Flow 1
Flow 1: Start
Start-of-flow
• Vulnerability Scanner Reports
• Place event batches into queue for Flow 2
Flow 2: Pre-persistor
Pre-SecurityEventPersistor
• Remove event batch from queue
• Initialize and normalize event fields
• Write to database
• Put event batch into queue for Flow 3
Flow 3: Post-persistor
Post-SecurityEventPersistor
• Remove event batch from queue
• Evaluate events against rules
• Generate Correlation events
• Put event batch into queue for Dashboards
Content: Dashboards
XCPUDMPC-Thread
• Remove event batch from queue
• Evaluate events against Data Monitors
• Generate Correlation events
• Put event batch into queue for garbage collection
Logfu
Logfu is not an officially supported tool
Why Logfu?
Discerning patterns
• Examines server.log, server.std.log, and server.status.log
• Syntax
  – arcsight logfu -m -noplot
    • -m is for “manager”
    • -noplot skips plotting on graph
• Outputs logfu.html to logs/default/Logfu_<date>/
• Interesting data points
  – “Famous Last Words”: why did it die
  – “Exception Groups”: quickly identify repeating exceptions
  – “Memory”: identify growth in memory consumption
  – “Event Insertion”: is the database/disk able to keep up
Memory patterns
Event throughput patterns
Shutdown patterns
Event insertion patterns
Plot time per batch to identify network lag
Use with Connectors too
Advanced Management Interface a.k.a. manage.jsp
https://<HOST_NAME>:8443/arcsight/web/manage.jsp
Status on demand
Interesting Mbeans
• Agent State Tracker
  – Specific and overall EPS for connectors
• SessionManager
  – How many users are logged in
• SubsystemStatus Tracker
  – Whiner
• ActiveList Monitoring
  – Memory consumption
• Channels
  – How many
  – Validating the SQL
Groups and filters
Mbean: RulesEngine
Mbean: AgentStateTracker
Thank you
Security for the new reality