White Paper
Managing Billions of Logs Every Day
Managing Billions of Logs Every Day: Fast In, Smart Out
The most practical approach to managing high-volume log data is a solution platform outside of the
traditional database architecture. IT operations can comprise hundreds or thousands of log sources,
ranging from network devices to critical applications to operating systems. Each of these generates log
messages of bewildering variety, in non-standard formats and staggering volumes. This presents a
significant and complex technical challenge to organizations and their vendors of security information and
event management (SIEM) solutions. These systems must capture all of the data, analyze it for critical
alerts in real time, and then securely archive the unaltered logs to meet legal chain-of-custody
requirements, while also indexing them for subsequent search and reporting, in present formats and those
not yet defined.
These requirements must be met in a cost-effective solution that does not create a data storage explosion.
EventTracker reconciles all of these disparate objectives in a scalable, efficient software application
delivered as a virtual appliance or on physical servers. This white paper describes the unique advantages
of EventTracker’s architecture.
EventTracker describes its design criteria as “Fast In, Smart Out”. Unstructured data is efficiently
received, analyzed against configurable rules for alerting and correlation, and archived as flat files without
the need for a relational database or further pre-processing, manipulation or normalization. This approach
allows for very fast input of log data, including data in new or custom formats. EventTracker then creates
a sparse matrix metadata index associated with each log archive, which dramatically improves performance
during extraction and display of data for log searches and reporting: hence “Fast In, Smart Out”.
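The idea can be illustrated with a minimal sketch. It assumes a far simpler index than EventTracker's sparse matrix format, which this paper does not specify: raw lines are written unmodified to compressed flat files (gzip stands in for the archive format), and a per-archive set of tokens lets a search skip archives that cannot match. The names `write_archive` and `search` and the sample log lines are hypothetical.

```python
import gzip
import re
import tempfile
from pathlib import Path

TOKEN = re.compile(r"[A-Za-z0-9_.-]+")

def write_archive(path, lines):
    """Store raw lines unmodified and return the set of tokens seen in them."""
    tokens = set()
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
            tokens.update(t.lower() for t in TOKEN.findall(line))
    return tokens

def search(index, term):
    """Open only the archives whose token set contains the term."""
    term = term.lower()
    hits = []
    for path, tokens in index.items():
        if term not in tokens:
            continue  # the index rules this archive out; skip decompression
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            hits.extend(l.rstrip("\n") for l in fh if term in l.lower())
    return hits

# Hypothetical sample data: two small archives in a temporary directory.
workdir = Path(tempfile.mkdtemp())
batches = {
    "a.gz": ["sshd: Failed password for root from 10.0.0.5", "cron: job started"],
    "b.gz": ["fw: DENY tcp 10.0.0.9:4431 -> 10.0.0.1:22"],
}
index = {workdir / name: write_archive(workdir / name, lines)
         for name, lines in batches.items()}
results = search(index, "root")
```

Input stays cheap because nothing is parsed into a schema on the way in; the cost of building the token set is paid once per archive, and every later search benefits from it.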
The archive files are compressed on the file system, and for security a SHA-1 checksum is generated and
striped over each archive file. Compression provides exceptional storage efficiency, while the indices
provide fast reads. The architecture uses a write-once-read-many implementation, ensuring that
once data is committed to archive, it cannot be altered without detection.
EventTracker also includes the industry’s largest pre-defined log knowledge libraries, which provide
automated interpretation of log data in easy-to-understand language for alerting, search,
dashboards and reporting.
Challenges
SIEM implementations address distinct challenges in the enterprise network, including log volume, variable
log formats, secure data retention, and widely disparate use cases such as real-time alerting, access,
correlation, analysis, reporting, forensics, and long-term secure data management.
[Figure: Large Volumes of Unstructured Log Data. SIEM log management: no logging standards; millions / billions / trillions; thousands of vendor log formats.]
Log Volume
Event and audit log messages are generated by thousands of log sources, including computing platforms
and applications, networking, storage and security devices. The volume of log messages often varies with
time of day, peaking at shift changes or as critical applications launch across the enterprise. EventTracker
manages the inbound volume and understands each log message it receives in real time.
UNIX- or Linux-based systems, firewalls, routers and switches generally push syslog and SNMP (Simple
Network Management Protocol) messages over protocols such as UDP or TCP to export log
data to third-party receivers (the SIEM). Microsoft Windows Servers and Workstations, and many other
systems, write audit events to local disk. These events are collected either by pulling (i.e., polling) them
or by pushing them to receivers. Some Windows audit events are also written to text or other formats
outside of the EVT/EVTX log files. EventTracker supports polling with proper credentials or, preferably,
real-time transmission using the EventTracker Windows Agent.
EventTracker includes complete facilities to install, configure, upgrade and uninstall EventTracker agents
from the management console. An MSI package is also available for distribution via Microsoft SMS, KACE
or similar software distribution tools. [See “Log Collection” below for more information on EventTracker
Agents.]
Log Collection
The most verbose IT logs generated in a typical enterprise come from security, network, infrastructure and
application sources, which generate hundreds of millions, even billions, of event logs per hour or day.
This enormous body of continuously generated information is the basis for providing and assessing IT
security, and is instrumental in demonstrating regulatory compliance using a SIEM.
The lack of standardization in the industry yields a proliferation of formats and transmission methods
ranging from the traditional UNIX/Linux/Network syslogs and simple network management protocol
(SNMP) traps to open database connectivity (ODBC) and proprietary interfaces such as Check Point®
Software’s OPSEC API, VMware API, the aforementioned Windows EVT/EVTX formats, etc.
EventTracker can operate “agent-optional”. Because some log sources (e.g., Windows) do not natively send
their logs off-platform/host in a standard manner, an MS gold-logo certified agent is available. The
agent runs as a low-overhead, silent service on monitored Windows systems, providing a wealth of
advanced features.
Batch Capture
Often, audit log data is generated as text or XML files containing many individual event messages (e.g.,
Blue Coat proxy devices, web servers, Java application data, or Apache log4j). EventTracker supports
the bulk load of this data directly into the EventVault archive. In this process, EventTracker can invoke a
third-party plug-in to process the raw data before it is archived. For example, Apache logs may be
processed by AWStats, a popular statistics package that generates a variety of web activity reports in HTML.
These reports can be displayed in the various EventTracker Reports Consoles for a unified view without
having to first process them in EventTracker.
Data Decisions
Given today’s data volumes, a significant technical challenge is the comprehensive processing of every log
as it is received. Some SIEM applications attempt to manage this by first normalizing log data. Systems
that normalize each inbound message to pre-determined metadata values (e.g., user = “x”) by
performing lookups and data replacement retain only a subset of the message contents and discard the
original log information, which might later be required for audit reporting or forensics/search
operations. This may compromise legal or enforcement processes downstream.
EventTracker does not normalize log messages on input, but processes and retains them in their original
format. This approach sustains high data input rates and does not discard information to populate a pre-
defined RDBMS schema. Processing includes receipt, parsing, auto-identification, categorization, alerting,
indexing and archiving.
Secure Log Storage
Log data archives must be secure, as required by PCI-DSS, FISMA, HIPAA and other internal security and
third-party compliance standards. The EventTracker EventVault is a file-based log archive which stripes a
SHA-1 checksum on each .cab file, rendering it tamper-evident. This checksum is re-generated and
compared each time an archive is accessed, and file integrity checks may also be scheduled at regular
intervals. EventTracker server hardening guidelines provide detailed recommendations on access control
restrictions and encryption configured on the OS to lock down access and protect your log data over time.
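The checksum mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not EventTracker's implementation: gzip stands in for the .cab format, the digest is returned alongside the data rather than striped into the file, and the function names `seal` and `verify` and the sample record are hypothetical.

```python
import gzip
import hashlib

def seal(raw_logs: bytes):
    """Compress an archive and compute the SHA-1 digest of the compressed bytes."""
    blob = gzip.compress(raw_logs)
    return blob, hashlib.sha1(blob).hexdigest()

def verify(blob: bytes, expected_digest: str) -> bool:
    """Recompute the digest on access; any change to the blob alters it."""
    return hashlib.sha1(blob).hexdigest() == expected_digest

# Hypothetical sample record.
blob, digest = seal(b"login ok user=alice\n")

# Simulate tampering by flipping a single bit in the last byte.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])

intact_ok = verify(blob, digest)        # True: archive is unchanged
tampered_ok = verify(tampered, digest)  # False: the modification is detected
```

Recomputing the digest on every access, as the text describes, means a modified archive is flagged the next time anyone reads it rather than only during a scheduled check.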
Compatibility
EventTracker’s open, flexible log management framework can receive and process any log from any
source. EventTracker is an MS gold-logo certified software application available as a virtual appliance,
instantiated on physical Windows Servers, or in the “cloud”. Log archives are standard MS .cab files on the
file system. There is no requirement for costly SQL or Oracle RDBMS or database administrators.
EventTracker Architecture
The EventTracker baseline architecture, as shown in the diagram below, provides two primary methods of
data collection: real-time or direct log file transfer (batch). Transfers via syslog (TCP or UDP), SNMP v1 or
v2, or via EventTracker Agents (available for Windows and Solaris BSM) are fully supported. Note that the
optional EventTracker Windows agent is also able to gather log data from Check Point devices using the
OPSEC LEA interface and from VMware via its XML API. MS SQL trc, flat files, CSV, W3C, text and XML
formats are also supported in real time, along with any selected events from the Windows event log. Log
file transfers are also supported via FTP, SFTP or SCP, and Apache log4j.
Scalability
EventTracker is implemented as a set of distributed software modules that communicate via IP
when operating with the multi-server Collection Master / Collection Point architecture (see diagram
below). Up to 20 Virtual Collection Points, which include the EventTracker Receiver, Processor and Archiver
logic, may be instantiated to monitor specific ports (e.g., 514, 14505) and optionally write to unique
disks or spindles for multi-threaded processing. This approach provides superb scalability on generic
server-class hardware.
Note: Commodity servers running EventTracker’s Collection Master / Collection Point can fully process billions of inbound logs
with peak loads configurable to 100,000+ events per second. Multiple server implementations based on this architecture can
accommodate even the highest volume operations.
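The relationship between an event rate and a daily volume is simple arithmetic, worked through below as a sketch. The 100,000 events-per-second figure comes from the note above; treating it as a sustained rate is an assumption made only to show the upper bound, since the note describes it as a peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def events_per_day(events_per_second: float) -> float:
    """Daily volume if the given rate were sustained for 24 hours."""
    return events_per_second * SECONDS_PER_DAY

def required_eps(events_per_day_target: float) -> float:
    """Average rate needed to reach a daily volume."""
    return events_per_day_target / SECONDS_PER_DAY

daily_at_peak = events_per_day(100_000)  # 8,640,000,000 events per day
eps_for_one_billion = required_eps(1e9)  # roughly 11,574 events per second
```

In other words, one billion logs per day corresponds to an average of under 12,000 events per second, so the stated peak capacity leaves considerable headroom for bursts.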
Disk Utilization
Log data is voluminous and much of it is of minimal long-term value. It is sometimes hard to predict
accurately which data will be useful. Consequently, careful organizations opt to retain all of it. This can
very easily turn into a storage and disk utilization nightmare. Efficient storage of log data over variable
retention periods is a critical evaluation factor. Bringing 1 KB of raw data into a traditional relational
database results in 12 KB to 15 KB of new storage, caused by construction of tables and other overhead.
In contrast, EventTracker provides significant storage efficiencies due to a very low data storage factor of
0.12 to 0.28, which includes metadata (indices) as well as the compressed flat file log data. No daily
maintenance is required. Data received in real time is processed for alerts against rules
and also by the correlation engine and behavior modules. Data arriving via file transfer is processed per
rules (which can include processing via a third-party plug-in). All data is compressed, archived and signed
with a SHA-1 checksum within the EventVault. An indexing process develops metadata for the newly
created archive, which is stored as an XML reference and associated with the target archived file.
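The storage factors quoted above can be turned into a rough capacity estimate. The sketch below uses the paper's factors (0.12 to 0.28 for compressed flat files with indices, 12 to 15 for a relational database); the daily raw volume and the retention period are assumptions chosen purely for illustration.

```python
def stored_bytes(raw_bytes, factor_low, factor_high):
    """Return the (low, high) storage footprint for a given raw volume."""
    return raw_bytes * factor_low, raw_bytes * factor_high

def tb(n_bytes):
    """Convert bytes to decimal terabytes."""
    return n_bytes / 10**12

RAW_PER_DAY = 100 * 10**9  # assumption: 100 GB of raw logs per day
RETENTION_DAYS = 365       # assumption: one year of retention

raw_total = RAW_PER_DAY * RETENTION_DAYS  # 36.5 TB of raw data

flat_low, flat_high = stored_bytes(raw_total, 0.12, 0.28)  # about 4.38 to 10.22 TB
rdbms_low, rdbms_high = stored_bytes(raw_total, 12, 15)    # 438 to 547.5 TB
```

Under these assumptions the difference is roughly two orders of magnitude, which is why storage overhead deserves a line of its own in any sizing exercise.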
Advanced agent features include USB monitoring, selectable or scheduled/compressed event transfer, traffic
caching and encryption, system monitoring, and management of application log files outside the Windows
EVT/EVTX event logs. There are strong advantages to installing EventTracker agents, including source/type
filtering, caching, access to the underlying platform for security, real-time caching and protocol stack, and
access to log files in other locations. Note that EventTracker includes the ability to centrally deploy,
configure and remove remote agents. Agents are also available as an MSI package for distribution via other
methods. However, there are cases when the use of agents is not possible or desirable. In such cases, the
platform can be polled periodically for new log entries over the network. Agents are a must in cases where
the platform is closed, binary and/or does not send the log data off-platform (e.g., IBM i-Series or Solaris
under BSM).
A receiver process, which is part of EventTracker, listens on configurable ports for specific protocols (e.g.,
syslog over UDP or TCP, SNMP, or real-time streams from the EventTracker Windows Agent or Solaris BSM
agent). Data is written to cache files on disk as soon as it is received. Once 50 MB of data is received or sixty
minutes have elapsed, the cache is compressed and indexed in preparation for archival.
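The size-or-age rollover rule described above can be sketched as follows. This is a simplified model rather than EventTracker's receiver: the cache is an in-memory buffer instead of a file on disk, gzip stands in for the archive format, indexing is omitted, and the size threshold is approximated as 50 * 1024 * 1024 bytes. The class name `RollingCache` and the injectable `clock` parameter are hypothetical.

```python
import gzip
import time

class RollingCache:
    """Buffers raw log bytes and seals the buffer on a size or age threshold."""

    def __init__(self, max_bytes=50 * 1024 * 1024, max_age_s=60 * 60, clock=None):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock or time.monotonic
        self.sealed = []  # compressed archives, oldest first
        self._start_new()

    def _start_new(self):
        self.buffer = bytearray()
        self.opened_at = self.clock()

    def append(self, message: bytes):
        """Store the message immediately, then check whether to roll over."""
        self.buffer += message + b"\n"
        self.maybe_roll()

    def maybe_roll(self):
        too_big = len(self.buffer) >= self.max_bytes
        too_old = self.clock() - self.opened_at >= self.max_age_s
        if self.buffer and (too_big or too_old):
            self.sealed.append(gzip.compress(bytes(self.buffer)))
            self._start_new()

# Demonstration with tiny thresholds and a fake clock so both triggers fire.
now = [0.0]
cache = RollingCache(max_bytes=32, max_age_s=60, clock=lambda: now[0])
cache.append(b"short")    # buffered, below both thresholds
cache.append(b"x" * 40)   # crosses the size threshold, seals the first archive
cache.append(b"late")     # starts a new buffer
now[0] = 61.0
cache.maybe_roll()        # age threshold reached, seals the second archive
```

The age check matters for quiet sources: without it, a low-volume cache could sit unarchived, and therefore unindexed, for an unbounded time.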
Processing
Log data received in real time is processed through up to three rules-based functions. First, the log
message is matched against alerting and behavior rules. A positive match triggers the configured
notifications and/or remedial actions. Notification methods include e-mail, SMS text, CTI, pager, or
forwarding as an SNMP trap or syslog message. Remedial actions can be triggered as a local script on the
master console or at the optional agent host. Second, the log message is processed by the correlation
engine, which maintains a cache of events. A positive match results in the generation of an EventTracker
composite log message, which can combine elements from the source logs in the correlation rule. The new
log message is used to trigger alerts as described earlier. Finally, the contents of the log message are
reviewed by the
Behavior Analysis module with its rule set. Any new or out-of-the-ordinary condition results in the generation
of an appropriate log message, which may be configured to trigger the aforementioned alert conditions.
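The three processing stages can be sketched in miniature. Everything in this example is an assumption made for illustration: the single alert rule, the threshold-in-a-time-window correlation, the "first time seen" behavior check, and all names (`ALERT_RULES`, `Correlator`, `BehaviorTracker`, `process`). EventTracker's actual rule language and engines are not described in this paper.

```python
import re
from collections import defaultdict, deque

# Stage 1: a hypothetical alert rule expressed as a regular expression.
ALERT_RULES = {"failed_login": re.compile(r"Failed password for (?P<user>\S+)")}

class Correlator:
    """Stage 2: emits a composite message when `threshold` matches for one
    user fall within `window_s` seconds, using a cache of recent events."""

    def __init__(self, threshold=3, window_s=60):
        self.threshold, self.window_s = threshold, window_s
        self.cache = defaultdict(deque)  # user -> timestamps of recent matches

    def feed(self, ts, user):
        q = self.cache[user]
        q.append(ts)
        while q and ts - q[0] > self.window_s:
            q.popleft()  # drop events that fell out of the window
        if len(q) >= self.threshold:
            q.clear()
            return (f"COMPOSITE: {self.threshold} failed logins for {user} "
                    f"within {self.window_s}s")
        return None

class BehaviorTracker:
    """Stage 3: flags values never seen before (a crude 'new condition' check)."""

    def __init__(self):
        self.seen = set()

    def is_new(self, value):
        new = value not in self.seen
        self.seen.add(value)
        return new

def process(events):
    correlator, behavior, out = Correlator(), BehaviorTracker(), []
    for ts, line in events:
        m = ALERT_RULES["failed_login"].search(line)
        if m:
            out.append(("alert", line))
            composite = correlator.feed(ts, m["user"])
            if composite:
                out.append(("alert", composite))  # composite re-enters alerting
        source = line.split(":", 1)[0]
        if behavior.is_new(source):
            out.append(("behavior", f"new source seen: {source}"))
    return out

# Hypothetical sample events as (timestamp in seconds, raw line).
events = [
    (0, "sshd: Failed password for root"),
    (10, "sshd: Failed password for root"),
    (20, "sshd: Failed password for root"),
    (25, "cron: job started"),
]
output = process(events)
```

Here the third failed login produces both its own alert and a composite message, mirroring the text's point that a composite log message is itself fed back into alerting.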
Data Management
EventTracker’s scalable architecture captures the original raw log message in its native form. This is critical
when faced with chain-of-custody questions for Human Resources, civil or criminal issues. While reports
and dashboards may be configured to show only relevant pieces of the log messages, the complete original
log/event is always accessible. Logs are stored as compressed flat files (standard .cab) on the Windows file
system. There is no lock-in to proprietary formats. Lastly, the use of standard platforms and formats
maximizes interoperability within the enterprise architecture. Long-term storage can be on-line, near-line
or off-line, using standard Windows file management and back-up capabilities.
Reporting
Hundreds of pre-defined reports are available within EventTracker, grouped under Security, Compliance
(includes SANS Consensus Audit Guidelines - IT controls) and Operations bundles. Reports may be
generated in PDF, Word, HTML or XLS/XLSX formats. Users interact with a simple point-and-click interface
to specify reporting parameters. These are used to consult the index data and determine the relevant
archive files. This approach can yield up to a 10x improvement in performance compared to the brute
force approach that would be needed if indexing were absent. The technique is particularly effective
when reporting on exceptions (needle in the haystack). EventTracker also includes the powerful concept
of FLEX reports, where the user has complete control over report design. A simple point-and-click interface
allows the user to define log message filters, parsing rules and the output format. Results are usually
generated in Excel format, which allows for further post-processing. This technique is especially
powerful in generating reports for hitherto unknown log formats. Reports can be very quickly created and
scheduled for regular generation and delivery.
Summary
The demands that a dynamic enterprise places on a comprehensive SIEM system can be too great for a
system based on the traditional RDBMS. With its highly scalable distributed file architecture, EventTracker
can meet these demands at a lower total cost of ownership than other systems which lack EventTracker’s
“Capacity on Demand” architecture.
When examining SIEM products, administrators should consider the following:
• Is RDBMS licensing required?
• Will the SIEM “fill up”, and will the input capacity and storage scale without additional
appliance/hardware purchases?
• Is the application logic separated from the storage back-end?
• What precise disk storage will be required to support your volumes and retention requirements
over the next 2-3 years? What is included in the vendor’s “Data Explosion” numbers, which are
oftentimes provided as EPS (events per second)? Do the projections include all storage
overhead, or just the data in the RDBMS?
• Does the system continue to collect data during backup, defragmentation and other management
scenarios?
• Are agents optional? How many EPS can they handle? What happens to dropped messages?
• Is TCP supported for syslog messages to ensure delivery/receipt?
• Does the system utilize proprietary protocols or formats?
• Does the system capture and retain log data from unrecognized network devices?
• What data is stored: all fields of the data captured, or just a standard subset?
• Is auto-discovery inherent to minimize pre-configuration effort?
• How many unique log formats are fully supported out-of-the-box?
• Does the architecture affordably support distributed WANs and VLANs? Does each site need a
physical collector appliance?
About EventTracker
EventTracker’s advanced security solutions protect enterprises and small businesses from data breaches
and insider fraud, and streamline regulatory compliance. The company’s EventTracker platform comprises
SIEM, vulnerability scanning, intrusion detection, behavior analytics, a honeynet deception network and
other defense-in-depth capabilities within a single management platform. The company complements its
state-of-the-art technology with 24/7 managed services from its global security operations center (SOC)
to ensure its customers achieve desired outcomes: safer networks, better endpoint security, earlier
detection of intrusion, and relevant and specific threat intelligence. The company serves the retail,
hospitality, healthcare, legal, banking and financial services, utilities and government sectors.
EventTracker is a division of Netsurion, a leader in remotely-managed IT security services that protect
multi-location businesses’ information, payment systems and on-premises public and private Wi-Fi
networks. www.eventtracker.com.