SNYPR 6.2 CU4 Architecture Guide · 2019-11-13

SNYPR 6.2 CU4

Architecture Guide

Date Published: 11/13/2019

Securonix Proprietary Statement

This material constitutes proprietary and trade secret information of Securonix, and shall not be disclosed to any third party, nor used by the recipient except under the terms and conditions prescribed by Securonix.

The trademarks, service marks, and logos of Securonix and others used herein are the property of Securonix or their respective owners.

Securonix Copyright Statement

This material is also protected by Federal Copyright Law and is not to be copied or reproduced in any form, using any medium, without the prior written authorization of Securonix.

However, Securonix allows the printing of the Adobe Acrobat PDF files for the purposes of client training and reference.

Information in this document is subject to change without notice. The software described in this document is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of those agreements. Nothing herein should be construed as constituting an additional warranty. Securonix shall not be liable for technical or editorial errors or omissions contained herein. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or any means electronic or mechanical, including photocopying and recording for any purpose other than the purchaser's internal use without the written permission of Securonix.

Copyright © 2019 Securonix. All rights reserved.

Contact Information

Securonix

14665 Midway Rd. Ste. 100

Addison, TX 75001

(855) 732-6649


Table of Contents

Introduction
Nodes that Integrate with Hadoop
Deployment Alternatives
    Dedicated SNYPR Deployment
    SNYPR Deployment with Existing Hadoop Infrastructure
Hadoop Components
Search Deployment Options
    Search Embedded
    Search Dedicated
    Search Index Storage Estimates
Data Ingestion
    Phase 1: Collect and Publish
    Phase 2: Enrichment
    Phase 3: Processing
    Indexing Incoming Events
    Deployment Assumptions
    High Availability
    Reference Server Specifications
SNYPR Cloud Deployment
Considerations
    Amazon EC2
    Microsoft Azure
    Network
    Virtual Infrastructure
Recommendations
    Hadoop Cluster Tuning Recommendations
    Network Tuning Recommendations
Google Cloud
    Deployment Architecture
Spark Jobs Configuration for Kerberized Kafka
    Disaster Recovery Alternatives


Introduction

SNYPR is a big data security analytics platform built on Hadoop that utilizes Securonix machine learning-based anomaly detection techniques and threat models to detect sophisticated cyber and insider attacks. SNYPR uses Hadoop both as its distributed security analytics engine and long-term data retention engine. Hadoop nodes can be added as needed, allowing the solution to scale horizontally to support hundreds of thousands of events per second (EPS).

Features

• Supports a rich variety of security data including security event logs, user identity data, access privileges, threat intelligence, asset metadata, and netflow data.
• Normalizes, indexes, and correlates security event logs, network flows, and application transactions.
• Utilizes machine learning-based anomaly detection techniques, including behavior profiling, peer group analytics, pattern analysis, and event rarity to detect advanced threats.
• Provides out-of-the-box threat and risk models for detection and prioritization of insider threat, cyber threat, and fraud.
• Risk-ranks entities involved in threats to enable an entity-centric (user or devices) approach to mitigating threats.
• Provides Spotter, a search feature with normalized search syntax that enables investigators to investigate today’s threats and track advanced persistent threats over long periods of time, with all data available at all times.
• Provides the Investigation Workbench to detect links across disparate data sets to enable quick investigations and hunting for cyber threats.

Audience

The guide is intended for system administrators, system integrators, and deployment teams who need to determine the deployment options in a Hadoop cluster.

Additional Resources

If you require additional information, the following SNYPR documents are available:

• Installation Guide: For system administrators, system integrators, and deployment teams who need to install the application.
• Administration Guide: For system administrators who are responsible for ongoing operations and management, business managers, and other users in a supervisory role who need information about how to use SNYPR to grant employees and partners access to applications, check for policy violations, and manage cases.
• Data Integration Guide: For data integrators who need to import activity and enrichment datasources to support existing and custom use cases.
• Content Development Guide: For content developers who need to use existing content and develop custom use cases to detect the threats to your organization.
• User Guide: For information security professionals and security analysts who need to detect and manage threats, and for risk and compliance officers and IT specialists who need to use the reporting capabilities of SNYPR to monitor and remediate compliance.


Nodes that Integrate with Hadoop

The SNYPR architecture includes the following nodes that integrate with the Hadoop services:

SNYPR Application Server

Console User Interface, configuration db, Redis

These are edge nodes in a Hadoop cluster that are used for the SNYPR user interface and the configuration repository for all components used by the solution. Each Console Node performs the following tasks:

• Provide visualizations for monitoring events, threat management dashboards, investigations, and incident response
• Build custom dashboards with visualizations for viewing violation and event data
• Configure all ingestion jobs: user identities, access privileges, threat intelligence, security events, and others
• Administration interface for application support, personnel, and administrators
• Configure all policies and analytics, including behavior-based anomaly detection, peer-based analytics, threat modeling, and risk analytics



SNYPR EYE

SNYPR EYE Interface, configuration db

SNYPR Eye Server is a SNYPR monitoring and alerting server that is used for the configuration and operational health monitoring of all SNYPR services, including all the servers in the Hadoop cluster, the processes on the SNYPR Console, the SNYPR Spark Streaming applications running in the YARN cluster, the performance of the data ingestion of all resources, and the performance and health of the SNYPR Search processes. The SNYPR Eye solution installs and manages SNYPR-EYE agents on the servers in the environment for local monitoring.

SNYPR Remote Ingestion Node

SNYPR Remote Ingestion Node

SNYPR Remote Ingestion Nodes: These nodes are edge nodes in a Hadoop cluster that are used to ingest security event log data into the environment with the Securonix connectors.

Each SNYPR Ingestion node performs the following tasks:

• Import events from log sources
• Publish events to the Kafka message bus with batching, compression, and encryption
• Accept incoming log files on syslog
• Cache in-transit messages

Hadoop Master

Hadoop cluster management services

Hadoop Master Nodes: These are the master servers in the Hadoop cluster.

Hadoop Compute / Storage Nodes: These are the main nodes in a Hadoop cluster that are used to store compressed data and process all the jobs associated with SNYPR.

Each SNYPR Compute/Storage node performs the following tasks:

• Fetch data from the ingestion nodes.
• Perform all the jobs associated with SNYPR based on the configuration stored in the Master node, including parsing, indexing, analytics, and storage.
• Store data with 90% compression in structured JSON format.
• Pass processed data to SNYPR Search indexes that are used by the SNYPR console for review by the end user.

Hadoop Kafka Broker

Kafka Broker, dedicated ZooKeeper

Kafka broker servers for in-transit messages, and configuration ZooKeeper servers dedicated to Kafka. These servers use local storage for in-transit messages.


Deployment Alternatives

The SNYPR solution utilizes services in a Hadoop cluster. SNYPR provides the following deployment options:

• SNYPR UEBA: SNYPR User and Entity Behavior Analytics (UEBA). This solution provides security analytics on security events. Events are stored only during the processing of the analytics.
• SNYPR Security Analytics Data: This solution provides security analytics on security events. Events are stored for historical purposes, and a high-performance threat hunting solution is provided for searching and visualization of events.

Dedicated SNYPR Deployment

The Securonix SNYPR solution, shown in the diagram above, illustrates the services that are used within SNYPR. In this deployment diagram, SNYPR is deployed with a dedicated Security Analytics Data Lake. In this configuration, the Master nodes include the SNYPR Console and the Cloudera Manager service as well as other services like the HDFS Namenode, the YARN resource manager, Zookeeper, and other services that are used by the Hadoop cluster.

The SNYPR architecture scales to meet the deployment requirements, based on the size of the deployment (events per second (EPS), analytics processed, retention period) and the features being supported (UEBA, Security Analytics Platform, Data Lake).

For a small UEBA deployment, a limited number of servers are deployed and a dedicated SNYPR Search server is used for index storage. The deployment includes between 3 and 6 Hadoop servers along with a dedicated SNYPR Search server. The SNYPR application and the Redis service are collocated with the Hadoop master services.

For a medium UEBA deployment, full high availability of all services is configured for the servers that are deployed, and two dedicated SNYPR Search servers are used for index storage. The deployment includes between 6 and 10 Hadoop servers along with two dedicated SNYPR Search servers and two dedicated SNYPR Application Servers.

SNYPR Deployment with Dedicated Security Analytics Data Lake – Medium – UEBA

For a large Security Analytics Data Lake deployment, full high availability of all services is configured for all servers that are deployed, and at least two dedicated SNYPR Search servers are used for index storage. The deployment includes between 6 and 10 Hadoop servers along with three dedicated Kafka brokers and two dedicated SNYPR Search servers.



SNYPR Deployment with Dedicated Security Analytics Data Lake – Large – Security Analytics Data Lake

SNYPR Deployment with Existing Hadoop Infrastructure

The SNYPR solution shown in the following diagram (Figure 5) illustrates the services for SNYPR that are added to an existing Hadoop cluster. The SNYPR Application, SNYPR Search, and SNYPR-EYE nodes are shown on the top and the existing Hadoop cluster is shown in the box on the bottom. For the supported Hadoop distributions, please see the SNYPR Installation Guide.

Logical SNYPR Architecture – Existing Hadoop Cluster


Hadoop Components

SNYPR uses a Hadoop cluster for processing all data. The core Hadoop components include the following services:

• HDFS (Hadoop Distributed File System): Used to store security events and violations. Data is stored in compressed Parquet format.
• YARN (Yet Another Resource Negotiator): Provides resource management capabilities for jobs.
• Spark Streaming: Processing framework for live streaming data.
• HBase: Distributed NoSQL data store on HDFS to store the results of the analytics.
• Kafka: Horizontally scalable message bus used to manage the delivery of incoming security events.
• Impala (CDH) or Hive (HDP): Provides a SQL interface to the data stored in HDFS.
• ZooKeeper: Cluster management software to maintain configurations and synchronization services across nodes within a cluster.

Logical SNYPR Architecture – Dedicated Security Analytics Data Lake


Search Deployment Options

SNYPR Search is a high-performance indexing and search solution that stores all activity events in the environment that are accessed by the user interface.

SNYPR Search is deployed on an edge node in the Hadoop cluster. It requires access to the SNYPR Console on the application server and the Kafka brokers. These servers perform event indexing as well as storage of all violation data and related information used by the SNYPR user interface.

| | Embedded | Dedicated |
|---|---|---|
| Description | Limited search server for small UEBA deployments. Limited to one search cell. | Dedicated search server for small UEBA or Security Analytics Data Lake deployments. |
| Indexing rate per Search Cell (multiple cells are configured for increased performance) | 3k average EPS / 5k peak EPS | Multiple Search Cells are supported; each cell supports 10k average EPS / 15k peak EPS. Redundancy of search indexes with replication can be configured for high availability and faster search performance. |
| Retention | 7 days | 30 days or more |

Search Embedded

An embedded deployment of SNYPR Search is collocated with the SNYPR Application and shares the resources on that server. The resources required for an embedded deployment of SNYPR Search are:

• 10 CPU
• 16 GB RAM
• 1 TB usable storage

An embedded SNYPR Search server is for small UEBA deployments and is limited to 3k average EPS, 5k peak EPS, and 7 days of retention. For deployment scenarios with greater requirements, SNYPR Search Dedicated servers will be used.

SNYPR Search Embedded Mode

Search Dedicated

The SNYPR Search Dedicated deployment options are listed in the diagram below. A SNYPR Search Standard deployment uses a single dedicated server for indexing and searching. A SNYPR Search High Performance Cell includes separate servers for indexing and searching. In a high-performance cell, the indexes are replicated across servers for redundancy and for isolating indexing workload from search workload.

SNYPR Search Dedicated

Search Index Storage Estimates

| EPS | Avg Message Size (bytes) | Events / day | GB/day | Embedded: 7 Days, 1 replica, Storage (GB) | Premium: 30 Days, 1 replica, Storage (GB) | Premium: 30 Days with Replica, 2 replicas, Storage (GB) |
|---|---|---|---|---|---|---|
| 1,000 | 600 | 86,400,000 | 48 | 169 | 724 | 1,448 |
| 2,500 | 600 | 216,000,000 | 121 | 422 | 1,810 | 3,621 |
| 5,000 | 600 | 432,000,000 | 241 | 845 | 3,621 | 7,242 |
| 7,500 | 600 | 648,000,000 | 362 | N/A | 5,431 | 10,863 |
| 10,000 | 600 | 864,000,000 | 483 | N/A | 7,242 | 14,484 |
| 15,000 | 600 | 1,296,000,000 | 724 | N/A | 10,863 | 21,726 |
| 20,000 | 600 | 1,728,000,000 | 966 | N/A | 14,484 | 28,968 |

| EPS | Avg Message Size (bytes) | Events / day | GB/day | Premium: 60 Days, 1 replica, Storage (GB) | Premium: 60 Days, 2 replicas, Storage (GB) | Premium: 90 Days, 1 replica, Storage (GB) | Premium: 90 Days, 2 replicas, Storage (GB) |
|---|---|---|---|---|---|---|---|
| 1,000 | 600 | 86,400,000 | 48 | 1,448 | 2,897 | 2,173 | 4,345 |
| 2,500 | 600 | 216,000,000 | 121 | 3,621 | 7,242 | 5,431 | 10,863 |
| 5,000 | 600 | 432,000,000 | 241 | 7,242 | 14,484 | 10,863 | 21,726 |
| 7,500 | 600 | 648,000,000 | 362 | 10,863 | 21,726 | 16,294 | 32,589 |
| 10,000 | 600 | 864,000,000 | 483 | 14,484 | 28,968 | 21,726 | 43,452 |
| 15,000 | 600 | 1,296,000,000 | 724 | 21,726 | 43,452 | 32,589 | 65,178 |
| 20,000 | 600 | 1,728,000,000 | 966 | 28,968 | 57,936 | 43,452 | 86,904 |
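The storage figures in these tables can be reproduced with simple arithmetic. The sketch below is not a Securonix sizing tool; it assumes the message size is in bytes, that GB means 1024^3 bytes, and that the index occupies about half of the raw event volume per replica. The guide does not state that 0.5 ratio: it is inferred from the table values, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1024 ** 3  # assumption: the GB/day column matches binary gigabytes


def raw_gb_per_day(eps: float, avg_message_bytes: int = 600) -> float:
    """Raw event volume per day, in GB, for a sustained ingestion rate."""
    return eps * SECONDS_PER_DAY * avg_message_bytes / BYTES_PER_GB


def index_storage_gb(
    eps: float,
    retention_days: int,
    replicas: int = 1,
    avg_message_bytes: int = 600,
    index_ratio: float = 0.5,  # inferred from the tables, not a documented constant
) -> float:
    """Estimated search index storage for the retention window."""
    daily = raw_gb_per_day(eps, avg_message_bytes)
    return daily * retention_days * replicas * index_ratio


if __name__ == "__main__":
    for eps in (1_000, 5_000, 20_000):
        print(
            eps,
            round(raw_gb_per_day(eps)),
            round(index_storage_gb(eps, 30)),
            round(index_storage_gb(eps, 30, replicas=2)),
        )
```

Under these assumptions the sketch matches the published rows, which makes it a reasonable starting point for rates or retention periods that fall between the listed values. Actual sizing should still come from Securonix.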


Data Ingestion

SNYPR includes a data ingestion pipeline that includes normalization, context enrichment, and correlation.

All event data in SNYPR is stored in a super enriched format. The Open Event Format (OEF) is a self-describing format capable of supporting information from heterogeneous data sources, while also adding enrichment data sets like user identity data, threat intelligence feeds, asset information, and others. This format enables events to be contextually enriched at ingestion time. This ensures that historical changes to the enriched data are captured with the event at the time it occurred. The original source event is always maintained in the OEF event. (See https://openeventformat.org for details.) The three phases of the SNYPR event ingestion pipeline are shown below.

Phase 1: Collect and Publish

In this phase, events are collected and a SNYPR publisher on the Remote Ingestion Node (RIN) forwards the messages to the Kafka raw topic. There are multiple types of SNYPR publishers, including the Ingestion node that uses the SNYPR Connector Library (Figure 3) and the syslog publisher that forwards messages directly to the Kafka raw topic (Figure 5). The SNYPR publishers forward all events to the raw topic in the SNYPR transport format. This transport format adds metadata to the source events to describe the event source and tag the events for processing in the enrichment job. The SNYPR publishers also support batching, compression, and encryption of the events that are published. This minimizes the bandwidth for transmission to the centralized Kafka brokers.
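As a rough illustration of the publish step, the sketch below tags each raw log line with source metadata, groups events into batches, and compresses each batch before it would be handed to a Kafka producer. It is not the SNYPR transport format: the envelope field names, the newline-delimited JSON layout, and the choice of gzip are assumptions, and encryption and the Kafka client call are left out.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def wrap_event(raw_line: str, job_id: int, resource_id: int) -> dict:
    """Attach source metadata to a raw log line (hypothetical field names)."""
    return {"job_id": job_id, "resource_id": resource_id, "raw": raw_line}


def batches(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group events into lists of at most `size` items."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def encode_batch(batch: List[dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON and gzip it."""
    payload = "\n".join(json.dumps(event, sort_keys=True) for event in batch)
    return gzip.compress(payload.encode("utf-8"))


def decode_batch(blob: bytes) -> List[dict]:
    """Inverse of encode_batch, as a consumer would apply it."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Batching before compression matters because log lines from one source repeat many of the same tokens, so a compressor sees far more redundancy in a batch than in a single event.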



Single Pipeline

Phase 2: Enrichment

The SNYPR Enrichment Spark Streaming job is responsible for event filtering, normalization, and context enrichment of the raw logs. During context enrichment, context is added to the incoming log data. This context enrichment includes enrichment from user HR sources, geolocation information, threat intelligence data, and other lookup data like internal network maps and asset data. Additionally, the raw event log message is stored in the original format as one of the columns in the normalized schema.

Multiple Enrichment Pipelines


Phase 3: Processing

The third phase of the event ingestion pipeline is a parallel phase where multiple Spark Streaming jobs subscribe to the enriched topic to perform indexing, store enriched events in HDFS, and analyze the events for threats.

The ingested data is stored for long-term storage in HDFS as Parquet files and made accessible as Hive database tables that are partitioned by resource, year, and day.

The solution also indexes the data and stores it in SNYPR Search Solr collections. The solution creates additional index collections as the data size passes a configurable threshold, and maintains a control index for execution of parallel queries across the entire set of collections. The index files are maintained on the dedicated SNYPR Search servers on local storage. This configuration provides parallel query execution across all the collections for deterministic response time for interactive use by the SNYPR user interface.

The log compliance data is stored in a read-only format that cannot be modified. SNYPR supports strong authentication, authorization, and encryption of the Hadoop infrastructure. SNYPR also provides application layer encryption and masking that can be enabled selectively.

SNYPR uses edge nodes for the user interface and for the SNYPR Search nodes. All processing and long-term storage of data is done within the Hadoop cluster. SNYPR provides a feature called Spotter as an integral part of the solution. This feature provides online searching and visualization of event data for the configured index retention period.

The SNYPR Remote Ingestion Node includes the connectors that are used to ingest the log data. The connectors leverage the specific log source APIs or files to access the log data. The incoming log messages are associated with a Job ID and a Resource ID before they are submitted to Kafka, so that they can be processed by the Spark Streaming enrichment job. The connectors also perform offset management of the source of the log data to ensure that all the log messages are obtained and, in some cases, pre-processing of the source data. An example of pre-processing the log data is the Ironport syslog connector. This connector converts the multi-line messages into a single line for publishing to Kafka.



Multiple Independent Pipelines

Indexing Incoming Events

SNYPR includes dedicated SNYPR Search servers. These servers are edge nodes in the Hadoop cluster that consume the enriched messages from the Kafka topic and perform local indexing on the search servers. The search indexes are designed to optimize search performance by parallelizing the search across multiple sub-indexes, or SNYPR Search collections. Each collection is further distributed across a configured number of shards to ensure distribution of the workload. Each Solr server in the cluster is allocated CPU and memory to allow the SNYPR Search server to perform optimally.

Events are indexed in real time as they are ingested by the solution. SNYPR includes two alternatives for indexing events.

• SNYPR Local Event Indexer (LEI) is an indexing process that reads enriched data from the Kafka topics and indexes events to the SNYPR Search servers.
• The SNYPR indexing job is a distributed Spark Streaming job that runs within the Hadoop cluster. The compute and memory resources used for indexing are reserved capacity to ensure that events are ingested at the rate that they arrive at the solution. This allows the indexing of ingested events to be parallelized across the cluster to meet the deployment requirements of the solution.

An index control core collection is used to track the number of collections that the solution is hosting. The solution maintains a threshold for the maximum number of documents per collection and dynamically creates additional collections as more events are imported into the environment. The solution also provides the ability to de-duplicate redundant event data from the indexes during ingestion.
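The rollover behavior described above can be sketched as a small routing structure: documents go to the newest collection until it reaches the configured maximum, at which point a new collection is opened and recorded. This is an illustrative model only; the class, the naming scheme (`activity_0`, `activity_1`, ...), and the in-memory counters are assumptions, and the real control collection lives in SNYPR Search rather than in a Python object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlIndex:
    """Stand-in for the index control collection that tracks event collections."""

    max_docs_per_collection: int
    prefix: str = "activity"  # hypothetical naming scheme
    doc_counts: List[int] = field(default_factory=lambda: [0])

    def collection_name(self, position: int) -> str:
        return f"{self.prefix}_{position}"

    def route(self) -> str:
        """Return the collection for the next document, rolling over when full."""
        if self.doc_counts[-1] >= self.max_docs_per_collection:
            self.doc_counts.append(0)  # open a new collection
        self.doc_counts[-1] += 1
        return self.collection_name(len(self.doc_counts) - 1)

    def collections(self) -> List[str]:
        """All collections a search would need to fan out to."""
        return [self.collection_name(i) for i in range(len(self.doc_counts))]
```

With `ControlIndex(max_docs_per_collection=3)`, seven calls to `route()` fill `activity_0` and `activity_1` and place the seventh document in `activity_2`.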

Searching

The Spotter search interface allows users to search across all events. Interactive and deterministic response time for searches is obtained by executing parallel searches across the collections. This approach ensures that the size of each index is optimized and that the infrastructure can grow to support larger indexes without impacting the user experience. The search results are incrementally returned to the user interface and displayed to the user as they arrive to ensure the responsiveness of the Spotter interface.
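The fan-out pattern described here, one query per collection with results surfaced as each collection answers, can be sketched with a thread pool. The example searches in-memory lists rather than Solr; the function names and the predicate-style query are assumptions made to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, List, Tuple


def search_collection(docs: List[dict], predicate: Callable[[dict], bool]) -> List[dict]:
    """Search one collection; a real deployment would query a Solr collection here."""
    return [doc for doc in docs if predicate(doc)]


def parallel_search(
    collections: Dict[str, List[dict]],
    predicate: Callable[[dict], bool],
    max_workers: int = 4,
) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (collection_name, hits) as soon as each collection finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(search_collection, docs, predicate): name
            for name, docs in collections.items()
        }
        for future in as_completed(futures):
            yield futures[future], future.result()
```

Because results are yielded in completion order, a caller can render the first hits while slower collections are still being searched. The order in which collections report back is therefore not fixed.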



Deployment Assumptions

Deploying a SNYPR environment requires many considerations for each of the components of the solution.

For a standard deployment architecture, the following is recommended:

• Fast network access for the Hadoop cluster and edge nodes: 10 Gigabit Ethernet with jumbo frames configured on all switches and network interfaces (MTU=9000).
• All services running in a single data center.
• A balanced SNYPR cluster with similar nodes (CPU, memory, storage, network).
• Securonix SNYPR using standard Securonix connectors for data ingestion. The exact sources of event data are deployment specific.
• Log event data is available to the SNYPR environment (Ingestion Nodes), or log sources are directly accessible to the connectors, depending on the connector used.
• Recommended storage bandwidth: 1000 IOPS per Hadoop and SNYPR Search server.
• Online event data is purged after the retention period to minimize required storage, unless there is a business need for long-term historical searching. Violation and behavior data is not purged.
• Java 8 is used by the cluster.

For Hadoop tuning, see the section in this guide: Hadoop Cluster Tuning Recommendations.

SNYPR Kafka Topic Partitioning Reference

10,000 - 20,000 EPS

| Kafka Topic | Partitions | Replication |
|---|---|---|
| tenantid-Raw | 75 | 2 |
| tenantid-Enriched | 75 | 2 |
| tenantid-Ops | 1 | 2 |
| tenantid-Tiertwo | 75 | 2 |
| tenantid-Control | 1 | 2 |
| tenantid-IndexerCount | 1 | 2 |
| tenantid-Violations | 75 | 2 |
| tenantid-User | 1 | 2 |
| tenantid-Count | 1 | 2 |
| tenantid-Preview | 1 | 2 |

SNYPR Search Shard Allocation Reference

10,000 - 20,000 EPS, 9 servers

| Solr Collection | Shards | Replication |
|---|---|---|
| tenantid-activity | 12 | 2 |
| tenantid-violation | 12 | 2 |
| tenantid-whitelist | 1 | 2 |
| tenantid-entitymetadata | 1 | 2 |
| tenantid-tpi | 1 | 2 |
| tenantid-eeocontrolcore | 1 | 2 |
| tenantid-lookup | 1 | 2 |
| tenantid-ipmapping | 1 | 2 |
| tenantid-watchlist | 1 | 2 |
| tenantid-dailyviolationsummary | 1 | 2 |
| tenantid-users | 1 | 2 |
| tenantid-riskscorecard | 1 | 2 |
| tenantid-entityrelation | 1 | 2 |
| tenantid-access | 1 | 2 |
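To see what this allocation means per server, the shard counts can be multiplied by the replication factor and divided across the nine servers. The sketch below assumes shard replicas are spread evenly, which the guide does not state, and it uses the collection suffixes without the `tenantid-` prefix.

```python
from typing import Dict

SERVERS = 9
REPLICATION = 2

# Shards per collection, taken from the reference table.
SHARDS: Dict[str, int] = {
    "activity": 12,
    "violation": 12,
    "whitelist": 1,
    "entitymetadata": 1,
    "tpi": 1,
    "eeocontrolcore": 1,
    "lookup": 1,
    "ipmapping": 1,
    "watchlist": 1,
    "dailyviolationsummary": 1,
    "users": 1,
    "riskscorecard": 1,
    "entityrelation": 1,
    "access": 1,
}


def total_shard_replicas(shards: Dict[str, int], replication: int) -> int:
    """Number of shard replicas hosted across the whole cell."""
    return sum(shards.values()) * replication


def replicas_per_server(shards: Dict[str, int], replication: int, servers: int) -> float:
    """Average shard replicas per server, assuming an even spread."""
    return total_shard_replicas(shards, replication) / servers


if __name__ == "__main__":
    print(total_shard_replicas(SHARDS, REPLICATION))
    print(replicas_per_server(SHARDS, REPLICATION, SERVERS))
```

For the reference layout this gives 72 shard replicas in total, or an average of 8 per server.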

SNYPR YARN Resource Allocation Reference

The SNYPR Spark applications are configured based on the ingestion rate that must be supported. The table below is an example of the Spark application resource allocation for a deployment that supports 20,000 events per second with a typical workload. Many variables affect a deployment and the specific sizing recommended. Contact Securonix for specific information.


Spark Streaming YARN Resources, 10,000 - 20,000 EPS

| Spark Streaming Job | Driver vCPU | Driver Memory (GB) | Number of Executors | Executor vCPU | Executor Memory (GB) |
|---|---|---|---|---|---|
| Event Enrichment | 6 | 2 | 80 | 1 | 3 |
| Event Ingestion | 6 | 2 | 20 | 1 | 2 |
| Behavior Analytics | 1 | 2 | 10 | 1 | 4 |
| Policy Engine IEE | 1 | 2 | 40 | 1 | 2 |
| Policy Engine AEE | 1 | 2 | 10 | 1 | 3 |
| Risk Generation | 1 | 2 | 10 | 2 | 2 |
| Traffic Analyzer | 1 | 2 | 10 | 1 | 4 |
| Behavior Profile | 1 | 2 | 6 | 1 | 2 |
| Robotic Behavior | 1 | 2 | 10 | 1 | 3 |
| Event Archiver | 1 | 2 | 10 | 1 | 1 |
| Phishing | 1 | 2 | 1 | 1 | 4 |

| | vCPU | Memory (GB) |
|---|---|---|
| YARN Resources: Drivers | 21 | 22 |
| YARN Resources: Executors | 217 | 546 |
| Total YARN Resources | 238 | 568 |
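The totals follow from the per-job rows: driver resources are summed directly, while executor resources are the executor count multiplied by the per-executor vCPU and memory. The sketch below recomputes them so the same arithmetic can be reused when a job's allocation changes; the type and function names are illustrative.

```python
from typing import Dict, List, NamedTuple


class SparkJob(NamedTuple):
    name: str
    driver_vcpu: int
    driver_mem_gb: int
    executors: int
    executor_vcpu: int
    executor_mem_gb: int


# Values copied from the reference table for 10,000 - 20,000 EPS.
JOBS: List[SparkJob] = [
    SparkJob("Event Enrichment", 6, 2, 80, 1, 3),
    SparkJob("Event Ingestion", 6, 2, 20, 1, 2),
    SparkJob("Behavior Analytics", 1, 2, 10, 1, 4),
    SparkJob("Policy Engine IEE", 1, 2, 40, 1, 2),
    SparkJob("Policy Engine AEE", 1, 2, 10, 1, 3),
    SparkJob("Risk Generation", 1, 2, 10, 2, 2),
    SparkJob("Traffic Analyzer", 1, 2, 10, 1, 4),
    SparkJob("Behavior Profile", 1, 2, 6, 1, 2),
    SparkJob("Robotic Behavior", 1, 2, 10, 1, 3),
    SparkJob("Event Archiver", 1, 2, 10, 1, 1),
    SparkJob("Phishing", 1, 2, 1, 1, 4),
]


def yarn_totals(jobs: List[SparkJob]) -> Dict[str, int]:
    """Sum driver and executor resources across all Spark jobs."""
    driver_vcpu = sum(j.driver_vcpu for j in jobs)
    driver_mem = sum(j.driver_mem_gb for j in jobs)
    executor_vcpu = sum(j.executors * j.executor_vcpu for j in jobs)
    executor_mem = sum(j.executors * j.executor_mem_gb for j in jobs)
    return {
        "driver_vcpu": driver_vcpu,
        "driver_mem_gb": driver_mem,
        "executor_vcpu": executor_vcpu,
        "executor_mem_gb": executor_mem,
        "total_vcpu": driver_vcpu + executor_vcpu,
        "total_mem_gb": driver_mem + executor_mem,
    }
```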

SNYPR Extra Large Deployments

The sizing guidelines in this document are references for deployment of SNYPR. The solution will support much larger deployments based on the customer requirements.

For large deployments the search servers are dedicated servers rather than being collocated on the Compute/Storage nodes. This allows the search indexers to scale as needed without impacting other services. This includes Solr and a dedicated Zookeeper configuration to avoid contention.

See Figure 10 for an example of the deployment with dedicated search servers.

There is no upper limit to the deployment size. The deployment architecture for extra-large deployments will be determined based on the specific deployment requirements. Contact Securonix for details.

The major variables that dictate the deployment recommendations include:

• Ingestion rate (events per second) of security event data
• Number of users interacting with the application interactively
• The data retention requirements for online data
• The data retention requirements for log data
• The disaster recovery strategy

High Availability

The SNYPR solution includes high availability of all the components of the infrastructure. The Hadoop cluster is configured for high availability based on best practices for deploying Hadoop. This includes, at a minimum, high availability of the HDFS Namenodes and YARN Resource Managers, at least 3 Zookeeper servers, and at least 3 Kafka brokers. The high availability for the SNYPR servers that leverage the Hadoop cluster is described below.

SNYPR Application Server

High availability of the SNYPR Console is provided with an HA configuration of two nodes, with the user interface active on one of the two nodes during normal operation. MySQL replication and a Redis cluster are configured, as well as backup of the file system where the configuration data is stored (referred to as SECURONIX_HOME). A load balancer is configured for access to the user interface.


SNYPR-EYE Server

High availability of the SNYPR-EYE Server is provided with an HA configuration of two nodes, with the user interface active on one of the two nodes during normal operation. MySQL replication, as well as backup of the file system where the configuration data is stored (referred to as SNYPR-EYE_HOME), is configured on these servers for high availability, and a load balancer is configured for access to the user interface.


SNYPR Search Server

High availability of the SNYPR Search servers is configured for each SNYPR Search cell in the deployment. The SNYPR Search cell includes a Local Event Indexer (LEI) as well as multiple search instances. A search cell with high availability will include at least 2 SNYPR Search servers. The LEI process runs on the primary server and indexes the incoming event data from the Enriched topic on Kafka. A search server provides a replica of all indexed data on another server. During a failover, the LEI is started on the second search server to enable active indexing on that server.


SNYPR Remote Ingestion Nodes

At least two SNYPR Remote Ingestion Nodes (RINs) are recommended for high availability in each location where they are deployed. RINs are typically installed in each major data center in close proximity to the logs that are being collected. The data collected by the RINs and forwarded to the Kafka brokers is sent in compressed batches that reduce the network transfer by roughly 90%. The RINs also encrypt the payload and support SSL and mutual authentication as well as Kerberos authentication.

The RINs collect data through two different methods, the push method and the pull method. The push method uses the embedded syslog server to collect and forward data to the Kafka topics. The pull method uses the Securonix Connectors installed on the RIN to connect to the APIs, gather the logs, and forward them to the Kafka topic. High availability is provided on the Kafka brokers by having 3 separate Kafka brokers and replication of the topics for availability.

A sticky load balancer is recommended for incoming syslog traffic to the Remote Ingestion Node.

SNYPR Remote Ingestion Nodes

Hadoop Cluster Guidance for High Availability

The Hadoop infrastructure services are used for high availability. The recommended settings are as follows:

• At least three Kafka brokers with In Sync Replicas (ISR) = 3
• HDFS replication factor = 3
• Kafka message retention = 2 days
• HA Namenode
• HA Resource Manager
• At least three Zookeeper servers
• If security is required:
    • Kerberos authentication of all services in the Hadoop cluster
    • Encryption of HDFS folders with HDFS encryption is also available for sensitive resource data
    • Authorization for protection of the access to data in the Hadoop cluster is recommended with the native tools (Ranger for Hortonworks, Sentry for Cloudera)
    • The SNYPR Edge Nodes for Ingestion and the Console User Interface interact with the Hadoop services and support Kerberos

This is not a complete list. It is recommended that you follow the Hadoop best practices for deployment.
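A deployment checklist like this can be expressed as a small validation routine. The sketch below compares a description of a cluster against the recommended minimums and reports the gaps. The dictionary keys are hypothetical names chosen for this example and do not correspond to Hadoop, Kafka, or SNYPR configuration properties.

```python
from typing import Dict, List

# Recommended minimums from the list above (key names are illustrative).
RECOMMENDED_MINIMUMS: Dict[str, int] = {
    "kafka_brokers": 3,
    "kafka_isr": 3,
    "hdfs_replication": 3,
    "zookeeper_servers": 3,
}

# Services that should run in a high-availability configuration.
REQUIRED_FLAGS = ("namenode_ha", "resourcemanager_ha")


def ha_gaps(cluster: Dict[str, object]) -> List[str]:
    """Return a description of each recommendation the cluster does not meet."""
    gaps: List[str] = []
    for key, minimum in RECOMMENDED_MINIMUMS.items():
        value = cluster.get(key, 0)
        if value < minimum:
            gaps.append(f"{key}: {value} < {minimum}")
    for flag in REQUIRED_FLAGS:
        if not cluster.get(flag, False):
            gaps.append(f"{flag}: not enabled")
    return gaps
```

An empty list means the described cluster meets every minimum checked here. It does not replace a review against the Hadoop distribution's own best practices.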

In addition to the storage required for the data, the compute and memory required for running the SNYPR jobs must be available in the Hadoop cluster. The SNYPR solution includes several jobs that run in the cluster. YARN is used to schedule the resources. The primary jobs that are part of SNYPR and their resource allocations are listed below.

The specific infrastructure required is based on the required peak ingestion rate.

Request specific deployment guidance from Securonix.



Reference Server Specifications

This section contains recommendations for the following topics:

• Hardware Specifications
• Server Mount Point

Hardware Specifications

The hardware specifications for the infrastructure are listed in the following table:

| Configuration | SNYPR-M1: Hadoop Master | SNYPR-M2: Hadoop Master with SNYPR | SNYPR-M3: Hadoop Master with SNYPR and Kafka |
|---|---|---|---|
| Server Model | Dell R640 | Dell R640 | Dell R640 |
| CPU | 2 x Intel Xeon Gold 5120 2.2G, 14C/28T | 2 x Intel Xeon Gold 5120 2.2G, 14C/28T | |
| Memory | 256GB RDIMM, 2666MT/s | 256GB RDIMM, 2666MT/s | |
| Boot Storage | 2 x 1.6TB SSD SATA Mix Use 12Gbps 512e | 2 x 1.6TB SSD SATA Mix Use 12Gbps 512e | |
| Additional Storage | 4 x 2.4TB 10K RPM SAS 12Gbps 4Kn | 6 x 2.4TB 10K RPM SAS 12Gbps 4Kn | 8 x 2.4TB 10K RPM SAS 12Gbps 4Kn |
| Network | 10GE | 10GE | 10GE |
| Power | 2 x 1100W | 2 x 1100W | 2 x 1100W |
| Rack Units | 1RU | 1RU | 1RU |


| Configuration | SNYPR-C1: Standard Density Compute/Storage | SNYPR-C2: High Density Compute/Storage | SNYPR-C3: Maximum Density Compute/Storage |
| --- | --- | --- | --- |
| Server Model | Dell R640 | Dell R740xd | Dell R740xd |
| Additional Storage | | | 30 x 2.4TB 10K RPM SAS 12Gbps 4Kn |
| Power | 2 x 1100W | 2 x 1100W | 2 x 1100W |
| Rack Units | 1RU | 2 RU | 2 RU |

| Configuration | SNYPR-SEARCH1: Standard Density Compute/Storage | SNYPR-SEARCH3: Maximum Density Compute/Storage |
| --- | --- | --- |
| Server Model | Dell R640 | Dell R740xd |
| CPU | 2 x Intel Xeon Gold 5120 2.2G, 14C/28T | 2 x Intel Xeon Gold 5120 2.2G, 14C/28T |
| Memory | 256GB RDIMM, 2666MT/s | 256GB RDIMM, 2666MT/s |
| Boot Storage | 2 x 1.6TB SSD SATA Mix Use 12Gbps 512e | 2 x 1.6TB SSD SATA Mix Use 12Gbps 512e |
| Additional Storage | 10 x 2.4TB 10K RPM SAS 12Gbps 4Kn | 30 x 2.4TB 10K RPM SAS 12Gbps 4Kn |
| Network | 10GE | 10GE |
| Power | 2 x 1100W | 2 x 1100W |
| Rack Units | 1RU | 2 RU |

| Configuration | SNYPR-K3: Kafka Brokers | SNYPR-R1: Remote Ingestion Node | SNYPR-S3: SNYPR Console |
| --- | --- | --- | --- |
| Server Model | Dell R640 | Dell R640 | Dell R640 |





64GB RDIMM,2666MT/s







Additional Storage 10 x 2.4 TB 10K RPM SAS 12Gbps 4Kn


4 x 2.4 TB 10K RPM SAS 12Gbps 4Kn


Power 2 x 1100W 2 x 1100W 2 x 1100W

Rack Units 1RU 1RU 1RU

Alternate hardware configurations can be used, but equivalent specifications are required for CPU, memory, network bandwidth, and storage capacity and bandwidth.

Server Mount Point

The storage mount point configuration for each of the servers is listed in the table below:

| Mount Point | SNYPR-M1: Hadoop Master | SNYPR-M2: Hadoop Master with SNYPR | SNYPR-M3: Hadoop Master with SNYPR and Kafka | Comments |
| --- | --- | --- | --- | --- |
| / | 100 GB | 100 GB | 100 GB | RAID 1 (1.6 TB mixed use SSD drives), xfs |
| /boot | 2 GB | 2 GB | 2 GB | |
| swap | 10 GB | 10 GB | 10 GB | |
| /zookeeper | 200 TB | 200 TB | 200 TB | |
| /var | 800 GB | 800 GB | 800 GB | |
| /dfs | 200 GB | 200 GB | 200 GB | |
| /securonix | 4.2 TB | 6.3 TB | 8.4 TB | RAID 10, xfs; if syslog is used locally, use a higher storage amount |
| /snyprsearch | - | - | - | RAID 6 |
| /data1 | - | - | 2.1 TB | JBOD, xfs, noatime |




| Mount Point | SNYPR-C1: Standard Density Compute/Storage | SNYPR-C2: High Density Compute/Storage | SNYPR-C3: Maximum Density Compute/Storage | Comments |
| --- | --- | --- | --- | --- |
| / | 100 GB | 100 GB | 100 GB | RAID 1 (1.6 TB mixed use SSD drives), xfs |
| /zookeeper | 200 TB | 200 TB | 200 TB | |
| /var | 800 GB | 800 GB | 800 GB | |
| /dfs | 200 GB | 200 GB | 200 GB | |
| /securonix | - | - | - | RAID 10, xfs; if syslog is used locally, use a higher storage amount |
| /snyprsearch | - | - | - | RAID 6 |
| /data1 | 2.1 TB | 2.1 TB | 2.1 TB | JBOD, xfs, noatime |
| /data11 | - | 2.1 TB | 2.1 TB | JBOD, xfs, noatime |
| /data25 | - | - | 2.1 TB | JBOD, xfs, noatime |







| Mount Point | SNYPR-SEARCH1: Standard Density Compute/Storage | SNYPR-SEARCH3: Maximum Density Compute/Storage | Comments |
| --- | --- | --- | --- |
| / | 100 GB | 100 GB | RAID 1 (1.6 TB mixed use SSD drives), xfs |
| /boot | 2 GB | 2 GB | |
| swap | 10 GB | 10 GB | |
| /zookeeper | 200 TB | 200 TB | |
| /var | 800 GB | 800 GB | |
| /dfs | 200 GB | 200 GB | |
| /securonix | - | - | RAID 10, xfs; if syslog is used locally, use a higher storage amount |
| /snyprsearch | 17 TB | 60 TB | RAID 6 |

| Mount Point | SNYPR-K3: Kafka Brokers | SNYPR-R1: Remote Ingestion Node | SNYPR-S3: SNYPR Console | Comments |
| --- | --- | --- | --- | --- |
| / | 100 GB | 100 GB | 100 GB | |
| /zookeeper | 200 TB | - | - | |
| /var | 800 GB | 1000 GB | 1000 GB | |
| /dfs | 200 GB | 200 GB | 200 GB | |
| /securonix | - | 4.2 TB | 4.2 TB | |
| /snyprsearch | - | - | - | RAID 6 |
| /data1 | 2.1 TB | - | - | JBOD, xfs, noatime |
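Several capacities in the mount point tables can be approximated from the drive counts in the hardware tables. The sketch below is illustrative only: the 0.875 usable fraction is an assumption inferred from the published figures (for example, 4, 6, and 8 drives in RAID 10 listed as 4.2, 6.3, and 8.4 TB), not a value stated in this guide, and the function name is hypothetical.

```python
# Sketch: approximate usable capacity from raw drive counts.
RAW_TB_PER_DRIVE = 2.4   # 2.4 TB SAS drives, from the hardware tables
USABLE_FRACTION = 0.875  # ASSUMPTION: inferred from the tables, not stated in the guide


def usable_tb(layout: str, drives: int) -> float:
    """Return the approximate usable TB for a given layout and drive count."""
    if layout == "raid10":
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        data_drives = drives / 2       # half the drives hold mirror copies
    elif layout == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least four drives")
        data_drives = drives - 2       # two drives' worth of parity
    elif layout == "jbod":
        data_drives = drives           # no redundancy at the disk layer
    else:
        raise ValueError(f"unknown layout: {layout}")
    return round(data_drives * RAW_TB_PER_DRIVE * USABLE_FRACTION, 1)


if __name__ == "__main__":
    for count in (4, 6, 8):
        print(f"RAID 10, {count} drives: {usable_tb('raid10', count)} TB")
    print(f"JBOD, 1 drive: {usable_tb('jbod', 1)} TB")
    print(f"RAID 6, 10 drives: {usable_tb('raid6', 10)} TB")
```

Under these assumptions the RAID 10 and JBOD results match the /securonix and /data entries, while the RAID 6 result for 10 drives (16.8 TB) is only close to the 17 TB listed for /snyprsearch.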

Alternatives for Limiting the Size of the Infrastructure

The recommended architecture assumes full functionality and full access to indexed data and source data for the duration of the retention period.

Other factors may reduce the size of the recommended infrastructure, such as a reduction in the volume of log data or filtering of some log data to avoid storage of unneeded events.

You can configure the Hadoop compute and storage nodes to use very dense storage per node. The following table shows an example configuration that is possible. This configuration includes dense storage.


SNYPR Cloud Deployment

The SNYPR solution can be deployed in a cloud environment. Several considerations must be addressed when deploying SNYPR in a cloud, including the following:

l Infrastructure selection: The infrastructure used should provide equivalent resources (CPU, memory, and storage capacity and bandwidth) to the physical server recommendations listed in this document.

l Deployment Architecture: SNYPR can be deployed exclusively in the cloud or as a hybrid cloud / on-site topology.

l Network Access: The infrastructure must have access to the data (user, access, event log, TPI, etc.) that will be used. A Virtual Private Cloud may be required for transmission of sensitive data.

Infrastructure Selection

SNYPR can be deployed in public or private cloud environments. Based on the deployment requirements of the solution, the specific infrastructure used for each cloud deployment should be selected to ensure that the appropriate resources are available. This includes selecting virtual instance types that support the CPU, memory, storage, and network bandwidth requirements of the solution.


Considerations

This section contains considerations for the following topics:

l Amazon EC2

l Microsoft Azure

l Network

l Virtual Infrastructure

Amazon EC2

There are several Amazon EC2 instance types that are a good fit for deploying Securonix. The M4 general purpose instances are recommended. These are defined by Amazon as:

"M4 instances are the latest generation of General-Purpose Instances. This family provides a balance of compute, memory, and network resources, and it is a good choice for many applications."

Features

l 2.4 GHz Intel Xeon® E5-2676 v3 (Haswell) processors

l EBS-optimized by default at no additional cost

l Support for Enhanced Networking

l Balance of compute, memory, and network resources

| | Hadoop Master | Compute / Storage | Kafka | SNYPR Search | SNYPR Search |
| --- | --- | --- | --- | --- | --- |
| Amazon EC2 Instance Type | r5.4xlarge | m4.16xlarge | m5.2xlarge | m5.4xlarge | m4.16xlarge |
| RAM (GB) | 128 | 256 | 32 | 64 | 256 |
| vCPU | 16 | 64 | 8 | 16 | 64 |
| Storage (GB, split into multiple EBS volumes) | 10,000 | 10,000 | 3,000 | 3,000 | 10,000 |

Amazon provides several alternatives for the instance types used, like the r3.8xlarge and the d2.8xlarge, which are also good options. The storage chosen should provide adequate bandwidth to the volume used. This is the equivalent of 1000 IOPS per instance to the selected storage type.

In addition to standard Amazon AWS EC2 instances, the guidance for deploying Cloudera in Amazon Web Services is recommended. See the following link: https://www.cloudera.com/partners/solutions/amazon-web-services.html.

Microsoft Azure

Several Azure Virtual Machine instance types (https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/) are a good fit for deploying Securonix. A G4 (East US2), D15 v2 (East US2), or H16m (South Central US) instance type is recommended.

The Dsv2 instances are recommended. These are defined by Microsoft as:

“D11-15 v2 instances are based on the 2.4 GHz Intel Xeon® E5-2673 v3 (Haswell) processor, and can achieve 3.1 GHz with Intel Turbo Boost Technology 2.0. D11-15 v2 are ideal for memory-intensive enterprise applications. D15 v2 instance is isolated to hardware dedicated to a single customer.

For persistent storage, use the variant “Dsv2” VMs and purchase Premium Storage separately.”



| | Hadoop Admin | Compute / Storage | Kafka Broker | SNYPR Search | SNYPR Console |
| --- | --- | --- | --- | --- | --- |
| Microsoft Azure Instance Type | E16 v3 | D64 v3 | E16 v3 | D64 v3 | E16 v3 |
| RAM (GB) | 128 | 256 | 128 | 256 | 128 |
| vCPU | 16 | 64 | 16 | 64 | 16 |
| Storage (GB, split into multiple volumes) | 3,000 | 10,000 | 5,000 | 10,000 | 5,000 |

Microsoft provides several alternatives for the storage for the instances used. The storage chosen should provide adequate bandwidth to the volume used. This is the equivalent of 1000 IOPS per instance to the selected storage type.

In addition to standard Azure instances, the following guidance for deploying Cloudera in Microsoft Azure is recommended. See the link: https://www.cloudera.com/more/news-and-blogs/press-releases/2015-09-24-cloudera-enterprise-data-hub-edition-provides-enterprise-ready-hadoop-for-microsoft-azure.html.

Network

A SNYPR deployment includes network transfer of several types of data into the solution. This includes User, Access, TPI, Event Logs, Network Maps, and other types of data for a typical deployment. Due to the potential sensitivity of some of this data, a virtual private cloud may be required for each deployment.

In addition to the security considerations, the infrastructure will require sufficient network bandwidth. The types of network traffic used by the solution are:



l End User Access to the Securonix User Interface

l Import of User, Access, and TPI data into the Master nodes

l Cluster communication and synchronization between the cluster nodes

l Import of event log data into the child nodes

The largest network traffic required is to transfer the event log data from the source to the child nodes for import through the solution's connectors. The network traffic rate from the event log sources to the child nodes can be calculated by multiplying the events per second by the average message size.

For example, 5000 events per second (EPS) to two ingestion nodes in the deployment, with an average message size of 500 bytes, will require 2.5 MB per second, or roughly 25 Mb per second of bandwidth.
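The calculation above can be sketched as follows. This is a minimal illustration, not a sizing tool: the function name is hypothetical, and the planning factor of 10 bits on the wire per payload byte is an assumption chosen because it reproduces the "roughly 25 Mb per second" figure (the raw payload alone is 20 Mb per second).

```python
# Sketch: event-log bandwidth from EPS and average message size.
def ingest_bandwidth(eps: int, avg_message_bytes: int,
                     bits_per_byte_on_wire: int = 10) -> dict:
    """Return payload and planning bandwidth for an event stream.

    bits_per_byte_on_wire is an ASSUMED planning factor (8 payload bits
    plus an allowance for overhead); it is not defined by the guide.
    """
    bytes_per_second = eps * avg_message_bytes
    return {
        "MB_per_s": bytes_per_second / 1_000_000,
        "payload_Mb_per_s": bytes_per_second * 8 / 1_000_000,
        "planning_Mb_per_s": bytes_per_second * bits_per_byte_on_wire / 1_000_000,
    }


if __name__ == "__main__":
    result = ingest_bandwidth(eps=5000, avg_message_bytes=500)
    for name, value in result.items():
        print(f"{name}: {value}")
```

Keeping the overhead factor as a parameter makes it easy to rerun the estimate with a different allowance for a given network.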

Network Bandwidth Characteristics by Tier

| Tier | Description | Network Requirements |
| --- | --- | --- |
| Admin | This tier is where the end users log in to the user interface (traffic on port 443). This tier also includes all management services for the cluster and connects to the Compute / Storage / Search tier and Messaging Tier for various services. Incoming connections: Web Services on port 443, MySQL configuration for Spark jobs, Redis, ZooKeeper, and other Hadoop cluster services. This tier hosts the management services that the agents on the Admin, Compute / Storage / Messaging tiers communicate with. | 10 Gb Ethernet, MTU = 9000, centralized data center (for Admin, Compute / Storage / Search, and Messaging Tiers) |
| Compute / Storage / Search | Network traffic to these servers includes Spark, Impala, HDFS, and HBase services. Outbound traffic to services in the Admin tier and the Messaging Tier is also required. | 10 Gb Ethernet, MTU = 9000, centralized data center (for Admin, Compute / Storage / Search, and Messaging Tiers) |
| SNYPR Search | Network traffic to these servers includes SNYPR Search (Solr). Outbound traffic to services in the Kafka Messaging Tier is required. | 10 Gb Ethernet, MTU = 9000, centralized data center (for Admin, Compute / Storage / Search, and Messaging Tiers) |
| Messaging | This tier includes incoming traffic to Kafka Brokers (SSL traffic to port 9093, and ZooKeeper traffic on port 2181). | 10 Gb Ethernet, MTU = 9000, centralized data center (for Admin, Compute / Storage / Search, and Messaging Tiers) |
| Collection | This server collects logs and provides a syslog server on port 514. The connectors on the server also collect logs with native protocols. The primary network traffic from this tier is to the Admin tier on port 443 for web services and to the Kafka brokers in the Messaging Tier on port 9093 (SSL). | Remote data center with outbound network access to the centralized data center. |

If 10 Gigabit Ethernet is not available and Gigabit Ethernet is used in the deployment, then the performance of the deployment will be limited by the network performance.


Network Bandwidth Requirements from RIN Collection Tier to Messaging Tier

The table below displays the network bandwidth requirements from the Remote Ingestion Nodes (RINs) collection tier to the messaging tier (Kafka Brokers).

| Parameter | Value |
| --- | --- |
| Average EPS | 20,000 EPS |
| Number of RINs | 1 |
| Average message size | 600 bytes |
| Transferred to Kafka after compression (%) | 30 % |
| Total Traffic to Kafka | 36 Mbits/s |
| Traffic per RIN to Kafka (assuming equal distribution) | 36 Mbits/s |
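The figures in the table above can be reproduced with a short calculation. The sketch below is illustrative: the function name is hypothetical, and the factor of 10 bits on the wire per byte is an assumption (it is the value that yields 36 Mbits/s from the listed inputs; 8 bits per byte would give 28.8 Mbits/s).

```python
# Sketch: RIN-to-Kafka traffic after compression, split evenly across RINs.
def rin_to_kafka_mbits(avg_eps: int, avg_message_bytes: int,
                       compressed_percent: int, rin_count: int,
                       bits_per_byte_on_wire: int = 10) -> tuple:
    """Return (total Mbits/s to Kafka, Mbits/s per RIN).

    compressed_percent is the share of the original volume that is still
    transferred after compression (30 in the table above).
    bits_per_byte_on_wire is an ASSUMED planning factor, not a documented one.
    """
    raw_bytes_per_second = avg_eps * avg_message_bytes
    compressed_bytes_per_second = raw_bytes_per_second * compressed_percent / 100
    total_mbits = compressed_bytes_per_second * bits_per_byte_on_wire / 1_000_000
    return total_mbits, total_mbits / rin_count


if __name__ == "__main__":
    total, per_rin = rin_to_kafka_mbits(20_000, 600, 30, rin_count=1)
    print(f"Total traffic to Kafka: {total} Mbits/s")
    print(f"Traffic per RIN: {per_rin} Mbits/s")
```

With more than one RIN, the per-RIN value falls proportionally while the total stays the same, which is the "equal distribution" case named in the table.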

Virtual Infrastructure

Due to the high-performance requirements of the solution, physical servers or dedicated cloud instances are recommended. A virtual infrastructure can be considered for small deployments or non-production environments.

Considerations for virtual deployments

l These are VMs that can be deployed as needed on the vSphere cluster, without over-subscription of either CPU or memory resources. Configure CPUs along physical socket boundaries. According to VMware, one VM per NUMA node is advisable.

l These nodes house the Cloudera Master services and serve as the gateway/edge device that connects the rest of the customer’s network to the Cloudera cluster.

l Care should also be taken to ensure automated movement of VMs is disabled. There should be no DRS or vMotion of VMs allowed in this deployment model. This is critical as VMs are tied to physical disks and movement of VMs within the cluster will result in data loss.

l Configure Distributed Resource Scheduler (DRS) rules so that there is strong negative affinity between the master node VMs. This ensures that no two master nodes are provisioned or migrated to the same physical vSphere host.

l A key configuration parameter to consider is the MTU size: ensure that the same MTU size is set at the physical switches, guest OS, ESXi VMNIC, and vSwitch layers. This is relevant when enabling jumbo frames (9000 MTU), which is recommended for Hadoop environments.

l Set up virtual disks in “independent persistent” mode for optimal performance. Eager Zeroed Thick virtual disks provide the best performance.

l Each provisioned disk is mapped to one vSphere datastore (which in turn contains one VMDK or virtual disk).

l A VMXNET3 NIC should be configured.

l Disable or minimize anonymous paging by setting vm.swappiness=0 or 1.

l VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.


Recommendations

This section contains recommendations for the following:

l Hadoop Cluster Tuning

l Network Tuning

Hadoop Cluster Tuning Recommendations

The tuning parameters in Table 1 describe the Hadoop tuning parameters for each of the services in the Hadoop cluster that optimize the Hadoop cluster performance for the SNYPR workloads.

Hadoop Cluster Performance

Yarn

Allyarn containermemory

60 GB60GB

70 GB

Yarn

Yarn

AllJava Heap Sizeof NodeManager

850 MB 1 GB 850 MB

Yarn

Yarn

All

ZooKeeperClient TimeoutzkClientTimeout

1 min1min

1 min



Hbase

All

HBase: JavaHeap SizeThrift in Bytes:1 GB

1 GB 1 GB 1 GB

Hbase

Hbase

Cloudera

hbase.rpc.timeout

15 min10min

15 min

Hbase

Hbase

Cloudera

RegionServerLease Period

15 min10min

15 min

Hbase

Hbase

Cloudera

HBase ServiceAdvancedConfigurationSnippet (SafetyValve) forhbase-site.xml

name:hbase.ipc.warn.response.time value: 500

name:hbase.ipc.warn.response.time value: 500

HDFS



HDFS

AllJava Heap Sizeof DataNode inBytes

8 GB 8gb 8 GB

HDFS

HDFS

AllDataNodeBalancingBandwidth

1GB optional , 10MBdefault

1GBoptional ,10MBdefault

1GB optional , 10MBdefault

HDFS

HDFS

All

MaximumNumber ofTransferThreads

1600016000

16000

Impala

Impala

Cloudera

ImpalaDaemonMemory Limit

12 GB20gb

12 GB

Spark



Spark 2

AllJava Heap Sizeof HistoryServer in Bytes

512 MB512MB

512 MB

Hive

Hive All

Hive : SparkDriverMaximum JavaHeap Size : 256MB

256 MB256MB

256 MB

Hive

Hive All

Hive : SparkExecutorMemoryOverhead : 26MB

26 MB256MB

26 MB

Hive

Kafka

All

KAFKA:MaximumMessage Size -message_max_bytes - 10 MiB

10 MiB10MB

10 MiB

Kafka

Kafka

AllKafka Brokerlogging level

ERROR ERROR



Kafka

Kafka

All

ZooKeeperSessionTimeoutzookeeper.session.timeout.ms

6s 6s 6s

Kafka

Kafka

Allopen file limit ormaximum filedescriptors

100000100000

100000

Kafka

Kafka

All

Data RetentionHourslog.retention.hours

7 days7Days

7 days

Kafka



HDFS

All

BlocksWithCorruptReplicasMonitoringThresholds

warning:0.5,critical:1

warning:0.5,critical:1

warning:0.5, critical:1

HDFS

HDFS

AllReplicationFactor

2 2 2

HDFS

HDFS

AllMaximal BlockReplication

512 512 512

HDFS

Impala

Cloudera

dump when outof memory

disableddisabled

disabled

Kafka

Spark

Cloudera


disableddisabled

disabled



Yarn

Cloudera

dump whenout of memory

disableddisabled

disabled

Zookeeper

Cloudera


disableddisabled

disabled

zookeeper-kafka

Cloudera

dump whenout of memory

disableddisabled

disabled

Hbase

Cloudera

Dump Heapwhen out ofmemory

disableddisabled

disabled

Kafka

All

MinimumNumber ofReplicas inISRmin.insync.replicas

1 1



HBASE

All

JavaConfigurationOptions forHBaseRegionServer

-XX:+UseParNewGC-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70-XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=20 -XX:ConcGCThreads=15 -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85 -XX:G1HeapWastePercent=2 -XX:InitiatingHeapOccupancyPercent=35-XX:+PrintReferenceGC -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M -verbose:gc -XX:+PrintGCDetails-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/hbase/gc.log

-XX:+UseParNewGC-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:ParallelGCThreads=20 -XX:ConcGCThreads=15 -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85 -XX:G1HeapWastePercent=2 -XX:InitiatingHeapOccupancyPercent=35-XX:+PrintReferenceGC -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M-verbose:gc -XX:+PrintGCDetails-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/hbase/gc.log



Zookeeper

Cloudera

Jute MaxBuffer

90 MB90MB

90 MB

Zookeeper

AllJava Heap Sizeof ZookeeperServer in Bytes

6 GB 8 GB 6 GB

Zookeeper

AllMinimumSessionTimeout

80008000

8000

Zookeeper

AllMaximumSessionTimeout

9000090000

90000

Zookeeper

AllCanaryConnectionTimeout

20 seconds

20seconds

20 seconds

Zookeeper

AllTick TimetickTime

4000 4000 4000

Zookeeper

All

MaximumClientConnectionsmaxClientCnxns

80008000

8000



Zookeeper-Kafka

All

Zookeeper-kafka - JavaHeap Size ofZookeeperServer in Bytes

8 GB 8GB 8 GB

Zookeeper-Kafka

AllTick TimetickTime

40002000

4000

Zookeeper-kafka

Cloudera

Jute Max Buffer 50 MB50MB

50 MB

Zookeeper-kafka

AllMaxclientconnections

60006000

6000

Zookeeper-kafka

AllminSessionTimeout

4000 4000 4000



Zookeeper-kafka

AllmaxSessionTimeout

9000060000

90000

YARN

All

yarn.resourcemanager.am.max-retries,yarn.resourcemanager.am.max-attempts

20 20 20

Impala

AllImpaladaemonSafety valve

--enable_partitioned_aggregation=true --enable_partitioned_hash_join=true

Hadoop Cluster Log Configuration

| Service | Level | Property |
| --- | --- | --- |
| HBase | ERROR | Gateway Logging Threshold |
| HBase | ERROR | HBase REST Server Logging Threshold |
| HDFS | ERROR | DataNode Logging Threshold |
| HDFS | ERROR | Failover Controller Logging Threshold |
| HDFS | ERROR | Gateway Logging Threshold |
| HDFS | ERROR | HttpFS Logging Threshold |
| HDFS | ERROR | JournalNode Logging Threshold |
| HDFS | ERROR | NFS Gateway Logging Threshold |
| HDFS | ERROR | NameNode Block State Change Logging Threshold |
| HDFS | ERROR | NameNode Logging Threshold |
| HDFS | ERROR | SecondaryNameNode Logging Threshold |
| Hive | ERROR | Gateway Logging Threshold |
| Hive | ERROR | Hive Metastore Server Logging Threshold |
| Hive | ERROR | HiveServer2 Logging Threshold |
| Hive | ERROR | WebHCat Server Logging Threshold |
| Impala | ERROR | Impala Catalog Server Logging Threshold |
| Impala | ERROR | Impala Daemon Logging Threshold |
| Impala | ERROR | Impala Llama ApplicationMaster Logging Threshold |
| Impala | ERROR | Impala StateStore Logging Threshold |
| Kafka | ERROR | Gateway Logging Threshold |
| Kafka | ERROR | Kafka Broker Logging Threshold |
| Kafka | ERROR | Kafka MirrorMaker Logging Threshold |
| Key Value Store | ERROR | Lily HBase Indexer Logging Threshold |
| Oozie | ERROR | Oozie Server Logging Threshold |
| Spark | ERROR | Shell Logging Threshold |
| Spark | ERROR | Gateway Logging Threshold |
| YARN | ERROR | History Server Logging Threshold |
| | | Gateway Logging Threshold |
| YARN | ERROR | JobHistory Server Logging Threshold |
| YARN | ERROR | NodeManager Logging Threshold |
| YARN | ERROR | ResourceManager Logging Threshold |
| Zookeeper | ERROR | Server Logging Threshold |
| Cloudera Manager | ERROR | Activity Monitor Logging Threshold |
| Cloudera Manager | ERROR | Alert Publisher Logging Threshold |
| Cloudera Manager | ERROR | Event Server Logging Threshold |
| Cloudera Manager | ERROR | Host Monitor Logging Threshold |
| Cloudera Manager | ERROR | Service Monitor Logging Threshold |

Network Tuning Recommendations

The network configuration can have a dramatic performance impact on the environment.

The network tuning guidance in this section can be used to optimize the network configuration for the Linux servers in the environment.

Modify Network Kernel Setting

Edit the network tuning parameters in the /etc/sysctl.conf file:

# vi /etc/sysctl.conf

Edit the following values:

# allow testing with buffers up to 128MB
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
# increase Linux autotuning TCP buffer limit to 64MB
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled (only relevant
# for systems with 10 Gb interfaces)
net.ipv4.tcp_mtu_probing=1
# recommended for CentOS7
net.core.default_qdisc = fq

In order for the above changes to take effect, reboot the server.

Increase the Transmit Queue Length

Set the txqueuelen permanently:

# vi /etc/rc.local

Add the following (this is the interface where you will receive data):

/sbin/ifconfig em1 txqueuelen 10000

To validate:

# ifconfig em1 | grep txque

ether 90:b1:1c:1f:e6:1b txqueuelen 10000 (Ethernet)

Location Value

/etc/sysctl.conf vm.swappiness = 10

/etc/security/limits.conf hdfs - nofile 32768

/etc/security/limits.conf mapred - nofile 32768

/etc/security/limits.conf hbase - nofile 32768

/etc/security/limits.conf yarn - nofile 32768

/etc/security/limits.conf solr - nofile 32768

/etc/security/limits.conf sqoop2 - nofile 32768

/etc/security/limits.conf spark - nofile 32768

/etc/security/limits.conf hive - nofile 32768

/etc/security/limits.conf impala - nofile 32768

/etc/security/limits.conf hue - nofile 32768

/etc/security/limits.conf kafka - nofile 32768

/etc/security/limits.conf hdfs - nproc 32768

/etc/security/limits.conf mapred - nproc 32768

/etc/security/limits.conf hbase - nproc 32768

/etc/security/limits.conf yarn - nproc 32768



/etc/security/limits.conf solr - nproc 32768

/etc/security/limits.conf sqoop2 - nproc 32768

/etc/security/limits.conf spark - nproc 32768

/etc/security/limits.conf hive - nproc 32768

/etc/security/limits.conf impala - nproc 32768

/etc/security/limits.conf hue - nproc 32768

/etc/security/limits.conf kafka - nproc 32768

/etc/security/limits.d/20-nproc.conf hdfs - nproc 32768

/etc/security/limits.d/20-nproc.conf mapred - nproc 32768

/etc/security/limits.d/20-nproc.conf hbase - nproc 32768

/etc/security/limits.d/20-nproc.conf yarn - nproc 32768

/etc/security/limits.d/20-nproc.conf solr - nproc 32768

/etc/security/limits.d/20-nproc.conf sqoop2 - nproc 32768

/etc/security/limits.d/20-nproc.conf spark - nproc 32768

/etc/security/limits.d/20-nproc.conf hive - nproc 32768

/etc/security/limits.d/20-nproc.conf impala - nproc 32768

/etc/security/limits.d/20-nproc.conf hue - nproc 32768

/etc/security/limits.d/20-nproc.conf kafka - nproc 32768

/sys/kernel/mm/transparent_hugepage/defrag echo never

/sys/kernel/mm/transparent_hugepage/enabled echo never
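The per-user limits in the table above follow one pattern (user, "-", item, value), so they can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of SNYPR: it only builds the lines as strings, using the user names and the 32768 value from the table, and leaves writing them to /etc/security/limits.conf to the administrator.

```python
# Sketch: generate the limits.conf entries listed in the table.
HADOOP_USERS = [
    "hdfs", "mapred", "hbase", "yarn", "solr", "sqoop2",
    "spark", "hive", "impala", "hue", "kafka",
]
LIMIT_ITEMS = ["nofile", "nproc"]  # open files, then processes
LIMIT_VALUE = 32768                # value used for every entry in the table


def limits_conf_lines(users=HADOOP_USERS, items=LIMIT_ITEMS, value=LIMIT_VALUE):
    """Return 'user - item value' lines; '-' sets both soft and hard limits."""
    return [f"{user} - {item} {value}" for item in items for user in users]


if __name__ == "__main__":
    for line in limits_conf_lines():
        print(line)
```

The same nproc lines are the ones the table also lists for /etc/security/limits.d/20-nproc.conf.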

Proposed Configuration Tuning

jetty.conf for SNYPR Search: make the default timeout 180K ms instead of 50K ms

/etc/sysctl.conf

# --------------------------------------------------------------------
# The following allow the server to handle lots of connection requests
# --------------------------------------------------------------------
# Increase number of incoming connections that can queue up
# before dropping
net.core.somaxconn = 50000
# Handle SYN floods and large numbers of valid HTTPS connections
net.ipv4.tcp_max_syn_backlog = 30000
# Increase the length of the network device input queue
net.core.netdev_max_backlog = 20000
# Increase system file descriptor limit so we will (probably)
# never run out under lots of concurrent requests.
# (Per-process limit is set in /etc/security/limits.conf)
fs.file-max = 100000
# Widen the port range used for outgoing connections
net.ipv4.ip_local_port_range = 10000 65000
# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
# --------------------------------------------------------------------
# The following help the server efficiently pipe large amounts of data
# --------------------------------------------------------------------
# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
# Disable packet forwarding.
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0
# Turn on the tcp_window_scaling
net.ipv4.tcp_window_scaling = 1
# Turn on the tcp_timestamps
net.ipv4.tcp_timestamps = 1
# Turn on the tcp_sack
net.ipv4.tcp_sack = 1
# Change Congestion Control (default: reno)
net.ipv4.tcp_congestion_control=htcp
# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
# --------------------------------------------------------------------
# The following allow the server to handle lots of connection churn
# --------------------------------------------------------------------
# Disconnect dead TCP connections after 1 minute
net.ipv4.tcp_keepalive_time = 60
# Wait a maximum of 5 * 2 = 10 seconds in the TIME_WAIT state after a FIN, to handle
# any remaining packets in the network.
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
# How long to keep ESTABLISHED connections in conntrack table
# (Should be higher than tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl)
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_generic_timeout = 300
# Allow a high number of timewait sockets
net.ipv4.tcp_max_tw_buckets = 2000000
# Timeout broken connections faster (amount of time to wait for FIN)
net.ipv4.tcp_fin_timeout = 10
# Let the networking stack reuse TIME_WAIT connections when it thinks it's safe to do so
net.ipv4.tcp_tw_reuse = 1
# Determines the wait time between isAlive interval probes (reduce from 75 sec to 15)
net.ipv4.tcp_keepalive_intvl = 15
# Determines the number of probes before timing out (reduce from 9 to 5)
net.ipv4.tcp_keepalive_probes = 5
# -------------------------------------------------------------
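The comment on nf_conntrack_tcp_timeout_established states a constraint that can be checked with simple arithmetic. The sketch below uses the three keepalive values and the conntrack timeout from the settings above; the function names are hypothetical and the check is only an illustration of that one rule.

```python
# Sketch: check the conntrack timeout against the keepalive settings.
def keepalive_detection_seconds(keepalive_time: int, probes: int, intvl: int) -> int:
    """Idle time before first probe, plus all unanswered probes."""
    return keepalive_time + probes * intvl


def conntrack_timeout_is_safe(conntrack_established: int, keepalive_time: int,
                              probes: int, intvl: int) -> bool:
    """True when conntrack keeps the entry longer than keepalive needs."""
    return conntrack_established > keepalive_detection_seconds(
        keepalive_time, probes, intvl)


if __name__ == "__main__":
    # Values taken from the sysctl settings listed above.
    detection = keepalive_detection_seconds(keepalive_time=60, probes=5, intvl=15)
    print(f"Dead peer detected after about {detection} seconds")
    print("conntrack timeout OK:",
          conntrack_timeout_is_safe(300, keepalive_time=60, probes=5, intvl=15))
```

With these values a dead peer is detected after about 135 seconds, which is below the 300-second conntrack timeout, so the rule in the comment holds.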


RIN Syslog Configuration

When the NDB (Data Broker) is used in the design, the value of the parameter below needs to be 0:

net.ipv4.conf.enp94s0f1.rp_filter = 0

The NDB is a one-way device and will not acknowledge packets that the OS may send. For this reason, the kernel will drop packets if the above is set to 1.


Google Cloud

The table below shows an example configuration for a Google Cloud SNYPR architecture with 10,000 EPS and 30 days of search index storage.

| Type | Quantity | Instance Type | CPU | Memory | Storage |
| --- | --- | --- | --- | --- | --- |
| Master Servers | 3 | n1-highmem-16 | 16 vCPU | 104 GB | (Quantity 1) /root 500 GB (SSD), (Quantity 1) /zookeeper 250 GB (SSD) |
| SNYPR Console Servers | 1 | n1-standard-16 | 16 vCPU | 60 GB | (Quantity 1) /root 500 GB (SSD), (Quantity 8) /data 500 GB (standard) |
| Compute / Storage Servers | 6 | n1-standard-64 | 64 vCPU | 240 GB | (Quantity 1) /root 128 GB (SSD), (Quantity 5) /search[1-10] 1000 GB (standard) |
| Search / Storage Servers | 1 | n1-highmem-64 | 64 vCPU | 416 GB | (Quantity 1) /root 128 GB (SSD), (Quantity 10) /search[1-10] 5500 GB (SSD) |
| Kafka Ingestion Servers | 3 | n1-standard-8 | 8 vCPU | 30 GB | (Quantity 1) /root 128 GB (SSD), (Quantity 1) /zookeeper 256 GB (SSD), (Quantity 3) /data 1024 GB (standard) |
| Remote Ingestion Nodes | 1 | n1-standard-8 | 8 vCPU | 30 GB | (Quantity 1) /root 128 GB (SSD), (Quantity 3) /data 2000 GB (standard) |

Deployment Architecture

The deployment of SNYPR includes a Hadoop cluster as well as servers for the user interface and for event ingestion. When SNYPR is deployed in a cloud environment, there are two primary deployment alternatives. The first is a Securonix Cloud deployment where all servers in the cluster are hosted in the cloud.

The second is a Securonix Cloud / On-Premise deployment where the console nodes are deployed in the cloud and the ingestion nodes are deployed on-premise. See the diagram (Figure 9) for optional on-premise ingestion nodes.


Spark Jobs Configuration for Kerberized Kafka

When running the SNYPR Spark applications in a kerberized cluster, add the parameters below to the sparkjobs scripts so that the Spark jobs can connect to secure Kafka.

--driver-java-options "-Djava.security.auth.login.config=/opt/keytabs/jaas.conf -Djute.maxbuffer=50000000 -Dspark.driver.userClassPathFirst=true -Dspark.executor.userClassPathFirst=true" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/opt/keytabs/jaas.conf -XX:+UseConcMarkSweepGC -Dlog4j.configuration=./conf/log4j.properties -Djute.maxbuffer=50000000 -Xss1G" \
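When these options are assembled by a wrapper script rather than edited by hand, building them as a list avoids quoting mistakes. The sketch below is a hypothetical helper, not part of the SNYPR sparkjobs scripts: it only composes the two arguments shown above from the same option values and prints them shell-quoted.

```python
# Sketch: compose the kerberized-Kafka arguments for spark-submit.
import shlex

JAAS_PATH = "/opt/keytabs/jaas.conf"  # path used in the parameters above


def kerberized_kafka_args(jaas_path: str = JAAS_PATH,
                          jute_maxbuffer: int = 50000000) -> list:
    """Return the extra spark-submit arguments as an argv-style list."""
    driver_opts = [
        f"-Djava.security.auth.login.config={jaas_path}",
        f"-Djute.maxbuffer={jute_maxbuffer}",
        "-Dspark.driver.userClassPathFirst=true",
        "-Dspark.executor.userClassPathFirst=true",
    ]
    executor_opts = [
        f"-Djava.security.auth.login.config={jaas_path}",
        "-XX:+UseConcMarkSweepGC",
        "-Dlog4j.configuration=./conf/log4j.properties",
        f"-Djute.maxbuffer={jute_maxbuffer}",
        "-Xss1G",
    ]
    return [
        "--driver-java-options", " ".join(driver_opts),
        "--conf", "spark.executor.extraJavaOptions=" + " ".join(executor_opts),
    ]


if __name__ == "__main__":
    # Print the arguments quoted for pasting into a shell script.
    print(" ".join(shlex.quote(arg) for arg in kerberized_kafka_args()))
```

Each option string stays a single list element, so the list can be appended directly to an existing spark-submit command line built with subprocess.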



Disaster Recovery Alternatives

SNYPR can be deployed to meet several disaster recovery objectives. Because of the size of the solution and the costs associated with disaster recovery, several alternative DR strategies are available. Since SNYPR can be deployed with an existing Hadoop environment, the disaster recovery strategy must align with the DR strategy for the Hadoop infrastructure being used for SNYPR. The alternatives in this document assume a dedicated Hadoop infrastructure for SNYPR, and describe the disaster recovery considerations for the entire solution, including Hadoop. If an existing Hadoop environment is used, the same considerations are relevant, but the actual configuration of the Hadoop disaster recovery is assumed to be part of the existing Hadoop infrastructure.

Alternatives

The SNYPR disaster recovery alternatives include:

1. Advanced DR with Full Infrastructure - identical infrastructure with data replication from the primary site to the DR site, with the ability to continue processing in-flight messages from the Kafka brokers at the DR site.

2. Full DR with Full Infrastructure - identical infrastructure with select data replication from the primary site to the DR site, with the ability to rebuild search indexes after a DR from the historical enriched event data, and the ability to process new activity events at the DR site.

3. Limited DR with Limited Infrastructure - limited infrastructure with violation, summary, and configuration data only, and the ability to process new activity events.
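The differences between the three alternatives can be summarized as a small lookup. The sketch below is illustrative only: the class, field, and function names are hypothetical, and the `historical_events_available` flag is an interpretation of the descriptions above (Limited DR keeps violation, summary, and configuration data only), not a product setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrAlternative:
    name: str
    full_infrastructure: bool          # identical infrastructure at the DR site
    continues_in_flight_kafka: bool    # can resume in-flight Kafka messages
    historical_events_available: bool  # replicated, or rebuilt after DR


# Encodes the three alternatives as described in the list above.
ALTERNATIVES = [
    DrAlternative("Advanced DR with Full Infrastructure", True, True, True),
    DrAlternative("Full DR with Full Infrastructure", True, False, True),
    DrAlternative("Limited DR with Limited Infrastructure", False, False, False),
]


def candidates(need_in_flight=False, need_historical_events=False):
    """Return the alternatives that satisfy the stated requirements."""
    return [
        alt for alt in ALTERNATIVES
        if (alt.continues_in_flight_kafka or not need_in_flight)
        and (alt.historical_events_available or not need_historical_events)
    ]


if __name__ == "__main__":
    for alt in candidates(need_historical_events=True):
        print(alt.name)
```

All three alternatives can process new activity events at the DR site, so that capability is not modeled as a distinguishing field.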

Considerations

The considerations for disaster recovery must be made for each service included in the solution. The primary considerations for each of the node types are described as follows:

• SNYPR Console Nodes: The SNYPR Console Nodes include the SNYPR user interface and the SNYPR configuration database.

• SNYPR Search Servers are dedicated search nodes that include a local event indexer and multiple search instances for distributed searches. These servers are edge nodes in a Hadoop cluster that read data from Kafka and index the data to local storage on the search servers. The SNYPR Search servers are optimized for maximum search performance and density on a physical server. Apache Solr is used for the underlying search server.

• SNYPR-EYE Server is a SNYPR monitoring and alerting server that is used for the configuration and operational health monitoring of all SNYPR services, including all the servers in the Hadoop cluster, the processes on the SNYPR Console, the SNYPR Spark Streaming applications running in the YARN cluster (including the data ingestion performance of all resources), and the performance and health of the SNYPR Search processes. The SNYPR-EYE solution installs and manages SNYPR-EYE agents on the servers in the environment for local monitoring.

• SNYPR Remote Ingestion Nodes include the ingestion servers with the connectors, the incoming activity log files, and the Kafka brokers with the in-flight messages.

• Hadoop Master: These nodes also include the Hadoop administration services like Cloudera Manager and ZooKeeper when Hadoop is deployed as part of the solution. The considerations for disaster recovery at this tier include file system replication with rsync, or a backup and restore strategy, as well as MySQL database replication for the SNYPR configuration database and the Hive metastore.

• Compute / Storage Nodes: The SNYPR Compute / Storage Nodes include HDFS and all the files stored by the system in HDFS for Hive / Impala table access, Solr indexes, and HBase tables. The considerations for disaster recovery at this tier include replication (using distcp) or backup and recovery of the HDFS data, HBase replication (using the WALs), and replication of the Solr collection schema data.

• Kafka Brokers: The considerations for disaster recovery at this tier include Kafka MirrorMaker for the in-flight Kafka messages.
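The per-tier replication tools named above (rsync, distcp, and Kafka MirrorMaker) can be sketched as command lines. This is a hedged illustration, not a SNYPR-provided procedure: the host names, the `/opt/snypr/conf/` and `/snypr` paths, the NameNode port 8020, and the two `.properties` file names are all placeholders, and the MirrorMaker flags follow the classic `kafka-mirror-maker` tool. MySQL and HBase replication are configured inside those systems and are therefore not shown as one-shot commands.

```python
def dr_replication_commands(primary_nn, dr_nn, dr_master):
    """Map each tier to a candidate replication command (argv list).

    All hosts and paths are hypothetical placeholders.
    """
    return {
        # Hadoop Master tier: file system replication with rsync.
        "hadoop_master_files": [
            "rsync", "-a", "--delete",
            "/opt/snypr/conf/", f"{dr_master}:/opt/snypr/conf/",
        ],
        # Compute / Storage tier: HDFS replication with distcp.
        "hdfs_data": [
            "hadoop", "distcp", "-update",
            f"hdfs://{primary_nn}:8020/snypr",
            f"hdfs://{dr_nn}:8020/snypr",
        ],
        # Kafka tier: mirror in-flight messages to the DR brokers.
        "kafka_in_flight": [
            "kafka-mirror-maker",
            "--consumer.config", "primary-consumer.properties",
            "--producer.config", "dr-producer.properties",
            "--whitelist", ".*",
        ],
    }


if __name__ == "__main__":
    cmds = dr_replication_commands(
        "nn-primary.example.com", "nn-dr.example.com", "master-dr.example.com"
    )
    for tier, argv in cmds.items():
        print(f"{tier}: {' '.join(argv)}")
```

In practice each command would be scheduled or run continuously according to the DR target chosen for the deployment.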

The exact disaster recovery strategy implemented should be in alignment with the business continuity requirements for each deployment. The table shows the alternatives for disaster recovery configuration with the impact on the business continuity.



| | Advanced DR with Full Infrastructure | Full DR with Full Infrastructure | Limited DR with Limited Infrastructure |
| DR Target | 1 day | 1 week | 1 week (Violations, behavior and data only) |
| Configuration Data | X | X | X |
| Violation Data | X | X | X |
| Case Management | X | X | X |
| Behavior Summaries | X | X | X |
| Historical Enriched Events | X | X | X |
| Search Indexes | X | rebuild search indexes after DR initiation | X |
| Kafka in-flight messages | X | X | X |
| Unprocessed Event Files | X | X | X |

The availability of the data that SNYPR needs at the disaster site, as well as network failover and end user access to the disaster recovery infrastructure, must also be considered. The typical services that are needed at the disaster site to continue processing are shown in the diagram below. This includes user and access data, as well as event logs that are ingested by the solution. For details, refer to the Cloudera Backup and Disaster Recovery documentation at: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_bdr_about.html.

