How-To Guide: Integrating Splunk with SwiftStack Storage

Splunk Use Cases using SwiftStack Storage


Table of Contents

Introduction

Suggested Use Cases

Typical Splunk Deployment Scenario

Use Case #1
    Configuration Steps
    Functional testing of Colddbs
    Splunk Sizing and Performance Guidelines
        Hardware Requirements for Indexer and Search Heads
        Splunk Recommended hardware spec for indexers and search heads
        Dedicated Search Head Hardware Spec
        Indexer Hardware Spec
        Splunk Data Ingest and Query Performance
        Applying This to Your Environment

Use Case #2
    1. Configure Splunk Cold-to-Frozen Archiving
    2. Install Swift Watch Folder
    3. Set up User on SwiftStack Cluster
    4. Configure Swift Watch Folder
    5. Archive to FrozenDBs
    6. Restore Archived Data
        A. Copy back archived index using Swift Command-Line Client
        B. Rebuild Index
        C. Search through restored events as normal

Use Case #3
    Configuration Steps
    1. Install Hadoop
    2. Set Up User on SwiftStack Cluster
    3. Install Java and Hadoop on your Splunk Indexers
    4. Configure Splunk Virtual Index
    5. Create Container/Bucket for Splunk-Hadoop Archive
    6. Archive
    7. Search Archived Data

References


Introduction

Splunk provides software and services that let organizations use their machine data to be more productive, profitable, competitive, and secure through real-time visibility. Splunk's IT operations, security, and business analytics capabilities are widely used in data centers today. There are two major types of data Splunk captures:

1. Active data (Hot and Warm)
   a. Hot data contains newly indexed data and is open for writing. There can be one or more hot buckets.
   b. Hot data rolls over to Warm and is used for operational searches. There are many warm buckets.
   c. Usually, hot and warm data are stored on SSD drives on indexer nodes.

2. Archive data (Cold and Frozen)
   a. Cold data is used for archiving and can be indexed and searched. There are many cold buckets.
   b. Frozen data is rarely accessed and is retained primarily for IT compliance and security, typically for about seven years. The indexer deletes frozen data by default, but you can also archive it. Archived data can be thawed.
   c. Long-term archiving can also be done using a virtual index (Hadoop is a common example). Virtual indexes are usually deployed when the customer already uses other Hadoop tools (MapR, etc.).

Typical data ingest rates for Splunk users:

1. For mid-range customers, 100-250 GB per day is typical.
2. For large customers, 1 TB a day is normal.

Suggested Use Cases

1. Use Case #1:
   a. Use SwiftStack File Access to mount Colddbs (NFS or SMB). Position SwiftStack as a potential replacement for Isilon/NetApp, which are more expensive for Splunk workloads.

2. Use Case #2:
   a. Use SwiftStack Watch Folder to move data that rolls out of cold storage to SwiftStack storage used as frozendbs. Manually move the data back to the thawed-data location to search it if required.

3. Use Case #3:


a. Archive using a virtual index. Use SwiftStack as a searchable archive for data used in Business Intelligence-style queries and data analytics.

b. Integration with Hadoop HDFS may be required.

Typical Splunk Deployment Scenario

The following diagram details how Splunk was configured in SwiftStack’s lab for testing.


Use Case #1

Configuration Steps:

Use SwiftStack File Access to mount Colddbs using NFS or SMB. The following steps provide an example of the configuration of Splunk and SwiftStack in a SwiftStack test facility. The details of your installation and configuration will be different, but this should provide a guide as to what is required.

1. Install Splunk:
   a. Create a Splunk VM on CentOS.
   b. Download Splunk Enterprise from https://www.splunk.com/en_us/download.html.
   c. After configuring the VM and installing CentOS 7, scp the Splunk installer to the VM.
   d. Install Splunk with `rpm -ivh splunk-6.6.2-4b804538c686-linux-2.6-x86_64.rpm`
   e. Enable startup at boot with `/opt/splunk/bin/splunk enable boot-start`
   f. Start Splunk with `/opt/splunk/bin/splunk start`
   g. Add firewall rules to allow inbound traffic for syslog and the web UI:


      i. `firewall-cmd --permanent --zone=public --add-port=8000/tcp`
      ii. `firewall-cmd --permanent --zone=public --add-port=514/udp`
      iii. `firewall-cmd --reload`
   h. Open a web browser to <Splunk VM IP>:8000, and log in as ‘admin’.

2. Use ProxyFS to mount colddb:

a. Create splunk-proxyfs user in the SwiftStack Controller:

Click on the ‘Create New User’ button under ‘Manage Swift Users.’


After creating the user, click ‘Push Users to Cluster’ to make the new user active on the cluster. When ‘Job Finished’ appears, the new user is available on the cluster.


b. Add an NFS volume labeled splunk_test using the SwiftStack Controller (Manage > File Access > Add Volume), and export it as an NFS share at /share/splunk-colddb on the SE Demo Cluster.

c. Create a directory /mnt/splunk-colddb on the Splunk indexer system, and NFS-mount the exported /share/splunk-colddb share on it from the SE Demo Cluster, which is serving as the NFS server.

d. In this example, add an entry to /etc/fstab so the share is mounted every time the system reboots:

sac-se-proxyfs.swiftstack.demo:/share/splunk-colddb /mnt/splunk-colddb nfs defaults 0 0

e. Deploy the new configuration to the SwiftStack node.

3. Create a new index called proxyfslog in the Splunk Admin console, and set parameters as follows:
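Equivalently, such an index can be defined directly in indexes.conf. The stanza below is a hypothetical sketch (the parameters from the original console screenshot are not reproduced here; the paths are illustrative, with the cold path placed on the NFS mount created in step 2):

```
[proxyfslog]
homePath   = $SPLUNK_DB/proxyfslog/db
coldPath   = /mnt/splunk-colddb/proxyfslog/colddb
thawedPath = $SPLUNK_DB/proxyfslog/thaweddb
```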


4. Set data input (as specified by Splunk lab documentation) as follows:

   a. UDP port 514
   b. Manual source type “swift all”
   c. Set host by DNS
   d. Index into “proxyfslog”

5. Generate log data to ingest into the Splunk indexer.
6. Search the index data using the Splunk Admin console, looking at “index=proxyfslog”.
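The data input from step 4 corresponds to an inputs.conf stanza roughly like the following (a sketch; `connection_host = dns` is Splunk's setting for resolving the host by DNS):

```
[udp://514]
connection_host = dns
sourcetype = swift all
index = proxyfslog
```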


Functional testing of Colddbs:

a. Look at cold db buckets rolling over:

i. List directories and see how fast the DBs are rolling over. (Note that the DBindex numbers are not in order; a total of 46 colddb buckets can be seen at any point.)

db_1507598635_1507511992_1173 db_1507603602_1507516960_1183 db_1507610211_1507523385_1193 db_1507614936_1507528296_1203 db_1507618799_1507532539_1213 db_1507599120_1507512477_1174 db_1507604088_1507517443_1184 db_1507610877_1507524173_1194 db_1507615362_1507528716_1204 db_1507675736_1507675311_1336 db_1507599606_1507512962_1175 db_1507604634_1507517990_1185 db_1507611484_1507524780_1195 db_1507615786_1507529141_1205 db_1507675915_1507675735_1337 db_1507600091_1507513448_1176 db_1507605119_1507518477_1186 db_1507611969_1507525326_1196 db_1507616207_1507529568_1206 db_1507741624_1507741145_1338 db_1507600635_1507513994_1177 db_1507605603_1507518961_1187 db_1507612454_1507525811_1197 db_1507616632_1507529992_1207 db_1507760204_1507759853_1339 db_1507601179_1507514538_1178 db_1507606149_1507519506_1188 db_1507612879_1507526237_1198 db_1507617060_1507530416_1208 db_1507761070_1507760522_1340 db_1507601665_1507515023_1179 db_1507606634_1507519992_1189 db_1507613303_1507526661_1199 db_1507617484_1507530843_1209 db_1507602148_1507515506_1180 db_1507607119_1507520476_1190 db_1507613728_1507527086_1200 db_1507617910_1507531267_1210 db_1507602633_1507515991_1181 db_1507607604_1507520958_1191 db_1507614030_1507527388_1201 db_1507618333_1507531692_1211 db_1507603116_1507516473_1182 db_1507609060_1507521447_1192 db_1507614515_1507527872_1202 db_1507618761_1507532113_1212

ii. Run Wireshark to look at live data on the Splunk system.
    1. Install XQuartz on the Mac, and `ssh -Y` to the Splunk indexer system.
    2. Start the Wireshark utility, and listen on the ens160 network interface.
    3. Use the NFS filter to look at NFS traffic only.

iii. Set filters on NFS data, and analyze packets.


b. Look at Read and Write block size to tune proxyFS:

i. Log in to the ProxyFS node.
ii. Run `ps -ef | grep proxyfsd` to find the path to the .conf file.
iii. View the .conf file, and look for PrivateIPAddr and [HTTPServer]TCPPort.
iv. Run `curl <PrivateIPAddr>:<TCPPort>/metrics`.

Analysis of read operations:

[root@sac-se-proxyfs share]# curl 10.10.11.69:15346/metrics 2>/dev/null | grep read_operations
proxyfs_fs_read_operations 729263
proxyfs_inode_directory_read_operations 2185
proxyfs_inode_file_read_operations 1458502
proxyfs_inode_file_read_operations_size_up_to_4KB 1458502
proxyfs_swiftclient_object_put_context_read_operations 416
proxyfs_swiftclient_object_put_context_read_operations_size_up_to_4KB 416

Analysis of write operations:

[root@sac-se-proxyfs share]# curl 10.10.11.69:15346/metrics 2>/dev/null | grep write_operations
proxyfs_fs_write_operations 39136
proxyfs_inode_file_write_operations 39136
proxyfs_inode_file_write_operations_size_16KB_to_32KB 233
proxyfs_inode_file_write_operations_size_32KB_to_64KB 306
proxyfs_inode_file_write_operations_size_4KB_to_8KB 20
proxyfs_inode_file_write_operations_size_8KB_to_16KB 48
proxyfs_inode_file_write_operations_size_over_64KB 37128
proxyfs_inode_file_write_operations_size_up_to_4KB 1401

From the data, reads are mostly 4 KiB in size and mostly random due to index searches, while the majority of writes are 64 KiB or larger and sequential. To optimize read performance, we reduced the read Cache Line size from 1 MiB to 64 KiB in the flow controls for the splunk-proxyfs share, creating a new flow-control policy to implement this change (Manage File Access > Shares > Flow Controls).

No change in Max Flush Size is needed, since writes are coalesced to 1 MiB.

Splunk Sizing and Performance Guidelines:

Hardware Requirements for Indexer and Search Heads:

The following table lists the suggested hardware specifications for indexer and search heads as defined by Splunk’s best practices; the recommendations vary according to the rate of data ingest and the number of concurrent users.

For the purposes of this document, SwiftStack’s lab configuration used a single combined instance of search head and indexer running in an ESX virtual machine with a Splunk license allowing up to 100GB of data to be ingested per day.


Note these recommendations from Splunk for search heads and indexers based on daily indexing volume: https://docs.splunk.com/Documentation/Splunk/7.0.0/Capacity/Summaryofperformancerecommendations

Daily Indexing Volume (SH = Search Head, Idx = Indexer; "1 combined" = one combined instance of Search Head and Indexer):

Total Users  | < 2 GB/day  | 2-300 GB/day | 300-600 GB/day | 600 GB-1 TB/day | 1-2 TB/day   | 2-3 TB/day
Less than 4  | 1 combined  | 1 combined   | 1 SH, 2 Idx    | 1 SH, 3 Idx     | 1 SH, 7 Idx  | 1 SH, 10 Idx
Up to 8      | 1 combined  | 1 SH, 1 Idx  | 1 SH, 2 Idx    | 1 SH, 3 Idx     | 1 SH, 8 Idx  | 1 SH, 12 Idx
Up to 16     | 1 SH, 1 Idx | 1 SH, 1 Idx  | 1 SH, 3 Idx    | 2 SH, 4 Idx     | 2 SH, 10 Idx | 2 SH, 15 Idx
Up to 24     | 1 SH, 1 Idx | 1 SH, 2 Idx  | 2 SH, 3 Idx    | 2 SH, 6 Idx     | 2 SH, 12 Idx | 3 SH, 18 Idx
Up to 48     | 1 SH, 2 Idx | 1 SH, 2 Idx  | 2 SH, 4 Idx    | 2 SH, 7 Idx     | 3 SH, 14 Idx | 3 SH, 21 Idx

Splunk Recommended hardware spec for indexers and search heads: http://docs.splunk.com/Documentation/Splunk/7.0.0/Capacity/Referencehardware

Dedicated Search Head Hardware Spec:

A search head uses CPU resources more consistently than an indexer but does not require fast disk throughput or a large pool of local storage for indexing.

● Intel 64-bit chip architecture

● 16 CPU cores at 2 GHz or greater speed per core

● 12GB RAM

● 2 x 300GB, 10,000 RPM SAS hard disks, configured in RAID 1

● A 1Gb Ethernet NIC, optional 2nd NIC for a management network

● A 64-bit Linux or Windows distribution


A search request uses up to 1 CPU core while the search is active. When you provision a search head, you must account for scheduled searches in addition to the ad-hoc searches that users run. More active users and higher concurrent search loads require additional CPU cores.

Indexer Hardware Spec:

When you distribute the indexing process, the Splunk platform can scale to consume terabytes of data in a day. When you add more indexers, you distribute the work of search requests and data indexing across those indexers. This increases performance significantly.

Here is the reference indexer specification:

● Intel 64-bit chip architecture

● 12 CPU cores at 2GHz or greater per core

● 12GB RAM

● Disk subsystem capable of 800 average IOPS

● A 1Gb Ethernet NIC, optional second NIC for a management network

● A 64-bit Linux or Windows distribution

Splunk has introduced these two new specifications to help improve user experience by provisioning additional CPU cores for better indexing performance and search concurrency.

Splunk Data Ingest and Query Performance:

In SwiftStack’s lab, we used log data and event-generation scripts from various host systems to ingest close to 100 GB of data per day for nearly two weeks, resulting in over 1.2 TB of indexed data as shown in the graph below:


The average data ingested for this two-week period (from January 10 to January 25) ranged from 63 to 93 GB/day. Concurrent with this data ingest, we ran a script to execute real-time searches on “index=proxyfslog” in the background while data was being ingested. Then, with 1.2TB of data in place, we ran additional scripts that used rtsearch (Splunk real time search) to give us some realistic query performance data; the search results are depicted in the screenshots below:


 

After logging hundreds of searches, the data shows a median runtime of 0.20 seconds for admin searches with median response times across all searches ranging from 0.07 seconds to 0.47 seconds in our lab environment.

Applying This to Your Environment:

Recall that the example data above is from SwiftStack’s lab environment, in which both Splunk’s services and SwiftStack’s software were running in mid-sized virtual machines; in most production environments, you should expect ingest and search performance to be at least somewhat (perhaps significantly) greater than these numbers. Even so, consider that this demonstrated the following performance:

● SwiftStack’s File Access easily ingested a sustained rate of data at nearly 100 GBytes/day to a single NFS mount, and this was limited only by our Splunk lab license. We expect that each NFS share in a typical hardware environment should scale to ingest approximately 100 MBytes/second sustained with 10GbE link.


● With 1.2TB of cold data to search in SwiftStack, Splunk averaged search latencies of well under 0.5 seconds.

● A single NFS mount can handle up to 3 TB of ingest data per day in your Splunk environment with a 10GbE link.

Therefore, if you are using Splunk and need more data online than can fit in your hot and warm storage tiers, expanding to use SwiftStack’s File Access for a cold storage tier should provide you a scalable, durable, and highly available destination while still providing sub-second search latencies for your queries.

Use Case #2

Use SwiftStack Watch Folder to move data that rolls out of cold storage to SwiftStack storage used as frozendbs. Manually move the data back to the thawed-data location to search it when needed. The following steps provide an example of the configuration of Splunk and SwiftStack in a SwiftStack test facility. The details of your installation and configuration will be different, but this should provide a guide as to what is required.

1. Configure Splunk Cold-to-Frozen Archiving

By default, Splunk will delete buckets that age out of Cold storage. To save this data, you must set a path to Frozen storage in the index configuration. This can be set in the Splunk Web interface under Settings > Indexes by setting the ‘Frozen Path’ parameter.

The indexes.conf file can also be manually edited to include the ‘coldToFrozenDir’ parameter.

The path used can be anywhere accessible by the indexer system. In this example, we are setting it to a ‘frozendb’ directory in the index’s own directory.


2. Install Swift Watch Folder

The Swift Watch Folder is a system daemon that watches a specified directory and, when files are written into it, uploads them to a specified SwiftStack cluster. To install the Swift Watch Folder, get the appropriate install package (rpm, deb, or msi) from SwiftStack, and install it on the same system running the Splunk indexer.

In this example, we installed via an rpm using the `rpm -ivh swift-watch-folder-0.0.4-1.noarch.rpm` command.

3. Set up User on SwiftStack Cluster

In the SwiftStack Controller, click on ‘Users & Accounts’ next to the SwiftStack cluster you will be using to archive frozen Splunk data.

Click on the ‘Create New User’ button under ‘Manage Swift Users’.


On the following screen, enter the username and password you will be using for storing Splunk data. In this example, we are using the user ‘splunk’. Ensure this user is ‘enabled’. After creating the user, click ‘Push Users to Cluster’ to make the new user active on the cluster. When ‘Job Finished’ appears, the new user is available on the cluster.


4. Configure Swift Watch Folder

On the Splunk indexer where Swift Watch Folder is installed, edit the file `/etc/swift-watch-folder/watch-folder.conf`. The parameters in the [global] section specify the SwiftStack cluster and credentials to use for uploading new data, along with other global settings that control how data is uploaded. For the Splunk frozen archive, the parameters `preserve_path` and `recursive` should be set to `True`. There is also a section for each watch folder, named by the path of the folder to watch. The parameters in a watch-folder section specify which container to upload data into, a segments container for large objects (i.e. objects larger than `segment_size` in the [global] section), and how long to wait after upload before the local file is deleted.

In this example, we are using our ‘splunk’ user from earlier and setting up a section for our coldToFrozenDir directory. In the watch folder section, we specify the container `syslog_frozendb` and set `delete_uploaded_file_after` to 5 minutes. After changing the `watch-folder.conf` file, restart the swift-watch-folder daemon (e.g. `systemctl restart swift-watch-folder`). Once it is restarted, you can monitor the log file `/var/log/swift-watch-folder.log` to see the daemon’s activity.
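A hypothetical watch-folder.conf illustrating the settings described above (the key names and auth endpoint shown here are assumptions based on this description, not a verbatim copy; consult the example file shipped with the package for the exact syntax):

```
[global]
auth_url = https://cluster.example.com/auth/v1.0
user = splunk
key = <password>
preserve_path = True
recursive = True
segment_size = 1073741824

[/opt/splunk/var/lib/splunk/syslog/frozendb]
container = syslog_frozendb
segments_container = syslog_frozendb_segments
delete_uploaded_file_after = 300
```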


5. Archive to FrozenDBs

Depending on the index data-aging policy, data will roll from warm to cold and then from cold to frozen. Once this happens, directories will be created in the ‘coldToFrozenDir’ for the index. Edit the /opt/splunk/etc/apps/search/local/indexes.conf file to set the following:

a. Set coldToFrozenDir to $SPLUNK_DB/syslog/frozendb
b. Set the overall size of the index (maxTotalDataSize) to 100 MB
c. Set the max DB size (maxDataSize) to 10 MB
d. Set maxWarmDBCount to 3

This example will create a total of 10 DB buckets (1 Hot, 3 Warm, 1 Cold and the rest 6 Frozen). This is an extremely small setup, but it works well to demonstrate this functionality. Eventually, syslog data will start rolling into the frozendb directory:
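The indexes.conf edits from step 5 can be sketched as a stanza like the following (note that Splunk's actual parameter for the total index size is maxTotalDataSizeMB, and maxDataSize is also expressed in megabytes):

```
[syslog]
coldToFrozenDir = $SPLUNK_DB/syslog/frozendb
maxTotalDataSizeMB = 100
maxDataSize = 10
maxWarmDBCount = 3
```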

The Swift Watch Folder log will show writes appearing in the configured directory, the upload of each file, and then the deletion of the local copy. Frozen DB data cannot be searched directly. However, using the SwiftStack Client, we can see the archived buckets for our index pulled into syslog_frozendb under the ‘splunk’ user we created earlier.


6. Restore Archived Data:

Copy archived data from SwiftStack to the index’s thaweddb directory, and reload the index. Archived buckets are named with time ranges; hence, one can selectively copy back only the data that needs to be searched within a specified time range. Here are the steps:

A. Copy back archived index using Swift Command-Line Client

Set up credentials for the archived account as follows:

a. Set ST_AUTH, ST_USER and ST_KEY variables
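For example, for the ‘splunk’ user created earlier (the auth URL and password below are placeholders; substitute your cluster's values):

```shell
# Credentials for the Swift command-line client; all values are placeholders.
export ST_AUTH=https://cluster.example.com/auth/v1.0   # your cluster's v1 auth endpoint
export ST_USER=splunk                                  # the Swift user created earlier
export ST_KEY=changeme                                 # that user's password
```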


After setting these environment variables, you can use the command `swift list <your container>` to see your archived index buckets.

To copy the data back, first change directory to your index’s `thaweddb` directory. Then, run the command `swift download <your container> <archived bucket>`.
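Because each archived bucket name encodes its newest and oldest event times as epoch seconds (db_<newestEpoch>_<oldestEpoch>_<id>), a small helper can decide which buckets overlap a desired search window before downloading them. This is an illustrative sketch, not part of the original procedure:

```shell
# Decide whether a Splunk bucket name overlaps a time window.
# A bucket db_<newestEpoch>_<oldestEpoch>_<id> overlaps [start, end]
# when oldest <= end and newest >= start.
bucket_overlaps() {
  bucket_name=$1; win_start=$2; win_end=$3
  newest=$(echo "$bucket_name" | cut -d_ -f2)
  oldest=$(echo "$bucket_name" | cut -d_ -f3)
  [ "$oldest" -le "$win_end" ] && [ "$newest" -ge "$win_start" ]
}

# Example: select only the buckets containing events in the window.
for b in db_1507598635_1507511992_1173 db_1507603602_1507516960_1183; do
  if bucket_overlaps "$b" 1507512000 1507513000; then
    echo "would download: $b"
  fi
done
```

In a real restore, the echo would be replaced by `swift download <your container> "$b"` run from inside the thaweddb directory.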

B. Rebuild Index

After copying the archived index buckets back, the index must be rebuilt to be able to search the thawed data. To do this, run the command `splunk rebuild /path/to/your/index/thaweddb/<archived bucket>` for each restored bucket. You can also restart the whole indexer with the command `splunk restart`, which will do the same thing.


C. Search through restored events as normal

After restoring the data, you can use the Splunk Web interface to search your thawed data. Use “index=syslog” in the search tab of the Splunk web console:

You can search events further back in the timeline, changing the time window from as little as the past five minutes to more than 24 hours. Once the search is done, you can optionally delete the data in the ‘thaweddb’ directory.

Use Case #3

Set up a Splunk installation to archive index data to SwiftStack using a Hadoop cluster for storage and subsequent search. Before starting, a SwiftStack cluster, a Splunk installation, and a Hadoop cluster need to be deployed and running. The following steps provide an example of the configuration of Splunk and SwiftStack in a SwiftStack test facility. The details of your installation and configuration will be different, but this should provide a guide as to what is required.


Configuration Steps:

1. Install Hadoop:
   1. Download Apache Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/.
   2. After configuring the VM and installing CentOS 7, scp the Hadoop package to the VM.
   3. Install Java: `yum -y install java-1.7.0-openjdk`
   4. Add the following to `/etc/profile.d/java.sh`:
      a. export JAVA_HOME=/usr/lib/jvm/jre
   5. Add the following to `/etc/profile.d/hadoop.sh`:
      a. export HADOOP_HOME=/usr/local/hadoop-2.8.1
      b. export PATH=$PATH:$HADOOP_HOME/bin
      c. export HADOOP_CLASSPATH=/usr/local/hadoop/share/hadoop/tools/lib/*
   6. Install/extract Hadoop with the command `tar -zxvf hadoop-2.8.1.tar.gz -C /usr/local/ && ln -s /usr/local/hadoop-2.8.1 /usr/local/hadoop`
   7. Set up the environment with `source /etc/profile.d/*`
   8. Set up HDFS with `/usr/local/hadoop/bin/hdfs namenode -format`
   9. Add firewall rules for Hadoop:
      a. firewall-cmd --zone=public --add-port=50090/tcp --add-port=50070/tcp --add-port=50010/tcp --add-port=50075/tcp --add-port=50020/tcp --add-port=35495/tcp --add-port=9000/tcp --add-port=8042/tcp --add-port=8088/tcp --add-port=13562/tcp --add-port=8030/tcp --add-port=39327/tcp --add-port=8031/tcp --add-port=8032/tcp --add-port=8033/tcp --add-port=8040/tcp --permanent
      b. firewall-cmd --reload
   10. Start Hadoop with `/usr/local/hadoop/sbin/start-dfs.sh && /usr/local/hadoop/sbin/start-yarn.sh`

2. Set Up User on SwiftStack Cluster

In the SwiftStack Controller, click on ‘Users & Accounts’ next to the SwiftStack cluster you will be using to archive frozen Splunk data.

Click on the ‘Create New User’ button under ‘Manage Swift Users’.


On the following screen, enter the username and password you will be using for storing Splunk data. In this example, we are using the user ‘splunk’. Ensure this user is ‘enabled’.


After creating the user, click ‘Push Users to Cluster’ to make the new user active on the cluster. When ‘Job Finished’ appears, the new user is available on the cluster.

After pushing the new user, return to the “Users & Accounts” page, and click the “Show S3 API Key” button. The revealed key is the Secret Key the new user will use to access the S3 API on the SwiftStack cluster.

3. Install Java and Hadoop on your Splunk Indexers

For Splunk to be able to connect to Hadoop to archive data and launch search jobs, the Splunk indexers need to have the Hadoop client software installed. The steps to install Hadoop may differ based on your Hadoop distribution. If you are using Apache Hadoop, copy hadoop-X.Y.Z.tar.gz to the indexers and run `tar -zxvf ./hadoop-X.Y.Z.tar.gz -C /usr/local/ && ln -s /usr/local/hadoop-X.Y.Z /usr/local/hadoop` as the root user. Take note of where Hadoop is installed, as you will need it for the Splunk Virtual Index Provider configuration below. If not already present, you will also need to install Java on the Splunk indexer, e.g. with the system's package manager (`yum install java-1.7.0-openjdk`).

4. Configure Splunk Virtual Index In the Splunk web interface, as an Administrator, go to “Settings,” and select “Virtual indexes” under ‘Data’. Under the ‘Providers’ tab, click the “New Provider” button. This is where we will configure parameters for connecting to the Hadoop resources for the purpose of archiving and searching index data.

Give the provider a name (e.g. “hadoop-archive”) and, under “Provider Family,” select “hadoop.” Under “Environment Variables,” enter the paths to Java on the Splunk Indexers (e.g. the directory reported by the command which java) and to Hadoop (where the Hadoop client libraries were installed above).

Under “Hadoop Cluster Information,” select the version of Hadoop you are running in your environment. In this example, we are selecting “Hadoop 2.x, (Yarn)”. Depending on the version you select, you will need to enter the addresses and ports of resources for your Hadoop instance; in this example, we need to enter paths to HDFS as well as the addresses and ports of the Resource Manager and Resource Scheduler. Under “Additional Settings,” you will need to add settings that point the Hadoop archiving and search jobs at the SwiftStack cluster and configure paths to the Hadoop libraries used for this activity. Click the “New Setting” option at the bottom of the page, and add the following settings (you will need to click “New Setting” for each one):

● vix.env.HADOOP_TOOLS: $HADOOP_HOME/share/hadoop/tools/lib
● vix.splunk.jars: $HADOOP_TOOLS/hadoop-aws-2.8.0.jar,$HADOOP_TOOLS/aws-java-sdk-core-1.10.6.jar,$HADOOP_TOOLS/aws-java-sdk-kms-1.10.6.jar,$HADOOP_TOOLS/aws-java-sdk-s3-1.10.6.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
  ○ Note: depending on your Hadoop version, these filenames might be different
● vix.fs.s3a.access.key: (your created SwiftStack user; see above)
● vix.fs.s3a.secret.key: (your SwiftStack user’s S3 API secret key; see above)

● vix.fs.s3a.endpoint: (your SwiftStack cluster API hostname)
● vix.fs.s3a.path.style.access: true
● vix.fs.s3a.connection.ssl.enabled: false (only if your SwiftStack cluster does not use SSL)

After creating these settings, click ‘Save.’ Then click the “Archived Indexes” tab on the “Virtual Indexes” main screen, and click the “New Archived Index” button to open the Archived Index configuration screen.
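For reference, Splunk persists these provider settings in indexes.conf on the search head. A sketch of how the resulting stanza might look follows; the stanza name, paths, endpoint, and credentials are placeholders from this example, not values to copy verbatim:

```ini
[provider:hadoop-archive]
vix.family = hadoop
# Paths below are placeholders; use the Java and Hadoop locations on your indexers.
vix.env.JAVA_HOME = /usr/lib/jvm/jre
vix.env.HADOOP_HOME = /usr/local/hadoop
vix.env.HADOOP_TOOLS = $HADOOP_HOME/share/hadoop/tools/lib
vix.splunk.jars = $HADOOP_TOOLS/hadoop-aws-2.8.0.jar,$HADOOP_TOOLS/aws-java-sdk-core-1.10.6.jar,$HADOOP_TOOLS/aws-java-sdk-kms-1.10.6.jar,$HADOOP_TOOLS/aws-java-sdk-s3-1.10.6.jar,$HADOOP_TOOLS/jackson-databind-2.2.3.jar,$HADOOP_TOOLS/jackson-core-2.2.3.jar,$HADOOP_TOOLS/jackson-annotations-2.2.3.jar
# SwiftStack S3 endpoint and the user created earlier (placeholder values):
vix.fs.s3a.access.key = splunk
vix.fs.s3a.secret.key = <s3-api-secret-key>
vix.fs.s3a.endpoint = cluster.example.com
vix.fs.s3a.path.style.access = true
vix.fs.s3a.connection.ssl.enabled = false
```

Editing the stanza directly is equivalent to the web UI steps above and can be easier to template across environments.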

In the following screen, select the index you wish to archive under “Splunk Indexes”. For “Archived Index Name Suffix,” enter something informative to indicate this index is an archive rather than a live index (in this example, we use ‘_archive’). In the “Destination Provider” drop-down, select the provider created in the previous steps. In “Destination Path in HDFS,” enter the path s3a://<bucket-name>/, where <bucket-name> is the name of the container or bucket in which the archived index data should be stored. Under “Older Than,” select how old an indexed item must be before it is archived (in this example, we selected ‘15 minutes’ for the sake of demonstration; in production, this length of time would depend on your data aging policy and Splunk cluster architecture). Finally, click ‘Save’ to create the new Archived Index.

5. Create Container/Bucket for Splunk-Hadoop Archive Splunk will not auto-create the archive container/bucket. If the bucket named in the “Destination Path in HDFS” setting above does not exist, create it using an appropriate API client.

In this example, we’re using the SwiftStack Client to create the ‘hadoop-archive’ bucket.
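Alternatively, the bucket can be created from the command line over either API. The endpoint below is a placeholder, and this sketch only builds and prints the commands so it can run without a live cluster:

```shell
# Placeholders -- substitute your SwiftStack API hostname and bucket name.
ENDPOINT="https://cluster.example.com"
BUCKET="hadoop-archive"

# Via the S3 API (AWS CLI, using the user's S3 credentials from the environment):
S3_MB="aws s3 mb s3://${BUCKET} --endpoint-url ${ENDPOINT}"

# Via the native Swift API (python-swiftclient; 'post' creates the container):
SWIFT_POST="swift -A ${ENDPOINT}/auth/v1.0 -U splunk -K <password> post ${BUCKET}"

printf '%s\n' "$S3_MB" "$SWIFT_POST"
```

Either route produces the same container; the bucket name just has to match the s3a:// path configured in the Archived Index.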

6. Archive Depending on your index data aging policy and the “Older Than” setting used in the Archive Index setting, Splunk will eventually archive index data to Hadoop and store the archive in SwiftStack. You can verify this is working by looking at the contents of the specified bucket.

Once data has been archived, there will be objects in the SwiftStack bucket.

7. Search Archived Data To search data that has been archived to SwiftStack, adjust your search parameters to include the name of the Archived Index (e.g. index=syslog_archive). Archived index data is searched just like live data; live index data remains available with index=syslog.
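For example, assuming the ‘syslog’ index and ‘_archive’ suffix used in this guide, the first search below scans only the archived data, while the second spans live and archived events together:

```
index=syslog_archive

index=syslog OR index=syslog_archive | stats count by host
```

Searches against the archived index are dispatched as Hadoop jobs, so expect them to be slower than searches against live index data.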

References: 
● https://github.com/swiftstack/swift-watch-folder 
● https://docs.google.com/document/d/1fEPqLuc-Fd_PiXilkcWcrPE-vruhB_IRbGASN_Tr8OI/edit 
● http://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/Restorearchiveddata 
● http://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/Setaretirementandarchivingpolicy 
● http://docs.splunk.com/Documentation/Splunk/6.6.2/admin/Indexesconf 
● https://www.swiftstack.com/docs/install/index.html 
● http://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/ArchivingindexestoHadoop 
● http://docs.splunk.com/Documentation/Splunk/6.6.2/Indexer/ArchivingSplunkindexestoS3 
● https://www.swiftstack.com/docs/admin/middleware/s3_middleware.html#s3-middleware-reference-label 
● https://docs.google.com/document/d/1wsoZ1d1O6zI8wwfW9niZ5uxzb-2C7NrwcdIPyGHEkjo/edit#heading=h.wat6b1smeg4b 
● https://docs.google.com/document/d/1Byr-Vx9UBEr-4L8zFRyuRdGGvjmCANHp4ag4R7JOICE 
● http://docs.splunk.com/Documentation/Splunk/7.0.0/Installation/Systemrequirements 
● http://docs.splunk.com/Documentation/Splunk/7.0.0/Capacity/Referencehardware 
● https://docs.splunk.com/Documentation/Splunk/7.0.0/Capacity/Summaryofperformancerecommendations