Copyright © 2014 Splunk Inc.
DOUBLE DOWN: The Business and Technology of Running a Multi-terabyte Splunk Enterprise Environment in a Fortune 10 Company
Jacob Wilkins, Splunk Architect, GE
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
General Electric
• About General Electric
  – Founded in 1892
  – Headquartered in Fairfield, CT
  – Multinational conglomerate
  – Over 300K employees worldwide
  – #9 in FORTUNE 1000 (June 2014)
  – GE Imagination at Work
About Jacob
• Splunk Architect/Engineer
• Typical “Unix guy” system engineer background
• Former (recovering) CISSP
• .conf2013 attendee
• Better at administering computers than using them
• Better at administering Splunk than using it
How Much Does GE Splunk?
• About 5 TB/day
• 2 US-based primary datacenters
  – Additional indexers in 10 locations across APAC and EMEA
  – EU data privacy: easier to index things in their country of origin
• 300+ active users across multiple sub-businesses
• Information about 400,000+ workstations, servers and devices
Why Does GE Splunk?
• An “incident” a few years ago caused GE to get serious about infosec
• Newly formed Incident Response team needed visibility
• Corporate-managed Splunk instance was born
  – All sub-businesses are REQUIRED to send WinEventLog:Security from domain controllers (DCs), and other security-focused logs.
  – No chargebacks for mandated logs. (The first hit is free)
Splunk Growth Drivers
• Hunger for business intelligence from machine data
• Regulatory controls and compliance
• Operational use cases outside of core Security
• Responding to active security concerns
• Passionate users: Employees love Splunk!
What Does GE Splunk?
[Chart: GB indexed by sourcetype]
SPLaaS = Splunk-as-a-Service at GE
• Centrally managed w/ chargebacks: based on Splunk usage, each dept. contributes to license, hardware and storage costs
• Use Cases: Security, Compliance, Infrastructure & Ops Management
• Global Deployment: GE Corporate, GE Capital, GE Capital Americas, ISS
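The slides say chargebacks are based on Splunk usage and that mandated security logs carry no charge, but they do not give a formula. Here is a minimal Python sketch assuming a simple pro-rata split by billable GB/day; the department names, the mandated-sourcetype set and the cost figure are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: sourcetypes every sub-business must send (no chargeback).
MANDATED_SOURCETYPES = {"WinEventLog:Security"}


@dataclass
class Usage:
    department: str
    sourcetype: str
    gb_per_day: float


def allocate_costs(usages: list[Usage], total_cost: float) -> dict[str, float]:
    """Split total_cost across departments in proportion to billable GB/day.

    Mandated sourcetypes are excluded from billable volume, so a department
    that only sends mandated logs is charged nothing.
    """
    billable: dict[str, float] = {}
    for u in usages:
        billable.setdefault(u.department, 0.0)
        if u.sourcetype not in MANDATED_SOURCETYPES:
            billable[u.department] += u.gb_per_day
    total_billable = sum(billable.values())
    if total_billable == 0:
        return {dept: 0.0 for dept in billable}
    return {dept: total_cost * gb / total_billable for dept, gb in billable.items()}


if __name__ == "__main__":
    usage = [
        Usage("biz1", "WinEventLog:Security", 300.0),
        Usage("biz1", "cisco:asa", 100.0),
        Usage("biz2", "bro_conn", 300.0),
        Usage("biz3", "WinEventLog:Security", 50.0),
    ]
    print(allocate_costs(usage, total_cost=1000.0))
```

Excluding mandated data from the billable volume (rather than discounting it) keeps the "first hit is free" promise exact: a business that sends only required logs owes nothing.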
Getting Started: Our Process
• What if you had to add an additional 2 TB per day?
• Do you have the manpower?
  – How many new sourcetypes?
  – Can you leverage existing TAs and the CIM?
  – How much custom extraction, eventtyping and tagging will you need?
• How much infrastructure will you need?
  – How will you tell if your indexers are saturated?
  – How do you know when it is time to grow?
• Everyone wants in: how do you prioritize?
• Will there be a flood of new users for this data?
  – Is your user provisioning process scalable?
  – Can your search head handle it?
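To put a number on "an additional 2 TB per day", here is a back-of-the-envelope Python sketch for disk. It assumes the on-disk footprint is a fixed fraction of raw volume (0.5 is only a placeholder; measure your own data), decimal units (1 TB = 1000 GB), no index replication by default, and RAID 10 mirroring as on GE's indexers. The 90-day retention in the example is hypothetical.

```python
def storage_needed_tb(
    gb_per_day: float,
    retention_days: int,
    disk_to_raw_ratio: float = 0.5,  # ASSUMPTION: placeholder; measure your own data
    copies: int = 1,  # 1 = a single copy of each bucket (no index replication)
    raid10: bool = True,  # RAID 10 mirrors every block, so raw disk = 2x usable
) -> float:
    """Rough raw-disk estimate, in TB, for a given daily ingest and retention."""
    usable_tb = gb_per_day * retention_days * disk_to_raw_ratio * copies / 1000
    return usable_tb * 2 if raid10 else usable_tb


if __name__ == "__main__":
    # Hypothetical: the extra 2 TB/day from the slide, kept for 90 days.
    print(storage_needed_tb(gb_per_day=2000, retention_days=90))
```

Under these assumptions the example comes to 180 TB of raw disk. Every factor scales the answer linearly, which is why it helps to keep each one as an explicit parameter rather than a single magic multiplier.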
Engagement Model
• Use a new engagement request/questionnaire
• Create a backlog of requests
• Provide level of effort and timing
• Share/promote success
  – Splunk roundtables
  – Don’t cave to the squeaky wheel
  – Highlight positives
“Everyone was doing everything. Specialization/shared services help you divide and conquer with your Architecture, Ops and Development Teams.”
A Single, Federated Environment
• It is easier to administer one ginormous Splunk environment than 2 or 3 large environments
• Shared lookups
• Sometimes some partitioning/compartmentalization is necessary…
  – I regret half of the partitioning that we’ve allowed in our environment
Would you rather fight 1 horse-sized duck, or 100 duck-sized horses?
Event-Level RBAC
• Lookup with 2 columns: host, host_role
• Automatic lookup in props.conf
  – Performs lookup on host
  – Outputs host_role
• srchFilter statement in authorize.conf
  – srchFilter = host_role=biz1_foo
• Search head rewrites your search before it is dispatched
  – As if (host=foo1 OR host=foo2 OR host=…) had been used instead of host_role
• Beware: Accelerated data models cannot enforce this!
What do you mean, “reverse” lookup?
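"Reverse" here means running the lookup backwards: the role filter names an output field (host_role), so the search head has to find every input value (host) that maps to it. The Python sketch below only illustrates that idea; it is not Splunk's implementation. The lookup contents are hypothetical, and the empty-role fallback is this sketch's own choice.

```python
import csv
import io

# Hypothetical lookup contents with the two columns from the slide.
HOST_ROLE_CSV = """host,host_role
foo1,biz1_foo
foo2,biz1_foo
bar1,biz2_bar
"""


def load_lookup(text: str) -> dict[str, str]:
    """Forward direction (what the automatic lookup does): host -> host_role."""
    return {row["host"]: row["host_role"] for row in csv.DictReader(io.StringIO(text))}


def reverse_lookup(lookup: dict[str, str], role: str) -> str:
    """Reverse direction: host_role=<role> becomes a filter on the host field."""
    hosts = sorted(host for host, r in lookup.items() if r == role)
    if not hosts:
        # This sketch fails closed: intended to match no events at all.
        return "NOT host=*"
    return "(" + " OR ".join(f"host={h}" for h in hosts) + ")"


def apply_srch_filter(search: str, lookup: dict[str, str], role: str) -> str:
    """Add the rewritten filter to the base search (the part before the first pipe).

    Naive on purpose: a pipe inside a quoted string would be mis-split.
    """
    base, pipe, rest = search.partition("|")
    rewritten = f"{base.strip()} {reverse_lookup(lookup, role)}"
    return rewritten + (f" |{rest}" if pipe else "")


if __name__ == "__main__":
    roles = load_lookup(HOST_ROLE_CSV)
    print(apply_srch_filter("index=main error | stats count", roles, "biz1_foo"))
```

This also hints at why the slide warns about accelerated data models: the restriction only exists as a rewrite of the search text, so anything that answers without running that rewritten search does not see it.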
How Does GE Use Splunk?
• Security Operations
  – Malware outbreak tracking
  – Vulnerability identification and remediation
  – Patch deployment tracking
  – Policy analysis
• Discovered Asset Inventory
  – Doesn’t rely on a static inventory; contains only ‘active’ assets
  – Allows for correlation of many security sourcetypes
  – Provides the ability to report by BU or identify ownership
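A discovered inventory is built from what is actually logging rather than from a static list. Here is a minimal Python sketch of that idea, assuming events arrive as (timestamp, host, sourcetype) tuples and that a separate host-to-BU lookup exists. Both inputs and the 7-day "active" window are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def discovered_inventory(events, host_to_bu, now: datetime, active_window=timedelta(days=7)):
    """Build an inventory only from hosts seen in recent events.

    events: iterable of (timestamp, host, sourcetype) tuples from any sourcetype.
    host_to_bu: hypothetical host -> business-unit lookup; misses become "unknown".
    active_window: how recently a host must have logged to count as 'active'
    (7 days is an arbitrary placeholder).
    """
    cutoff = now - active_window
    inventory = {}
    for ts, host, sourcetype in events:
        if ts < cutoff:
            continue  # stale hosts never enter the inventory
        entry = inventory.setdefault(
            host,
            {"bu": host_to_bu.get(host, "unknown"), "sourcetypes": set(), "last_seen": ts},
        )
        entry["sourcetypes"].add(sourcetype)  # lets sourcetypes be correlated per host
        entry["last_seen"] = max(entry["last_seen"], ts)
    return inventory


def count_by_bu(inventory) -> dict[str, int]:
    """Report active assets per business unit."""
    counts = defaultdict(int)
    for entry in inventory.values():
        counts[entry["bu"]] += 1
    return dict(counts)
```

Hosts that map to "unknown" are useful output in their own right: they are active assets with no recorded owner.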
How Does GE Operate Splunk?
• Automate repeatable tasks
  – Puppet to manage indexer config
  – Currently testing ERB-generated Splunk configs!
• Don’t skimp on hardware!
  – Indexers have 22 10k RPM drives in a RAID 10
  – Search heads likely to become thread-bound as you scale
• No, seriously. Don’t skimp on hardware
  – A whole class of time-wasting problems vanishes when you have proper hardware
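One way to reason about search heads becoming thread-bound: Splunk's limits.conf caps concurrent historical searches at max_searches_per_cpu × number_of_cpus + base_max_searches. The defaults below (1 and 6) are the limits.conf defaults as we understand them, so check the limits.conf.spec for your version. The queueing helper is a hypothetical illustration, not a Splunk feature.

```python
import os


def max_historical_searches(
    num_cpus: int, max_searches_per_cpu: int = 1, base_max_searches: int = 6
) -> int:
    """Historical-search concurrency ceiling, following the limits.conf formula.

    Defaults are the limits.conf defaults as we understand them; verify them
    against the limits.conf.spec shipped with your Splunk version.
    """
    return max_searches_per_cpu * num_cpus + base_max_searches


def queued_searches(demand: int, num_cpus: int, **limits: int) -> int:
    """Searches that would wait for a slot if `demand` searches started at once."""
    return max(0, demand - max_historical_searches(num_cpus, **limits))


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    print(f"{cpus} CPUs -> {max_historical_searches(cpus)} concurrent historical searches")
```

The ceiling grows only linearly with cores, while dashboards and scheduled searches tend to multiply with users. This is one plausible reason concurrency shows up as a pain point as the user base grows.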
What Has Made Splunk Hard (for Us)?
• User/search concurrency
• Distributing search across the WAN to EMEA/APAC
• Cardinality!
  – >500K user accounts (environment-wide, not Splunk users)
  – >500K distinct “hosts” (that we know about)
    · Nearly 400K distinct hosts reporting AV events!
  – Why is this a problem?
    · Example: For each business (Healthcare, Aviation, Capital, etc.), show the count of hosts that have had AV events for “foovirus” AND “barvirus” in the last 24 hours
    · Now, make the dashboard dynamic so that any 2 virus alerts can be selected
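The example is hard because "AND" here is per host, not per event. No single event carries both signatures, so the search has to keep state for every host before it can roll up by business. With nearly 400K AV-reporting hosts, that state is large. Here is a minimal Python sketch of the logic, assuming events are already limited to the last 24 hours and already tagged with a business (both hypothetical preprocessing steps).

```python
from collections import defaultdict


def hosts_with_both(events, virus_a: str, virus_b: str) -> dict[str, int]:
    """Per business, count distinct hosts that reported BOTH signatures.

    events: iterable of (business, host, signature) tuples, assumed to be
    already limited to the last 24 hours.
    """
    # business -> host -> which of the two signatures that host has reported.
    # This per-host state is what grows with cardinality.
    seen = defaultdict(lambda: defaultdict(set))
    wanted = {virus_a, virus_b}
    for business, host, signature in events:
        if signature in wanted:
            seen[business][host].add(signature)
    return {
        business: sum(1 for sigs in hosts.values() if wanted <= sigs)
        for business, hosts in seen.items()
    }
```

Making the dashboard dynamic only turns virus_a and virus_b into inputs. However, each new pair means rebuilding that per-host state from scratch, so caching one precomputed answer does not help much.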
Best Practices (Business Process)
• Have a small lab environment to test/break things
• Accept the fact that some things will only be testable in production
• Document processes or create them
  – Enables faster ramp time for new resources
• Get funded
  – Your cost model should include:
    · Infrastructure + license + maintenance + resources
“Focus on funding and managing as a service from the outset enables focus on users and usage vs. nickel and diming.”
Best Practices (Technical): Use FireBrigade
Best Practices (Technical): Use SoS (Splunk on Splunk)
Best Practices (Technical): Use Splunk to Splunk your Splunks yourself!
• Count of searches dispatched each hour on the same graph as (max) system load for that hour
  – Yes, they are sharing the Y-axis
  – Can you spot when we made a significant settings change?
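GE builds this chart in Splunk itself. To spell out the logic, here is a Python sketch of the aggregation only: bucket both series by hour, count dispatched searches, and keep the maximum load sample. Where the dispatch times and load samples come from is left open, and the inputs in the test are hypothetical.

```python
from collections import Counter
from datetime import datetime


def to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_series(dispatch_times, load_samples):
    """Align searches-dispatched-per-hour with max system load per hour.

    dispatch_times: iterable of datetimes, one per dispatched search.
    load_samples: iterable of (datetime, load_average) tuples.
    Returns sorted (hour, search_count, max_load) rows; max_load is None
    for hours with no load sample.
    """
    counts = Counter(to_hour(ts) for ts in dispatch_times)
    max_load: dict[datetime, float] = {}
    for ts, load in load_samples:
        hour = to_hour(ts)
        max_load[hour] = max(load, max_load.get(hour, load))
    hours = sorted(set(counts) | set(max_load))
    return [(h, counts.get(h, 0), max_load.get(h)) for h in hours]
```

Using the max rather than the average load is what makes a settings change stand out. Averages smooth over the short spikes that users actually feel.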
Best Practices (Technical): Can you spot when we disabled Transparent Huge Pages?
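Before and after a THP change, it helps to confirm what the kernel is actually doing. On Linux the active mode is the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled. RHEL/CentOS 6 kernels use a redhat_transparent_hugepage directory instead. Here is a minimal, read-only Python check; it does not change the setting.

```python
from pathlib import Path

THP_PATHS = [
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),  # RHEL/CentOS 6 kernels
]


def parse_thp_mode(text: str) -> str:
    """Return the active mode, which the kernel marks with [brackets]."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode():
    """Active THP mode on this host, or None if no THP control file exists."""
    for path in THP_PATHS:
        if path.exists():
            return parse_thp_mode(path.read_text())
    return None  # not Linux, or THP not available in this kernel


if __name__ == "__main__":
    print(current_thp_mode())
```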
Best Practices (Technical): Know what your environment is doing
Educate and Empower Users
• Implemented a dedicated Splunk support community
• Leverage blogs, newsletters, videos, wikis
• Share best practices and searches
• Provide Splunk training courses
What’s Next – Long-Term Vision
• Evaluating Splunk App for Enterprise Security
• Refocusing on centralized knowledge object management and optimization
• Better enrichment, maybe via tighter CMDB integration
• Clustering?
"Splunk is going to be XYZ"
Wrap-Up
Visit #splunk on EFNet (irc.efnet.org)
Power User Tricks: TERM
• index=bro sourcetype=bro_conn 3.3.3.3 | stats count
  – Ran 5 minutes, 7% complete.
  – DEBUG: base lispy: [ AND 3 index::bro sourcetype::bro_conn ]
• index=bro sourcetype=bro_conn TERM(3.3.3.3) | stats count
  – This search has completed and has returned 1 result by scanning 10 events in 13.588 seconds.
  – DEBUG: base lispy: [ AND 3.3.3.3 index::bro sourcetype::bro_conn ]
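Why the first search crawls: in the base lispy, 3.3.3.3 was broken on the minor breaker "." into the single term 3. Almost every event contains that term, so nearly everything had to be read. TERM() keeps the literal as one indexed term. The Python sketch below models this with a deliberately small, assumed subset of breakers, not Splunk's real segmenters.conf defaults. It uses hypothetical tab-separated events loosely shaped like Bro conn records. It also shows the usual caveat: TERM() only helps when the value is bounded by major breakers in the raw event, which is true for tab-separated Bro logs but not for src=3.3.3.3.

```python
import re

MAJOR = r"[\s,;|()\[\]]+"  # ASSUMPTION: small subset of major breakers
MINOR = r"[./:=@-]"  # ASSUMPTION: small subset of minor breakers

EVENTS = [  # hypothetical events
    "1412150400.1\tCx1\t3.3.3.3\t51515\t10.0.0.5\t443",
    "1412150400.2\tCx2\t10.3.3.1\t53000\t10.0.0.3\t53",
    "1412150400.3\tCx3\t10.0.0.9\t3333\t8.8.8.8\t53",
    "src=3.3.3.3 dst=10.0.0.5",
]


def index_tokens(raw: str) -> set[str]:
    """Terms an event contributes: each major segment plus its minor pieces."""
    tokens = set()
    for segment in re.split(MAJOR, raw.lower()):
        if segment:
            tokens.add(segment)
            tokens.update(p for p in re.split(MINOR, segment) if p)
    return tokens


def lispy_terms(literal: str, use_term: bool = False) -> set[str]:
    """Terms a search literal contributes to the (modelled) base lispy."""
    if use_term:
        return {literal.lower()}
    return {p for p in re.split(MINOR, literal.lower()) if p}


def candidates(events, literal: str, use_term: bool = False) -> list[str]:
    """Events the index lookup would hand to the search for full evaluation."""
    wanted = lispy_terms(literal, use_term)
    return [e for e in events if wanted <= index_tokens(e)]


if __name__ == "__main__":
    print(len(candidates(EVENTS, "3.3.3.3")), "candidates without TERM")
    print(len(candidates(EVENTS, "3.3.3.3", use_term=True)), "candidate with TERM")
```

In this model, the plain search makes every event a candidate: even the timestamp 1412150400.3 yields a 3. TERM() narrows the set to the one record where 3.3.3.3 sits between tabs.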
THANK YOU