June 11 2019
version 1.0 | 21 Mar 2019
AI has emerged from the realm of science fiction and become part of our everyday lives.
“When you have a hammer…”
This implies that an over-emphasis on
marketing vs. results is occurring.
There’s good news:
Your ability to assess vendor claims in
artificial intelligence (AI) is more about how
underlying principles apply to your situation
and less about academic expertise.
Understanding the machine learning (ML)
approach within a product gives you
enormous insight into what it
can and cannot realistically do.
Buy me!
I’m smart!
Super ML!
The best Bayesian!
I do it all!
AI 4ever!!
What emotion is he displaying?
Happy Sad Angry
Supervised machine learning is by example. It depends on large collections of training data (e.g.,
faces labeled as “happy,” “sad,” or “angry”) to learn; therefore you must know and have
specimens of exactly what it is you’re seeking to find.
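As a minimal sketch of "learning by example," the code below trains a nearest-centroid classifier on labeled samples. The two numeric features and the tiny training set are invented for illustration; a real product would use far more data and a different model.

```python
from collections import defaultdict
from math import dist

def train(examples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical features: (mouth curvature, brow tension), both scaled 0..1.
training_data = [
    ((0.9, 0.1), "happy"), ((0.8, 0.2), "happy"),
    ((0.1, 0.1), "sad"),   ((0.2, 0.2), "sad"),
    ((0.2, 0.9), "angry"), ((0.1, 0.8), "angry"),
]
centroids = train(training_data)
print(predict(centroids, (0.85, 0.15)))
```

Note that the classifier can only ever answer with a label it was trained on, which is the limitation described above.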
Kitten v. Ice cream
Which answers compelling questions of our time…
Source: Karen Zack @teenybiscuit Animal vs. Food
[Image grid: eight photos labeled “Ice cream” and eight photos labeled “Kitten”]
What emotion is expected?
a. Wearing bright clothing?
or…
b. Messing around with props?
a. Cleaning the kitchen?
or…
b. Cooking a meal?
a. Coffee from a shop?
or…
b. A cup of coffee at home?
When is happy “normal” for him?
Don’t tell us what to look for…
… this should be handled by supervised
machine learning algorithms.
Tell us what we’re looking at…
…it’s about identifying similarities and differences without needing to name them.
Unsupervised machine learning doesn’t need specific training data but does need time in situ
to “observe” enough examples.
socializing
drinking coffee
cleaning
It is rare that he wears brightly colored clothing while with his
friends.
It is unusual for him to drink store-bought coffee; he has
only ever been seen with coffee he brewed himself.
He has never cleaned the kitchen on a Monday, he has
only ever done it on a Saturday or Sunday.
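The observations above can be sketched without any labels at all: simply count what has been seen and ask how usual a new observation is. The class and method names below are hypothetical, and the counting approach is an illustrative stand-in, not a description of any product's models.

```python
from collections import Counter

class BehaviorBaseline:
    """Counts observed (activity, context) pairs; no labels are required."""

    def __init__(self):
        self.activity_counts = Counter()
        self.pair_counts = Counter()

    def observe(self, activity, context):
        """Incorporate one observation into the baseline."""
        self.activity_counts[activity] += 1
        self.pair_counts[(activity, context)] += 1

    def how_usual(self, activity, context):
        """Fraction of past occurrences of the activity seen in this context."""
        seen = self.activity_counts[activity]
        if seen == 0:
            return 0.0  # never observed this activity at all
        return self.pair_counts[(activity, context)] / seen

baseline = BehaviorBaseline()
for day in ["Saturday", "Sunday", "Saturday", "Sunday"]:
    baseline.observe("cleaning the kitchen", day)

print(baseline.how_usual("cleaning the kitchen", "Monday"))
```

The baseline only becomes meaningful after enough observations, which is why this approach needs time in situ.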
happy
angry
sad
It is not about identifying
happy, sad, or angry.
Instead, do we expect what
we’re seeing from the person?
Do we expect him to be happy
when…
“Classroom” vs. “Real world” education
Find similarities… but no names
Ideal for finding malware
Decades of data to study
Always looks the same no matter
where it manifests
Cybersecurity: Supervised machine learning
“Tell me what I’m looking for…”
Cybersecurity: Unsupervised machine learning
When searching for insider threats, how do you determine what is productive or malicious activity within the
enterprise?
Working at midnight?
Attaching 500MB to an email?
Looking at corporate strategy data?
Checking out software code from Project X?
A machine communicating on port 465?
Machine A & B connecting via HTTP?
Printer “P015” printing 50 pages at noon?
cmd.exe launched on a workstation?
The activities related to insider threats are masked by behavior that, when removed from context, presents as
benign. This means we cannot simply match a pattern or look for a signature – we must take a different approach
that separates abnormal from normal.
Knowing just this little bit about how ML works can now help you
ask better questions when evaluating vendors.
Find the right tool for the job…
Who/what is Interset?
About Interset
• Acquisition by Micro Focus mid-February
• Data science and analytics focused on cybersecurity
• Offices in Ottawa, Canada
• Sales throughout US
Customers
MSSPs/OEMs
©2019 Micro Focus
Differentiators: Entity-centric
prioritization
investigation
selection
raw events
The “alert janitor” pyramid
Analyst is forced to start with events in order to uncover entities worth investigating
raw events
selection
investigation
prioritization
The “alert janitor” pyramid
When using a SIEM to detect insider risk, only a very few events ultimately drive a response
What criteria drive this down-select?
Very few enterprises capture even a fraction of their available data, yet are still overwhelmed.
How many analysts know how to identify insider risk reliably?
Legitimate detection is fundamentally “hit or miss.”
prioritization
investigation
selection
raw events
Interset inverts (and improves!) the standard process
A prioritized list with available drill-down enables analysts to quickly understand the risk
Short list of high-quality leads
Intuitive drill-down and contextual view of probability
Robust filtering for threat hunting and workflow for response
Source logs are linked to provide details required for evidence gathering and further actions
Interset for Insider Risk
We give you a short list of high-quality leads.
Many users…
…but which are the ones I care about – and
why?
…many servers, many websites…
Anomalous behavior for each entity is collected to build a case to describe its potential risk
The priority of the entity in terms of potential risk is described on a scale from 0 to 100
(from normal to anomalous & risky)
Differentiators: Alerts make anomalies accessible
Anomalies vs. Alerts
Understanding how our math manifests itself to improve analyst understanding
A·nom·a·ly /əˈnäməlē/
A finding outside of the range of normal for a model; an abnormal finding.
Usually (but not always) this is a single model
A·lert /əˈlərt/
One or more anomalies that appear on the entity timeline.
1 Highest / Average
2 Compared to… Self / Peers / Population
This Alert displays multiple Anomalies
Alerts “roll up” Anomalies so that a clear story emerges from the timeline
Both “highest” and “average” baselines have
been exceeded
Compared to “self” and the “population”
Differentiators: Optimized to reduce False Positives
Determine probability…
Update baseline
Determine abnormality
New event
Model
probability (p)
1…
…0
When we talk about a “dynamic” baseline, this is what we mean. It changes over time:
• It is based only on what we observe in situ
• Every new event is incorporated
• It is not based on any “third party” expected behavior
The “working hours” example to understand individual baseline and the concept of probability within Interset
10am is normal – almost certain to occur regularly
Noon is slightly unusual – happens a little less than ½ the time
2am is very unusual – has never happened before
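A rough sketch of the working-hours idea: the baseline is updated with every observed day, and the probability for an hour is simply how often that hour has been seen. The class name, the 20-day history, and the frequency-based probability are assumptions made for illustration; the actual models are not described in this deck.

```python
from collections import Counter

class HourBaseline:
    """Hypothetical per-entity baseline of active hours, updated in situ."""

    def __init__(self):
        self.days_seen = 0
        self.hour_counts = Counter()

    def update(self, active_hours):
        """Incorporate one observed day, given as a set of hours (0-23)."""
        self.days_seen += 1
        self.hour_counts.update(set(active_hours))

    def probability(self, hour):
        """Fraction of observed days on which the entity was active at this hour."""
        if self.days_seen == 0:
            return 0.0
        return self.hour_counts[hour] / self.days_seen

baseline = HourBaseline()
for day in range(20):
    hours = {9, 10, 11}
    if day % 5 < 2:      # active at noon on 8 of the 20 days
        hours.add(12)
    baseline.update(hours)

for hour in (10, 12, 2):
    print(hour, baseline.probability(hour))
```

Because the baseline is built only from this entity's own history, nothing about "normal working hours" has to be configured in advance.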
[Chart: probability (y-axis) by time of day (x-axis: 12am, 5am, 10am, 3pm, 8pm), with three events marked “x”]
Rules and Thresholds
When only “under” or “outside” the curve matter, then the paradigm is binary
“normal”
“abnormal”
An anomaly is non-binary
[Chart: three values marked “x”: “under the curve” (normal), “near the curve”, and “far away from the curve”]
“Distance off” the curve matters – the further away the value is from an expected result, the more it matters: how abnormal is it?
“distance” matters
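To make the contrast concrete, here is a small sketch comparing a binary threshold rule with a graded "distance" measure. The z-score used here is an assumed stand-in for distance; the deck does not say which measure is actually used, and the sample volumes are invented.

```python
from statistics import mean, stdev

def binary_rule(value, threshold):
    """Rule/threshold paradigm: a value is either abnormal or it is not."""
    return value > threshold

def distance_off_curve(value, history):
    """Graded abnormality: standard deviations above the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if value <= mu else float("inf")
    return max(0.0, (value - mu) / sigma)

# Hypothetical daily data volumes (MB) previously observed for one user.
history = [48, 50, 52, 50, 50]

for volume in (53, 500):
    print(volume, binary_rule(volume, threshold=52), distance_off_curve(volume, history))
```

Both 53 MB and 500 MB trip the threshold rule identically, while the distance measure shows that one is only slightly unusual and the other is far outside anything seen before.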
…and combine with weighting to get an alert
probability (p)
Model
Entity
weighting (w)
Severity Extreme High Medium Low
Anomaly
1
0
Rather than adjust the rule or threshold to reduce false positives, use weighting to inject business context for relevance.
An alert combines probability and weighting
How unusual? (probability)
• Compared to self
• Compared to peers
• Compared to entire population
How much does it matter? (weighting)
Significance of the behavior
• Login from another country
• Accessing new server
Entity enrichment
• User w/ bad performance review
• “Honeypot” file share
• Mergers & acquisition data
• Contractors coming to the end of contract
• Recently traveled overseas
Alert severity: Extreme, High, Medium, Low
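One way to picture the combination is sketched below. The formula (1 - p) * w and the severity cut-offs are purely illustrative assumptions; the deck states only that probability and weighting are combined, not how.

```python
def alert_severity(p, w):
    """Map probability p (1 = normal, 0 = never seen) and business weight w to a severity.

    Both the product formula and the cut-off values are hypothetical.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= w <= 1.0):
        raise ValueError("p and w must both be within [0, 1]")
    score = (1.0 - p) * w  # how unusual, scaled by how much it matters
    if score >= 0.75:
        return "Extreme"
    if score >= 0.5:
        return "High"
    if score >= 0.25:
        return "Medium"
    return "Low"

# The same never-before-seen behavior, on assets of different importance.
print(alert_severity(p=0.0, w=1.0))   # e.g. a "honeypot" file share
print(alert_severity(p=0.0, w=0.3))   # e.g. a low-value server
```

The point of the sketch is that relevance is tuned through the weight, leaving the underlying model and its baseline untouched.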
Differentiators: Risk Score calculation
Calculating an entity risk score
[Diagram: Logs of each Data Type supply events to multiple models; each model performs “update baseline”, “calculate p”, and “incorporate w”; the resulting Alerts feed the Risk Score (for a given entity), on a scale from 0 to 100]
Risk Score Characteristics
The entity Risk Score is a comparative value designed for prioritization
1. Anomalies do not have static weighting in calculating entity Risk Score
2. A Risk Score is bound to the {0 … 100} range
3. Calculation is context-based and no single action can exert undue influence on entity Risk Score increase
1. No static weighting
If we were to pretend that events are equivalent to anomalies…
Others
User behavior that has “unusual” characteristics gets assigned a static value
• 5 points: A login event after pre-defined working hours
• 15 points: Moving more than 250MB of data but less than 500MB of data from a pre-defined “risky” location
Interset
Working hours
• Have we seen this user work these hours before?
• If we have seen these hours before, was it recently or long ago?
• How much outside of previously observed working hours is the event?
Amount of data moved
• Is this a location this user has accessed previously?
• How does the amount of data moved compare to previous volumes for self, peers, and population?
• Has any user accessed this location recently?
2. A risk score is a bound range
A user cannot have a risk score above 100
Others
Either risk scoring has no upper bound or there is a max ceiling past which no additional points may be added
Interset
The risk score is squashed into a range that has 100 as an upper bound
5 + 15 + 50 + 35 + 10 + 25 = ???
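The sum above comes to 140, which already overshoots a 0 to 100 scale. A squashing function avoids this by approaching 100 asymptotically. The exponential form and the `scale` parameter below are assumptions chosen for illustration; the deck does not name the function Interset uses.

```python
import math

def additive_score(points):
    """Static-points approach: the total has no natural upper bound."""
    return sum(points)

def squashed_score(points, scale=50.0):
    """Hypothetical squash: rises with the raw total but never exceeds 100."""
    raw = sum(points)
    return 100.0 * (1.0 - math.exp(-raw / scale))

points = [5, 15, 50, 35, 10, 25]
print(additive_score(points))   # 140
print(squashed_score(points))   # stays below 100
```

Unlike a hard ceiling, the squashed value keeps increasing as more evidence arrives, so two high-risk entities can still be ranked against each other.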
Image retrieved from https://www.shmoop.com/functions-graphs-limits/horizontal-asymptotes.html
29 April 2019
100
These scores only apply to Ann
A Risk Score is a bound range
Ann Funderburk works at an unusual hour…
… connects via VPN from China
… and accesses repositories that she and her peers do not usually access
… and takes files from a folder on a repository an unusual number of times
… and moves a significantly higher volume of data than normal
100
0
This allows for comparison
Change over time
Other entities
15
46
65
80
97
Risk Score = 97
Another user with same anomaly types is likely to have different scores
3. Calculation is context-based
Interset risk scores are not step functions and they build in the concept of “decay” over time
Note it took a number of actions against an
already elevated risk profile to push Jacob to a
new peak risk score.
Risk score does not immediately return to zero just because of
the absence of anomalies; this is the concept of controlled
decay.
Just because the alerts in this period were “high risk,” there was not an
automatic push for the entity risk score itself to move into a “high risk” range
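Controlled decay can be pictured with a simple half-life model, sketched below. The exponential form and the seven-day half-life are assumptions for illustration only; the deck does not specify how decay is calculated.

```python
def decayed_score(score, days_without_anomalies, half_life_days=7.0):
    """Hypothetical decay: the score halves every `half_life_days` quiet days."""
    return score * 0.5 ** (days_without_anomalies / half_life_days)

# A score of 80 fades gradually instead of dropping straight to zero.
for days in (0, 7, 14, 30):
    print(days, round(decayed_score(80.0, days), 1))
```

With this shape, a new anomaly arriving shortly after an earlier one builds on an already elevated score, while an entity that stays quiet drifts back toward normal.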
Big Data Architecture
Interset conceptual data flow
Data Stores / Printers
Endpoint Agents
Cloud Services
Firewall / Proxy
Security Data Lake (Integrated)
Behavioral Analytics
Dashboard & Hand-off
Orchestration / Automation
OpenDXL
Case Mgmnt / Svc Desk
REST API
Business Systems
Authentication Sources
SIEM
Email / SMS
Acquire: “Which things matter?”
Bring logs and streaming sources together
Baseline: “What is normal?”
Incorporate the patterns of behavior that make each entity like (and unlike) others
Detect: “Where are the risks?”
Principled analytical methods surface quantified potential threats
Respond: “Who takes action?”
Predetermined or ad hoc; automated or manual
The analytical pipeline
Acquire Respond
Analytics
Baseline Score
Components by role
Interset node architecture
Ambari
Stream
Master
Compute
Reporting
Search
kibana
Interset node architecture
Ambari: Deploys and manages Hadoop® services on other nodes
Master: Manages work with Apache Zookeeper and maintains “Master” nodes for Hadoop® services
Stream (& Ingest): Handles ingest (“Acquire”) processes with NiFi and Kafka
Compute: The Big Data node (“Baseline” and “Detect”) that performs the analytics
Search: Maintains calculated analytic information; accessed by Reporting node
Reporting: Serves UI and handles outbound traffic (“Respond”)
kibana
Interset node architecture on CDH
Management
Stream
Master
Compute
Reporting
Search
kibana
manager
Interset component data flow
(Dashboard)
ArcSight Integration
ArcSight: Initial integration points are Connectors
Summer 2019 delivery
ArcSight Connectors
Server
Security device
Network Hardware
Scanner
Application
Event Data Source
Destination(s)
For supported data types
FlexConnector
SmartConnector
ArcSight: Enhance and combine use cases with anomaly findings
Behavioral Analytics
Dashboard & Hand-off
Make use cases
smarter
Enrich event inspector details
ArcSight – Interset: integration use case (one of)
Not official, my personal vision =)
ArcSight Connectors
Server
Security device
Network Hardware
Scanner Application
Event Data Source CORRE
Real-time Rules
LW Rule on Risky Users/Entities
Risky Users/Entities List
Main correlation content
Get AL Value variable
Incident based on the
correlation login and
anomaly in behaviour
Interset on Vertica
2020 roadmap
Interset analytics running on Vertica
Analytical results pushed to Vertica
Auth models ported to Vertica analytical pipeline
Web Proxy models ported to Vertica analytical pipeline
MFS
Interset
Available Resources
Links / webinars / demo
Interset website: https://interset.com/
Interset Webinars: https://interset.com/research/webinars/
BrightTalk webinars: https://www.brighttalk.com/search/?q=interset
Interset Demo link: https://esprit.interset.com/
Thanks