Belgrade summit May 13 2019 - Fundorfina
microfocus.fundorfina.pl/wp-content/uploads/2019/06/Interset-Q2-21019.pdf

June 11 2019

version 1.0 | 21 Mar 2019

AI has emerged from the realm of science fiction and become part of our everyday lives.

“When you have a hammer…”

This implies that an over-emphasis on marketing vs. results is occurring.

There’s good news:

Your ability to assess vendor claims in artificial intelligence (AI) is more about how underlying principles apply to your situation and less about academic expertise.

Understanding the approach of machine learning (ML) within a product can give you enormous insight into what it can and cannot realistically do.

Buy me!

I’m smart!

Super ML!

The best Bayesian!

I do it all!

AI 4ever!!


What emotion is he displaying?

Happy Sad Angry


Supervised machine learning is by example. It depends on large collections of training data (e.g., faces labeled as “happy,” “sad,” or “angry”) to learn; therefore you must know and have specimens of exactly what it is you’re seeking to find.
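Learning "by example" can be illustrated with a very small sketch. The nearest-centroid classifier below is only one of many supervised techniques, and the two numeric features (`mouth_curvature`, `brow_angle`) and their values are hypothetical, invented for this illustration. The point is that the classifier can only return labels it was shown during training.

```python
from collections import defaultdict
from math import dist


def train_centroids(examples):
    """Average the feature vectors of each label: {label: centroid}."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled training data: (mouth_curvature, brow_angle) -> emotion
training = [
    ((0.9, 0.1), "happy"), ((0.8, 0.2), "happy"),
    ((-0.7, 0.1), "sad"), ((-0.8, 0.0), "sad"),
    ((-0.2, -0.9), "angry"), ((-0.1, -0.8), "angry"),
]

centroids = train_centroids(training)
print(classify(centroids, (0.7, 0.0)))  # closest to the "happy" examples
```

A face showing an emotion that was never labeled (say, "surprised") would still be forced into one of the three known labels, which is the limitation the slide describes.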

Kitten v. Ice cream

Which answers compelling questions of our time…

Source: Karen Zack @teenybiscuit Animal vs. Food

[Image grid: photos labeled “Ice cream” or “Kitten”]

What emotion is expected?


a. Wearing bright clothing?

or…

b. Messing around with props?

a. Cleaning the kitchen?

or…

b. Cooking a meal?

a. Coffee from a shop?

or…

b. A cup of coffee at home?

When is happy “normal” for him?

Don’t tell us what to look for…

… this should be handled by supervised machine learning algorithms.

Tell us what we’re looking at…

…it’s about identifying similarities and differences without needing to name them.

Unsupervised machine learning doesn’t need specific training data but does need time in situ to “observe” enough examples.

socializing

drinking coffee

cleaning

It is rare that he wears brightly colored clothing while with his friends.

It is unusual for him to drink store-bought coffee; he has only ever been seen with coffee he brewed himself.

He has never cleaned the kitchen on a Monday; he has only ever done it on a Saturday or Sunday.
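Statements like the three above can be derived purely from observation, with no labels. The sketch below is a minimal illustration, not any product's method: the `ObservedBehavior` class and the sample observations are hypothetical. It simply counts how often each activity has been seen in each context.

```python
from collections import Counter, defaultdict


class ObservedBehavior:
    """Counts (activity, context) pairs seen so far; no labels required."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, activity, context):
        self.counts[activity][context] += 1

    def frequency(self, activity, context):
        """Share of this activity's observations that had this context.

        Returns None when the activity has never been observed at all,
        because there is not yet any baseline to compare against.
        """
        total = sum(self.counts[activity].values())
        if total == 0:
            return None
        return self.counts[activity][context] / total


observed = ObservedBehavior()
# Hypothetical observations gathered "in situ"
for day in ["Saturday", "Sunday", "Saturday", "Saturday", "Sunday"]:
    observed.observe("cleaning", day)
for _ in range(10):
    observed.observe("coffee", "home-brewed")

print(observed.frequency("cleaning", "Monday"))       # never seen on a Monday
print(observed.frequency("coffee", "store-bought"))   # never seen
print(observed.frequency("cleaning", "Saturday"))
```

The need for "time in situ" shows up as the `None` case: until enough examples have been observed, nothing can be called unusual.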

happy

angry

sad

It is not about identifying happy, sad, or angry.

Instead, do we expect what we’re seeing from the person?

Do we expect him to be happy when…

“Classroom” vs. “Real world” education

Find similarities… but no names


Ideal for finding malware

Decades of data to study

Always looks the same no matter where it manifests

Cybersecurity: Supervised machine learning

“Tell me what I’m looking for…”

Cybersecurity: Unsupervised machine learning

When searching for insider threats, how do you determine what is productive or malicious activity within the enterprise?

Working at midnight?

Attaching 500MB to an email?

Looking at corporate strategy data?

Checking out software code from Project X?

A machine communicating on port 465?

Machine A & B connecting via HTTP?

Printer “P015” printing 50 pages at noon?

cmd.exe launched on a workstation?

The activities related to insider threats are masked by behavior that, when removed from context, presents as benign. This means we cannot simply match a pattern or look for a signature; we must take a different approach that separates abnormal from normal.

Knowing just this little bit about how ML works can now help you ask better questions when evaluating vendors.

Find the right tool for the job…

Who/what is Interset?


About Interset

• Acquisition by Micro Focus mid-February

• Data science and analytics focused on cybersecurity

• Offices in Ottawa, Canada

• Sales throughout US

Customers

MSSPs/OEMs

©2019 Micro Focus

Differentiators: Entity-centric

prioritization

investigation

selection

raw events

The “alert janitor” pyramid

Analyst is forced to start with events in order to uncover entities worth investigating


raw events

selection

investigation

prioritization

The “alert janitor” pyramid

When using a SIEM to detect insider risk, only a very few events ultimately drive a response

What criteria drive this down-select?

Very few enterprises capture even a fraction of their available data, yet are still overwhelmed.

How many analysts know how to identify insider risk reliably?

Legitimate detection is fundamentally “hit or miss.”


prioritization

investigation

selection

raw events

Interset inverts (and improves!) the standard process

A prioritized list with available drill-down enables analysts to quickly understand the risk

Short list of high-quality leads

Intuitive drill-down and contextual view of probability

Robust filtering for threat hunting and workflow for response

Source logs are linked to provide details required for evidence gathering and further actions


Interset for Insider Risk

We give you a short list of high-quality leads.

Many users…

…but which are the ones I care about – and

why?

…many servers, many websites…



Anomalous behavior for each entity is collected to build a case to describe its potential risk

The priority of the entity in terms of potential risk is described on a scale from 0 to 100 (from normal to anomalous & risky)


Differentiators: Alerts make anomalies accessible

Anomalies vs. Alerts

Understanding how our math manifests itself to improve analyst understanding

A·nom·a·ly /əˈnäməlē/

A finding outside of the range of normal for a model; an abnormal finding.

Usually (but not always) this is a single model

A·lert /əˈlərt/

One or more anomalies that appear on the entity timeline.

1. Highest, Average

2. Compared to… Self, Peers, Population

This Alert displays multiple Anomalies

Alerts “roll up” Anomalies so that a clear story emerges from the timeline


Both “highest” and “average” baselines have been exceeded

Compared to “self” and the “population”
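The idea of one alert rolling up several anomalies can be sketched as a comparison of a single new value against the "highest" and "average" baselines of three groups. This is an illustration only: the function name, the sample numbers, and the use of a plain maximum and mean as baselines are assumptions, not a description of the product's models.

```python
from statistics import mean


def exceeded_baselines(value, histories):
    """List which (group, baseline) pairs a new value exceeds.

    histories maps a group name ("self", "peers", "population") to the
    values previously observed for that group.
    """
    findings = []
    for group, past in histories.items():
        if not past:
            continue  # nothing observed yet, so no baseline to exceed
        if value > max(past):
            findings.append((group, "highest"))
        if value > mean(past):
            findings.append((group, "average"))
    return findings


# Hypothetical daily data volumes in MB
histories = {
    "self": [20, 35, 50],
    "peers": [100, 600, 250],
    "population": [10, 80, 120],
}

for group, baseline in exceeded_baselines(500, histories):
    print(f"exceeds the {baseline} baseline compared to {group}")
```

Each returned pair corresponds to one anomaly; grouping them under one alert is what lets a single story emerge on the timeline.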

Differentiators: Optimized to reduce False Positives

Determine probability…

Update baseline

Determine abnormality

New event

Model

probability (p)

1…

…0


When we talk about a “dynamic” baseline, this is what we mean. It changes over time:
• It is based only on what we observe in situ
• Every new event is incorporated
• It is not based on any “third party” expected behavior
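A baseline that incorporates every new event can be kept with an online update, so no history needs to be re-read. The sketch below uses Welford's online algorithm for a running mean and variance as a stand-in; the slides do not say which statistics the product maintains, so the class and its methods are hypothetical.

```python
import math


class DynamicBaseline:
    """Running mean/variance updated one event at a time (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared differences from the running mean

    def deviation(self, value):
        """Distance from the current mean in standard deviations.

        Returns None until there is enough variation observed in situ
        to say what "normal" is.
        """
        if self.n < 2:
            return None
        std = math.sqrt(self._m2 / (self.n - 1))
        if std == 0:
            return None
        return abs(value - self.mean) / std

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def process(self, value):
        """Score the new event against the baseline, then incorporate it."""
        score = self.deviation(value)
        self.update(value)
        return score


baseline = DynamicBaseline()
for pages_printed in [10, 12, 11, 9, 13]:  # hypothetical daily values
    baseline.process(pages_printed)

score = baseline.process(40)  # far outside what has been observed
print(score, baseline.mean)
```

Scoring before updating matters: the event is judged against what was normal before it arrived, and only then becomes part of the baseline.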

The “working hours” example to understand individual baseline and the concept of probability within Interset

10am is normal – almost certain to occur regularly

Noon is slightly unusual – happens a little less than ½ the time

2am is very unusual – has never happened before

[Chart: probability by hour of day (12am, 5am, 10am, 3pm, 8pm), with three sample events marked “x”]
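The working-hours example can be reproduced with a simple empirical estimate: for each hour, the share of observed days on which the user was active. The activity log below is made up to match the three statements above, and the estimator itself is an assumption chosen for clarity rather than the product's model.

```python
from collections import defaultdict


def hourly_probability(activity_log):
    """Estimate, per hour of day, the share of observed days with activity.

    activity_log is an iterable of (day, hour) pairs, hour in 0..23.
    """
    days = set()
    days_by_hour = defaultdict(set)
    for day, hour in activity_log:
        days.add(day)
        days_by_hour[hour].add(day)
    if not days:
        return {}
    return {hour: len(days_by_hour[hour]) / len(days) for hour in range(24)}


# Hypothetical log covering 20 days
log = []
for day in range(20):
    log.append((day, 9))           # active at 9am every day
    if day < 19:
        log.append((day, 10))      # 10am on 19 of 20 days
    if day < 9:
        log.append((day, 12))      # noon on 9 of 20 days

probabilities = hourly_probability(log)
print(probabilities[10])  # almost certain
print(probabilities[12])  # a little less than half the time
print(probabilities[2])   # never happened before
```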

Rules and Thresholds


When only “under” or “outside” the curve matters, the paradigm is binary

“normal”

“abnormal”

An anomaly is non-binary


“under the curve” (normal)

“near the curve”

“far away from the curve”

“Distance off” the curve matters – the further away the value is from an expected result, the more it matters: how abnormal is it?

“distance” matters
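The contrast between a binary rule and a distance-based anomaly can be shown side by side. Both functions below are illustrative assumptions: the names, the 105 limit, and the `z / (1 + z)` squashing are chosen only to show that a threshold gives the same answer for "near" and "far", while a distance measure does not.

```python
def threshold_rule(value, limit):
    """Binary paradigm: the value is either over the limit or it is not."""
    return value > limit


def distance_score(value, expected, spread):
    """Non-binary paradigm: 0.0 at the expected value, approaching 1.0
    the further the value lies from it (measured in units of spread)."""
    z = abs(value - expected) / spread
    return z / (1.0 + z)


# Hypothetical values: expected 100, typical spread 10, rule limit 105
for value in (110, 300):
    print(value, threshold_rule(value, 105), distance_score(value, 100, 10))
```

Both values trip the rule identically, but only the second one is far from the curve, and the score reflects how abnormal it is.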

…and combine with weighting to get an alert

probability (p)

Model

Entity

weighting (w)

Severity Extreme High Medium Low

Anomaly

1

0


Rather than adjust the rule or threshold to reduce false positives, use weighting to inject business context for relevance.

How unusual?

An alert combines probability and weighting

Compared to self

Compared to peers

Compared to entire population

Significance of the behavior

Login from another country

Accessing new server

Entity enrichment

User w/ bad performance review

“Honeypot” file share

Mergers & acquisition data

Contractors coming to the end of contract

Recently traveled overseas

probability

weighting

Severity Extreme

High

Medium

Low

Alert

How much does it matter?
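One way to picture "an alert combines probability and weighting" is a function that takes how unusual the behavior is (p) and how much it matters (w) and returns a severity. The formula `(1 - p) * w` and the bucket cut-offs below are assumptions made for this sketch; the slides do not give the actual combination.

```python
def alert_severity(probability, weight):
    """Combine 'how unusual?' with 'how much does it matter?'.

    probability: likelihood of the behavior under the baseline (0..1);
                 low probability means highly unusual.
    weight:      business relevance injected by the analyst (0..1).
    """
    if not 0.0 <= probability <= 1.0 or not 0.0 <= weight <= 1.0:
        raise ValueError("probability and weight must be within 0..1")
    score = (1.0 - probability) * weight  # assumed combination
    if score >= 0.75:
        return "Extreme"
    if score >= 0.5:
        return "High"
    if score >= 0.25:
        return "Medium"
    return "Low"


# The same very unusual behavior, weighted differently by business context
print(alert_severity(0.01, 1.0))  # e.g. touching a "honeypot" file share
print(alert_severity(0.01, 0.3))  # e.g. accessing an ordinary new server
print(alert_severity(0.9, 1.0))   # normal behavior, even on a sensitive asset
```

The design point from the slide is visible here: false positives are reduced by lowering the weight of irrelevant behavior, not by loosening the detection itself.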


Differentiators: Risk Score calculation

Calculating an entity risk score

[Diagram: events from each data type (logs) pass through models; each model updates its baseline, calculates p, and incorporates w to raise alerts, which feed a Risk Score from 0 to 100 for a given entity]


The entity Risk Score is a comparative value designed for prioritization

1. Anomalies do not have static weighting in calculating entity Risk Score

2. A Risk Score is bound to the {0 … 100} range

3. Calculation is context-based and no single action can exert undue influence on entity Risk Score increase

Risk Score Characteristics


Others

User behavior that has “unusual” characteristics gets assigned a static value

5 points: A login event after pre-defined working hours

15 points: Moving more than 250MB of data but less than 500MB of data from a pre-defined “risky” location

Interset

Working hours
• Have we seen this user work these hours before?
• If we have seen these hours before, was it recently or long ago?
• How much outside of previously observed working hours is the event?

Amount of data moved
• Is this a location this user has accessed previously?
• How does the amount of data moved compare to previous volumes for self, peers, and population?
• Has any user accessed this location recently?

1. No static weighting

If we were to pretend that events are equivalent to anomalies…


Others

Either risk scoring has no upper bound or there is a max ceiling past which no additional points may be added

Interset

The risk score is squashed into a range that has 100 as an upper bound

2. A risk score is a bounded range

A user cannot have a risk score above 100


5 + 15 + 50 + 35 + 10 + 25 = ???

Image retrieved from https://www.shmoop.com/functions-graphs-limits/horizontal-asymptotes.html, 29 April 2019
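The difference between adding points and squashing into a bounded range can be sketched with any function that has a horizontal asymptote at 100. The exponential form and the `scale` parameter below are assumptions for illustration; the slides only state that the score approaches, and never exceeds, 100.

```python
import math


def additive_score(points):
    """Static scoring: points simply add up, with no upper bound."""
    return sum(points)


def squashed_score(points, scale=50.0):
    """Bounded scoring: the total (assumed non-negative) is squashed so
    that it approaches 100 asymptotically and never goes above it."""
    total = sum(points)
    return 100.0 * (1.0 - math.exp(-total / scale))


points = [5, 15, 50, 35, 10, 25]
print(additive_score(points))   # 140, already past a 0-100 scale
print(squashed_score(points))   # stays below 100
```

Because the curve flattens near the top, each further anomaly still moves the score, but by less, which keeps scores comparable across entities and over time.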


These scores only apply to Ann

A Risk Score is a bounded range

Ann Funderburk works at an unusual hour…

… connects via VPN from China

… and accesses repositories that she and her peers do not usually access

… and takes from a folder on a repository an unusual number of times

… and moves a significantly higher volume of data than normal

100

0

This allows for comparison

Change over time

Other entities

15

46

65

80

97

Risk Score = 97

Another user with same anomaly types is likely to have different scores

3. Calculation is context-based

Interset risk scores are not step functions, and they build in the concept of “decay” over time


Note it took a number of actions against an already elevated risk profile to push Jacob to a new peak risk score.

Risk score does not immediately return to zero just because of the absence of anomalies; this is the concept of controlled decay.

Just because the alerts in this period were “high risk,” there was not an automatic push for the entity risk score itself to move into a “high risk” range
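The notion of controlled decay described above can be sketched as an exponential half-life applied to the score for each day without anomalies. The half-life form and the 7-day default are assumptions for illustration; the slides state only that the score declines gradually instead of dropping to zero.

```python
def decayed_score(score, days_without_anomalies, half_life_days=7.0):
    """Risk score after a quiet period, halving every half_life_days."""
    return score * 0.5 ** (days_without_anomalies / half_life_days)


# Hypothetical entity at a risk score of 80 with no further anomalies
for days in (0, 7, 14, 28):
    print(days, decayed_score(80.0, days))
```

A quiet week lowers the score but leaves the entity visibly elevated, so a new anomaly during that period builds on the remaining risk instead of starting from zero.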

Big Data Architecture

Interset conceptual data flow

Data Stores / Printers

Endpoint Agents

Cloud Services

Firewall / Proxy

Security Data Lake (Integrated)

Behavioral Analytics

Dashboard & Hand-off

Orchestration / Automation

OpenDXL

Case Mgmnt / Svc Desk

REST API

Business Systems

Authentication Sources

SIEM

Email / SMS

Acquire: “Which things matter?”
Bring logs and streaming sources together

Baseline: “What is normal?”
Incorporate the patterns of behavior that make each entity like (and unlike) others

Detect: “Where are the risks?”
Principled analytical methods surface quantified potential threats

Respond: “Who takes action?”
Predetermined or ad hoc; automated or manual


The analytical pipeline

Acquire Respond

Analytics

Baseline Score


Components by role


Interset node architecture

Ambari

Stream

Master

Compute

Reporting

Search

kibana


Interset node architecture

• Ambari: Deploys and manages Hadoop® services on other nodes
• Master: Manages work with Apache Zookeeper and maintains “Master” nodes for Hadoop® services
• Stream (& Ingest): Handles ingest (“Acquire”) processes with NiFi and Kafka
• Compute: The Big Data node (“Baseline” and “Detect”) that performs the analytics
• Search: Maintains calculated analytic information; accessed by Reporting node
• Reporting: Serves UI and handles outbound traffic (“Respond”)

kibana


Interset node architecture on CDH

Management

Stream

Master

Compute

Reporting

Search

kibana

manager


Interset component data flow

(Dashboard)


ArcSight Integration


ArcSight: Initial integration points are Connectors

Summer 2019 delivery

ArcSight Connectors

Server

Security device

Network Hardware

Scanner Application

Event Data Source Destination(s)

For supported data types

FlexConnector

SmartConnector

ArcSight: Enhance and combine use cases with anomaly findings

Behavioral Analytics

Dashboard & Hand-off

Make use cases

smarter

Enrich event inspector details

ArcSight – Interset: integration use case (one of)

Not official, my personal vision =)

ArcSight Connectors

Server

Security device

Network Hardware

Scanner Application

Event Data Source CORRE

Real-time Rules

LW Rule on Risky Users/Entities

Risky Users/Entities List

Main correlation content

Get AL Value variable

Incident based on the correlation login and anomaly in behaviour

Interset on Vertica

2020 roadmap

Interset analytics running on Vertica

Analytical results pushed to Vertica

Auth models ported to Vertica analytical pipeline

Web Proxy models ported to Vertica analytical pipeline

MFS

Interset

Available Resources


Links / webinars / demo

Interset website: https://interset.com/

Interset Webinars: https://interset.com/research/webinars/

BrightTalk webinars: https://www.brighttalk.com/search/?q=interset

Interset Demo link: https://esprit.interset.com/


Thanks
