Detecting Anomalies in Streaming Graphs
SpotLight
Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra
Carnegie Mellon University, Amazon
This work was performed at Amazon.
ESWARAN, FALOUTSOS, GUHA & MISHRA
SPOTLIGHT: DETECTING ANOMALIES IN STREAMING GRAPHS
KDD 2018
Graphs are being created everywhere
INTRODUCTION
[Figure: a chat between You and Alice, timestamped 6 Jun 2018, 1.34am]
Many other settings…
IM/e-mail networks Computer networks
Transportation networks Edit networks
As a sequence of graph snapshots
[Figure: graph snapshots along a time axis. MORNINGS: Monday AM, Tuesday AM, Wednesday AM. NIGHTS: Monday PM, Tuesday PM.]
But sometimes unusual events happen
[Figure: three graph snapshots labeled Normal, Tax scam and Network failure]
Unusual events in other settings
Computer networks (e.g., port scans, denial-of-service)
Transportation networks (events/weather; the figure shows a stadium)
How do we detect such anomalies in streaming graphs?
How do we even characterize these anomalies?
Anomalies tend to involve…
INSIGHT
sudden (dis)appearance of large dense directed subgraph
‣ directed: edges run from sources to destinations
‣ large, dense: many nodes, many many edges
‣ sudden: an abrupt jump from the initial to the final state, as opposed to steady evolution
‣ (dis)appearance: the subgraph may appear or disappear
PROBLEM
GIVEN a stream of graphs with:
STREAMING MODEL
• (Un)directed weighted edges
• Time-evolving node set
• Known node-correspondence
ALGORITHMIC CONSTRAINTS
• Real-time and fast detection
• Bounded working memory
FIND which graphs are anomalous
[Figure: graphs along a time axis, each marked Ok! or anomaly!]
Overview of SpotLight
ALGORITHM
[Figure: two-stage pipeline. Graph Sketching maps the graphs G1, G2, G3, G4 to sketch vectors v(G1), v(G2), v(G3), v(G4). Anomaly Detection then flags the outlying sketch, here v(G3), as an anomaly.]
Many off-the-shelf methods for anomaly detection:
‣ Robust Random Cut Forests [Guha, Mishra, Roy & Schrijvers; ICML 2016]
‣ Light-weight Online Detector of Anomalies [Pevny; ML 2016]
‣ Randomized Space Forests [Wu, Zhang, Fan, Edwards & Yu; ICDM 2014]
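To make the second stage concrete, here is a minimal sketch of a detector that scores each incoming sketch vector. It is a plain nearest-neighbour stand-in, not any of the three cited methods; the class name `NearestNeighbourDetector` and the `window` parameter are assumptions made for illustration only.

```python
import math
from collections import deque


class NearestNeighbourDetector:
    """Toy streaming detector (hypothetical, NOT one of the cited methods).

    Scores a sketch by its Euclidean distance to the closest sketch
    seen within a bounded window of recent sketches.
    """

    def __init__(self, window=100):
        # deque with maxlen keeps working memory bounded
        self.history = deque(maxlen=window)

    def score(self, v):
        # The first sketch has nothing to compare against, so it scores 0.0
        if self.history:
            s = min(math.dist(v, u) for u in self.history)
        else:
            s = 0.0
        self.history.append(tuple(v))
        return s


detector = NearestNeighbourDetector(window=10)
stream = [(0, 2, 3), (0, 2, 4), (0, 3, 3), (50, 60, 70)]
scores = [detector.score(v) for v in stream]
# The last sketch lies far from everything seen so far, so its score is largest.
```

Any detector that consumes fixed-length vectors one at a time could be dropped in here instead.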
SpotLight randomized graph sketching
[Figure: a graph is sketched into a vector whose entries are filled in one at a time: 0, 100, 20]
THREE PARAMETERS:
‣ Probability of sampling source ‘p’
‣ Probability of sampling destination ‘q’
‣ Number of sketching dimensions ‘K’
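A minimal sketch of how the three parameters might fit together. It assumes that each of the K dimensions samples sources with probability p and destinations with probability q, then sums the weights of edges running from a sampled source to a sampled destination. The function names and the `seed` argument are hypothetical.

```python
import random


def make_sketch_samplers(sources, destinations, p, q, K, seed=0):
    """Draw K independent (source sample, destination sample) pairs."""
    rng = random.Random(seed)
    samplers = []
    for _ in range(K):
        src_sample = {s for s in sources if rng.random() < p}
        dst_sample = {d for d in destinations if rng.random() < q}
        samplers.append((src_sample, dst_sample))
    return samplers


def sketch(edges, samplers):
    """Map a graph, given as (src, dst, weight) triples, to a K-dim vector."""
    v = [0.0] * len(samplers)
    for src, dst, w in edges:
        for k, (src_sample, dst_sample) in enumerate(samplers):
            # Assumption: an edge counts toward dimension k only if
            # both of its endpoints were sampled for that dimension.
            if src in src_sample and dst in dst_sample:
                v[k] += w
    return v


edges = [("a", "b", 1), ("b", "c", 2)]
samplers = make_sketch_samplers(["a", "b"], ["b", "c"], p=0.5, q=0.5, K=3)
v = sketch(edges, samplers)  # a 3-dimensional sketch of the graph
```

Under this reading, setting K = p = q = 1 reduces the sketch to the total edge weight of the graph.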
SpotLight at work on a stream
STREAMING ANOMALY DETECTOR
Hashes: three source hashes hS: src → {1,.., 1/p} & three destination hashes hD: dst → {1,.., 1/q}, one pair per sketch dimension
[Figure: anomaly score plotted against time]
‣ 5pm: the sketch starts at (0, 0, 0).
‣ Edge a → b with weight 1 arrives. Source a is hashed with each hS and destination b with each hD; the sketch becomes (0, 0, 1).
‣ Edge b → c with weight 2 arrives. Source b is hashed with each hS and destination c with each hD; the sketch becomes (0, 2, 3).
‣ 6pm: the sketch (0, 2, 3) for the 5-6pm graph goes to the streaming anomaly detector, and the sketch resets to (0, 0, 0).
‣ Three more edges arrive among nodes a, b, c and d, with weights 2, 1 and 1; the sketch becomes (1, 0, 2).
‣ 7pm: the sketch for the 6-7pm graph goes to the detector, and the sketch resets to (0, 0, 0).
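The streaming update can be sketched with hash functions, so that no per-node state is stored. This is a minimal illustration under stated assumptions: a node counts as sampled in dimension k when its hash lands in bucket 1, and the SHA-256 based `_bucket` helper, the class name and the `emit` method are all hypothetical rather than taken from the talk.

```python
import hashlib


class StreamingSketch:
    """Toy streaming sketch: K counters, updated one edge at a time."""

    def __init__(self, p, q, K):
        self.src_buckets = round(1 / p)  # hS maps src to {1, .., 1/p}
        self.dst_buckets = round(1 / q)  # hD maps dst to {1, .., 1/q}
        self.K = K
        self.v = [0.0] * K

    def _bucket(self, role, k, node, buckets):
        # One deterministic hash per (role, dimension); result in {1, .., buckets}
        digest = hashlib.sha256(f"{role}|{k}|{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % buckets + 1

    def add_edge(self, src, dst, w):
        for k in range(self.K):
            # Assumption: bucket 1 means "sampled" for this dimension
            if (self._bucket("src", k, src, self.src_buckets) == 1
                    and self._bucket("dst", k, dst, self.dst_buckets) == 1):
                self.v[k] += w

    def emit(self):
        """Return the sketch for the finished time window and reset it."""
        out, self.v = self.v, [0.0] * self.K
        return out


s = StreamingSketch(p=0.5, q=0.5, K=3)
s.add_edge("a", "b", 1)
s.add_edge("b", "c", 2)
window_sketch = s.emit()  # hand this vector to the anomaly detector
```

Working memory is K counters regardless of how many nodes or edges the stream contains, which matches the bounded-memory constraint.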
Intuition behind our theorems
GUARANTEES
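Deterministic Experiment: Add ‘m’ unit-weight edges.
[Figure: a graph G and two modified graphs GR and GB, mapped to sketches v(G), v(GR) and v(GB) in the K-dim SpotLight Space; dR and dB are the distances from v(G) to v(GR) and to v(GB).]
dR - dB > O(K m²)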
Thm 1: Focus-awareness in expectation
Randomized Experiment: Add ‘m’ unit-weight edges uniformly at random.
E[dB] < E[dR]
[Figure: G, GR and GB in the K-dim SpotLight Space, with the probability distributions of the distances dB and dR; axes: distance vs. probability.]
The focus-awareness property was introduced by Koutra, Vogelstein & Faloutsos [SDM 2013].
Thm 2: Criterion for anomaly detection
“EXPECTED” GAP ➡ “HIGH PROBABILITY” GAP, for sketch size K ≥ K*:
Pr[dR-dB > 𝛜] ≥ 1-𝛅
[Figure: probability distributions of dR and dB over distance. A decision threshold separates normal from anomaly, with false negatives (FN) and false positives (FP) marked. With a gap of 𝛜 between the distributions, FPR ≤ 𝛅.]
The labeled DARPA dataset
EXPERIMENTS
‣ 4.5M edges in 87.7K time ticks
‣ 9.5K sources, 24K destinations
‣ Edges labeled as attack/not
‣ Stream of 1.5K hourly graphs (24% anomalous)
DARPA: Precision and recall
Precision = #graphs correctly flagged / #graphs flagged
Recall = #graphs correctly flagged / #anomalous graphs
RHSS: (Ranshous, Harenberg, Sharma & Samatova, SDM 2016)
STA: Streaming Tensor Analysis (Sun, Tao & Faloutsos, KDD 2006)
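The two metrics can be computed directly from the set of flagged graphs and the set of truly anomalous graphs. A minimal sketch follows; the function name and the integer graph ids are made up for illustration and are not DARPA results.

```python
def precision_recall(flagged, anomalous):
    """Precision and recall over graph ids.

    flagged:   ids of graphs the detector flagged
    anomalous: ids of graphs that are truly anomalous
    """
    flagged, anomalous = set(flagged), set(anomalous)
    correct = len(flagged & anomalous)  # #graphs correctly flagged
    precision = correct / len(flagged) if flagged else 0.0
    recall = correct / len(anomalous) if anomalous else 0.0
    return precision, recall


# Illustrative ids only: 4 graphs flagged, 3 truly anomalous, 2 in common
precision, recall = precision_recall(flagged=[1, 2, 3, 4], anomalous=[2, 4, 5])
# precision is 2/4 and recall is 2/3
```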
DARPA: Challenges and successes
SpotLight (misses small attacks)
Edge Weight = SL with K=p=q=1 (+ misses medium-size attacks)
RHSS = Edge likelihood function (+ misses repeated attacks)
Summary
CONCLUSION
PROBLEM
[Figure: graphs along a time axis, each marked Ok! or anomaly!]
SOLUTION
SpotLight sketching
‣ Real-time
‣ Memory efficient
‣ Theoretical guarantees
Future directions
MORE CHALLENGING ANOMALIES
‣ Slow and/or small attacks
‣ Sequence of suspicious events rather than a single event
STREAMING ANOMALY ATTRIBUTION
‣ Blame a small set of sources and destinations for the anomaly
Thank you! Questions:
ALGORITHM THEORY PRACTICE