Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
CMU SCS
Anomaly Detection in Large Graphs
Christos Faloutsos CMU
CMU SCS
Thank you!
Prof. David Brumley
Megan Kearns
CyLab, CMU (c) 2016, C. Faloutsos 2
CMU SCS
(c) 2016, C. Faloutsos 3
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 4
Graphs - why should we care?
Internet Map [lumeta.com]
CyLab, CMU
computer network security: • Email traffic • IP traffic (src, dst, dst-port, t)
Malware propagation • (machine-id, infected-file-id)
CMU SCS
(c) 2016, C. Faloutsos 5
Graphs - why should we care?
>$10B; ~1B users
CyLab, CMU
CMU SCS
Motivating problems • P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs / tensors
CyLab, CMU (c) 2016, C. Faloutsos 8
time
destination
CMU SCS
CyLab, CMU (c) 2016, C. Faloutsos 11
Part 1: Patterns, &
fraud detection
CMU SCS
(c) 2016, C. Faloutsos 12
Laws and patterns • Q1: Are real graphs random?
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 13
Laws and patterns • Q1: Are real graphs random? • A1: NO!!
– Diameter (‘6 degrees’; ‘Kevin Bacon’) – in- and out- degree distributions – other (surprising) patterns
• So, let’s look at the data
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 23
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 24
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles – Friends of friends are friends
• Any patterns? – 2x the friends, 2x the triangles ?
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 25
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
SN Reuters
Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles
CyLab, CMU
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
26 CyLab, CMU 26 (c) 2016, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
27 CyLab, CMU 27 (c) 2016, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
28 CyLab, CMU 28 (c) 2016, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
29 CyLab, CMU 29 (c) 2016, C. Faloutsos
CMU SCS
(c) 2016, C. Faloutsos 55
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions
CyLab, CMU
CMU SCS
CyLab, CMU (c) 2016, C. Faloutsos 56
Part 2: Time evolving
graphs; tensors
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 57
smith
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 58
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 59
Mon Tue
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 60 callee
caller
CMU SCS
Graphs over time -> tensors! • Problem #2’:
– Given author-keyword-date – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 61 keyword
author
MANY more settings, with >2 ‘modes’
CMU SCS
Graphs over time -> tensors! • Problem #2’’:
– Given subject – verb – object facts – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 62 object
subject
MANY more settings, with >2 ‘modes’
CMU SCS
Graphs over time -> tensors! • Problem #2’’’:
– Given <triplets> – Find patterns / anomalies
CyLab, CMU (c) 2016, C. Faloutsos 63 mode2
mode1
MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)
CMU SCS
(c) 2016, C. Faloutsos 64
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors
– Intro to tensors – Results – Speed
• Conclusions
CyLab, CMU
CMU SCS Answer to both: tensor
factorization • Recall: (SVD) matrix factorization: finds
blocks
CyLab, CMU (c) 2016, C. Faloutsos 65
N users
M products
‘meat-eaters’ ‘steaks’
‘vegetarians’ ‘plants’
‘kids’ ‘cookies’
~ + +
CMU SCS Answer to both: tensor
factorization • PARAFAC decomposition
CyLab, CMU (c) 2016, C. Faloutsos 66
= + + subject
object
politicians artists athletes
CMU SCS
Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when
– 4M x 15 days
CyLab, CMU (c) 2016, C. Faloutsos 67
= + + caller
callee
?? ?? ??
CMU SCS Anomaly detection in time-
evolving graphs
• Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day!
1 caller 5 receivers 4 days of activity
CyLab, CMU 68 (c) 2016, C. Faloutsos
=
CMU SCS Anomaly detection in time-
evolving graphs
• Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day! CyLab, CMU 69 (c) 2016, C. Faloutsos
=
1 caller 5 receivers 4 days of activity
CMU SCS Anomaly detection in time-
evolving graphs
• Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day! CyLab, CMU 70 (c) 2016, C. Faloutsos
=
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
CMU SCS
(c) 2016, C. Faloutsos 76
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors
– Inter-arrival time patterns • Acknowledgements and Conclusions
CyLab, CMU
CMU SCS
RSC: Mining and Modeling Temporal Activity in Social Media
Alceu F. Costa* Yuto Yamaguchi Agma J. M. Traina
Caetano Traina Jr. Christos Faloutsos
Universidade de São Paulo
KDD 2015 – Sydney, Australia
CMU SCS
Reddit Dataset Time-stamp from comments 21,198 users 20 Million time-stamps
Twitter Dataset Time-stamp from tweets 6,790 users 16 Million time-stamps
Pattern Mining: Datasets
For each user we have: Sequence of postings time-stamps: T = (t1, t2, t3, …) Inter-arrival times (IAT) of postings: (∆1, ∆2, ∆3, …)
78 t1 t2 t3 t4
∆1 ∆2 ∆3
time CyLab, CMU (c) 2016, C. Faloutsos
CMU SCS
Pattern Mining Pattern 1: Distribution of IAT is heavy-tailed
Users can be inactive for long periods of time before making new postings
IAT Complementary Cumulative Distribution Function (CCDF) (log-log axis)
79
105 106 10710 5
100
, IAT (seconds)
CC
DF,
P(x
)
1 day 10 days105 106 107
10 5
100
, IAT (seconds)
CC
DF,
P(x
)
1 day10 days
Reddit Users Twitter Users CyLab, CMU (c) 2016, C. Faloutsos
CMU SCS
Pattern Mining Pattern 2: Bimodal IAT distribution
Users have highly active sections and resting periods
Log-binned histogram of postings IAT
80 Twitter Users
102 104 1060
0.005
0.01
0.015
, IAT (seconds)
1st Mode (1min) 2nd Mode (3h)
CyLab, CMU (c) 2016, C. Faloutsos
CMU SCS
Human? Robots?
log
linear
CyLab, CMU (c) 2016, C. Faloutsos 81
CMU SCS
Human? Robots?
log
linear 2’ 3h 1day
CyLab, CMU (c) 2016, C. Faloutsos 82
CMU SCS
Experiments: Can RSC-Spotter Detect Bots?
Precision vs. Sensitivity Curves Good performance: curve close to the top
83
0 0.2 0.4 0.6 0.8 10
0.20.40.60.8
1
Sensitivity (Recall)
Prec
isio
n
0 0.2 0.4 0.6 0.8 10
0.20.40.60.8
1
Sensitivity (Recall)
Prec
isio
n
RSC Spotter
IAT Hist.Entropy [6]Weekday Hist.
All Features
Precision > 94% Sensitivity >
70%
With strongly imbalanced
datasets
# humans >> # bots
CyLab, CMU (c) 2016, C. Faloutsos
CMU SCS
Experiments: Can RSC-Spotter Detect Bots?
Precision vs. Sensitivity Curves Good performance: curve close to the top
84
0 0.2 0.4 0.6 0.8 10
0.20.40.60.8
1
Sensitivity (Recall)
Prec
isio
n
RSC Spotter
IAT Hist.Entropy [6]Weekday Hist.
All Features
Precision > 96% Sensitivity >
47%
With strongly imbalanced
datasets
# humans >> # bots
CyLab, CMU (c) 2016, C. Faloutsos
CMU SCS
(c) 2016, C. Faloutsos 87
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Acknowledgements and Conclusions
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 88
Thanks
CyLab, CMU
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
CMU SCS
(c) 2016, C. Faloutsos 89
Cast
Akoglu, Leman
Chau, Polo
Kang, U
Hooi, Bryan
CyLab, CMU
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
Shah, Neil
Araujo, Miguel
Song, Hyun Ah
Eswaran, Dhivya
Shin, Kijung
CMU SCS
(c) 2016, C. Faloutsos 90
CONCLUSION#1 – Big data • Patterns Anomalies
• Large datasets reveal patterns/outliers that are invisible otherwise
CyLab, CMU
CMU SCS
(c) 2016, C. Faloutsos 91
CONCLUSION#2 – tensors
• powerful tool
CyLab, CMU
=
1 caller 5 receivers 4 days of activity
CMU SCS
(c) 2016, C. Faloutsos 92
References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/
S00449ED1V01Y201209DMK006
CyLab, CMU
CMU SCS
TAKE HOME MESSAGE:
Cross-disciplinarity
CyLab, CMU (c) 2016, C. Faloutsos 93
=
CMU SCS
Cross-disciplinarity
CyLab, CMU (c) 2016, C. Faloutsos 94
=
Thank you!