Scalable Anomaly Ranking of Attributed Neighborhoods. SDM, May 5th, 2016. Bryan Perozzi, Leman Akoglu, Stony Brook University. SDM 2016: Best Paper Runner-up Award!

Scalable Anomaly Ranking of Attributed Neighborhoods


Page 1: Scalable Anomaly Ranking of Attributed Neighborhoods

Scalable Anomaly Ranking of Attributed Neighborhoods

SDM, May 5th, 2016

Bryan Perozzi, Leman Akoglu

Stony Brook University

SDM 2016: Best Paper Runner-up Award!

Page 2: Scalable Anomaly Ranking of Attributed Neighborhoods

What’s an Anomaly, Anyhow?

Given an attributed subgraph, how to quantify its quality?

Structure Only

Internal Measures: Average Degree

Internal

Bryan Perozzi Scalable Anomaly Ranking of Attributed Neighborhoods

Page 3: Scalable Anomaly Ranking of Attributed Neighborhoods

What’s an Anomaly, Anyhow?

Given an attributed subgraph, how to quantify its quality?

Structure Only

Internal Measures: Average Degree

Boundary: Cut Edges

Internal + Boundary: Conductance

Internal

External


Page 4: Scalable Anomaly Ranking of Attributed Neighborhoods

What’s an Anomaly, Anyhow?

Given an attributed subgraph, how to quantify its quality?

Structure Only

Internal Measures: Average Degree

Boundary: Cut Edges

Internal + Boundary: Conductance

Structure + Attributes: SODA [Gupta+, 14], Attributed Weighted Normalized Cut [Gunnermann+, 13]
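
The three structure-only measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `structural_measures`, the adjacency-dict representation, and the toy graph are assumptions, and conductance uses the common definition cut / min(vol(S), vol(V \ S)).

```python
def structural_measures(adj, members):
    """Return (average internal degree, cut edges, conductance) of a subgraph.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    members: the nodes that make up the subgraph (hypothetical input format).
    """
    members = set(members)
    internal_endpoints = 0  # every internal edge is counted from both ends
    cut_edges = 0           # edges with exactly one endpoint inside
    for u in members:
        for v in adj[u]:
            if v in members:
                internal_endpoints += 1
            else:
                cut_edges += 1

    # Average degree restricted to edges inside the subgraph: 2 * |E_S| / |S|.
    avg_degree = internal_endpoints / len(members)

    # Volume = sum of (full) degrees of the members.
    volume = internal_endpoints + cut_edges
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    denom = min(volume, total_volume - volume)
    conductance = cut_edges / denom if denom > 0 else 0.0
    return avg_degree, cut_edges, conductance


# Toy graph: a triangle {0, 1, 2} attached to a short path 2 - 3 - 4.
toy_adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}
triangle_scores = structural_measures(toy_adj, {0, 1, 2})
```

On the toy graph the triangle has average internal degree 2, one cut edge, and conductance 1/3. None of these scores looks at node attributes, which is the gap the structure + attributes measures address.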


Page 5: Scalable Anomaly Ranking of Attributed Neighborhoods

Outline

Problem of Anomaly Ranking

Metric: Normality

Optimizing Normality

Experimental Results

Understanding Graphs with Normality


Page 6: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality (intuition)

high

low

Given an attributed subgraph, how to quantify quality?

Internal: structural density


Page 7: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality (intuition)

high

low

chess biking

Given an attributed subgraph, how to quantify quality?

Internal: structural density AND attribute coherence

neighborhood “focus”


Page 8: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality (intuition)

Given an attributed subgraph, how to quantify quality?

Internal: structural density AND attribute coherence

neighborhood “focus”

Boundary: structural sparsity, OR external separation

“exoneration”

high

low

Page 9: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality (intuition)

Motivation: no good cuts in real-world graphs [Leskovec+ ‘08]; social circles overlap [McAuley+ ‘14]

“Exoneration”: by (a) null model, (b) attributes

(a) hub effect: edges expected, not surprising

(b) neighborhood overlap: separable by different “focus”


Page 10: Scalable Anomaly Ranking of Attributed Neighborhoods

The measure of Normality

1


Page 11: Scalable Anomaly Ranking of Attributed Neighborhoods

The measure of Normality

1

internal consistency

Null model

dot-product, or Kronecker’s

“focus” vector

chess biking
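
One way to read the internal-consistency term is as a sum, over pairs of neighborhood members, of (observed edge minus edges expected under a null model) times an attribute similarity weighted by the “focus” vector. The sketch below assumes a degree-based null model k_i * k_j / 2m and a weighted dot-product similarity. The slide names a null model and a dot-product without giving the formula in the extracted text, so both choices, and all names here (`internal_consistency`, `attrs`, `w`), are assumptions rather than the authors' definition.

```python
def internal_consistency(adj, attrs, members, w):
    """Sum over ordered member pairs of (A_ij - k_i*k_j/(2m)) * sim_w(x_i, x_j).

    adj: dict node -> set of neighbors (undirected graph).
    attrs: dict node -> list of attribute values, e.g. [chess, biking].
    w: "focus" weights, one per attribute (assumed non-negative).
    Diagonal pairs (i == j) are included; that is a choice of this sketch.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2 * number of edges
    members = list(members)
    total = 0.0
    for i in members:
        for j in members:
            observed = 1.0 if j in adj[i] else 0.0
            # Assumed null model: edges expected from the degrees alone.
            expected = len(adj[i]) * len(adj[j]) / two_m
            # Weighted dot-product similarity under the focus vector.
            sim = sum(wf * xi * xj for wf, xi, xj in zip(w, attrs[i], attrs[j]))
            total += (observed - expected) * sim
    return total


# Toy graph: a triangle of chess players attached to two cyclists.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
toy_attrs = {0: [1, 0], 1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1]}

focus_on_chess = internal_consistency(toy_adj, toy_attrs, {0, 1, 2}, [1.0, 0.0])
focus_on_biking = internal_consistency(toy_adj, toy_attrs, {0, 1, 2}, [0.0, 1.0])
```

With the focus on chess the triangle scores about 1.1, while a focus on biking gives 0 because no member has that attribute. The score therefore depends on which attributes the weights emphasize.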


Page 12: Scalable Anomaly Ranking of Attributed Neighborhoods

The measure of Normality

1

external separability


Page 13: Scalable Anomaly Ranking of Attributed Neighborhoods

Anomaly Mining of Entity Neighborhoods (AMEN)

Given a community, can we find the weights which maximize its normality?

2

1

latent


Page 14: Scalable Anomaly Ranking of Attributed Neighborhoods

Optimizing Normality

2

1

3


Page 15: Scalable Anomaly Ranking of Attributed Neighborhoods

Optimizing Normality

: one attribute f with largest x

x

: all f with positive x

Normality becomes
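
The two cases on this slide match the closed-form maximizers of a linear objective w · x over non-negative weight vectors of unit norm. The following is a hedged sketch. It assumes that the first case corresponds to an L1-norm constraint on w and the second to an L2-norm constraint, which the extracted slide text does not state. `best_weights` and its arguments are hypothetical names.

```python
import math


def best_weights(x, norm="l2"):
    """Maximize sum_f w[f] * x[f] subject to w >= 0 and ||w||_norm = 1.

    x: per-attribute scores (list of floats).
    norm: "l1" or "l2" (an assumed reading of the two cases on the slide).
    """
    def one_hot_on_largest():
        best = max(range(len(x)), key=lambda f: x[f])
        return [1.0 if f == best else 0.0 for f in range(len(x))]

    if norm == "l1":
        # All weight goes to the single attribute with the largest x.
        return one_hot_on_largest()

    # L2 case: keep every attribute with positive x, scaled to unit length.
    positive = [max(v, 0.0) for v in x]
    length = math.sqrt(sum(v * v for v in positive))
    if length == 0.0:
        # No positive entry: the maximizer is again a single attribute.
        return one_hot_on_largest()
    return [v / length for v in positive]


scores = [3.0, -1.0, 4.0]
w_l1 = best_weights(scores, norm="l1")  # one attribute with largest x
w_l2 = best_weights(scores, norm="l2")  # all attributes with positive x
objective_l2 = sum(wf * xf for wf, xf in zip(w_l2, scores))
```

For the example scores, the L1 solution puts all weight on the third attribute. The L2 solution spreads weight over the first and third in proportion to their scores and gives the negative-scoring attribute zero weight.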


Page 16: Scalable Anomaly Ranking of Attributed Neighborhoods

Size Invariant Scoring

So far: Normality of a community grows with size

Bad for comparisons

Need to “normalize” normality for ranking


Page 17: Scalable Anomaly Ranking of Attributed Neighborhoods

Illustrative examples

split-radix FFT telescopic op-amps

telescopic cascode multidecade

… …

reciprocal split reserve

… …


Page 18: Scalable Anomaly Ranking of Attributed Neighborhoods

Example neighborhoods


Page 19: Scalable Anomaly Ranking of Attributed Neighborhoods

Synthetic Anomaly Detection


Page 20: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality vs Conductance, DBLP


Page 21: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality vs Conductance, Google+


Page 22: Scalable Anomaly Ranking of Attributed Neighborhoods

Normality Distribution


Page 23: Scalable Anomaly Ranking of Attributed Neighborhoods

Feature distribution, DBLP


Page 24: Scalable Anomaly Ranking of Attributed Neighborhoods

Feature distribution, LastFM


Page 25: Scalable Anomaly Ranking of Attributed Neighborhoods

Thanks! Any questions?

Bryan Perozzi

Papers, Code, Contact Info:

www.perozzi.net
