CMU SCS Identifying on-line Fraudsters: Anomaly Detection Using Network Effects Christos Faloutsos CMU


CMU SCS

Identifying on-line Fraudsters: Anomaly Detection Using

Network Effects

Christos Faloutsos

CMU


Thanks

• Saman Haqqi

IBM-PBGH June 2013 C. Faloutsos (CMU) 2


Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and “NELL”

• Influence propagation and spike modeling
  – C1: spikeM model

• Conclusions


E-bay Fraud detection

with Polo Chau & Shashank Pandit, CMU [WWW ’07]


E-bay Fraud detection - NetProbe


Compatibility matrix (heterophily), over node states F, A, H:

Row F: 99%
Row A: 99%
Row H: 49%, 49%


Background 1: Belief Propagation Equations

$$m_{ij}(x_j) = \sum_{x_i} \phi_i(x_i) \, \psi_{ij}(x_i, x_j) \prod_{n \in N(i) \setminus j} m_{ni}(x_i)$$

$$b_i(x_i) = \eta \, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$$

[Pearl ‘82][Yedidia+ ‘02]…[Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]
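The message and belief updates above can be sketched in a few lines of Python. This is a minimal illustration, not NetProbe's implementation: the function name `belief_propagation`, the 3-node chain, the two states and the homophily-style matrix `psi` are all assumptions chosen so the result is easy to check by hand. Messages are renormalised after every update, which only rescales them.

```python
from math import prod

def belief_propagation(n_nodes, edges, phi, psi, n_iters=10):
    """Synchronous sum-product BP on an undirected graph (illustrative sketch)."""
    n_states = len(psi)
    neighbors = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)][x_j] is the message from node i to node j; start uniform.
    msgs = {(i, j): [1.0 / n_states] * n_states
            for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for xj in range(n_states):
                total = 0.0
                for xi in range(n_states):
                    # Product of messages into i from all neighbors except j.
                    incoming = prod(msgs[(n, i)][xi]
                                    for n in neighbors[i] if n != j)
                    total += phi[i][xi] * psi[xi][xj] * incoming
                out.append(total)
            z = sum(out)
            new_msgs[(i, j)] = [v / z for v in out]
        msgs = new_msgs
    beliefs = []
    for i in range(n_nodes):
        b = [phi[i][x] * prod(msgs[(j, i)][x] for j in neighbors[i])
             for x in range(n_states)]
        z = sum(b)  # plays the role of the normaliser eta
        beliefs.append([v / z for v in b])
    return beliefs

# Hypothetical example: chain 0 - 1 - 2, node 0 has a strong prior for state 0.
phi = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
psi = [[0.8, 0.2], [0.2, 0.8]]  # assumed homophily-style compatibility
beliefs = belief_propagation(3, [(0, 1), (1, 2)], phi, psi)
```

On a tree such as this chain the beliefs are exact marginals; on graphs with cycles the same loop gives the usual loopy-BP approximation.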


Popular press

And less desirable attention:
• E-mail from ‘Belgium police’ (‘copy of your code?’)


Roadmap

• Graph problems:
  – G1: Fraud detection – BP
    • Ebay
    • Symantec
    • Unification
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and “NELL”
• Influence propagation and spike modeling
• Conclusions


Polonium: Tera-Scale Graph Mining and Inference for Malware Detection

Polo Chau, Machine Learning Dept
Carey Nachenberg, Vice President & Fellow
Jeffrey Wilhelm, Principal Software Engineer
Adam Wright, Software Engineer
Prof. Christos Faloutsos, Computer Science Dept

PATENT PENDING

SDM 2011, Mesa, Arizona


Polonium: The Data

60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program
• 50+ million machines
• 900+ million executable files

Constructed a machine-file bipartite graph (0.2 TB+)
• 1 billion nodes (machines and files)
• 37 billion edges


Polonium: Key Ideas

• Use “guilt-by-association” (i.e., homophily)

– E.g., files that appear on machines with many bad files are more likely to be bad

• Scalability: handles 37 billion-edge graph


Polonium: One-Interaction Results

84.9% True Positive Rate
1% False Positive Rate

True Positive Rate: % of malware correctly identified
False Positive Rate: % of non-malware wrongly labeled as malware

[Plot: True Positive Rate vs. False Positive Rate, with the ideal point marked]
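The two rates defined on this slide can be computed from ground-truth labels and predictions as in the following sketch. The function name `tpr_fpr` and the toy labels are hypothetical; they are not Polonium data.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate); 1 = malware, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Toy example: 4 malware files, 6 benign files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
```

Here three of the four malware files are caught and one of the six benign files is wrongly flagged.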


Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms

Danai Koutra

U Kang

Hsing-Kuo Kenneth Pao

Tai-You Ke

Duen Horng (Polo) Chau

Christos Faloutsos

ECML PKDD, 5-9 September 2011, Athens, Greece


Problem Definition: GBA techniques

Given: a graph and a few labeled nodes
Find: labels of the rest (assuming network effects)

Homophily and Heterophily

All methods handle homophily.

NOT all methods handle heterophily, BUT the proposed method does!

Are they related?

• RWR (Random Walk with Restarts)
  – Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning)
  – minimize the differences among neighbors
• BP (Belief propagation)
  – send messages to neighbors, on what you believe about them


YES!


Correspondence of Methods

Method   Matrix              Unknown   Known
RWR      [I - c A D^-1]      × x       = (1 - c) y
SSL      [I + a(D - A)]      × x       = y
FaBP     [I + a D - c′A]     × b_h     = φ_h

A: adjacency matrix, e.g. [0 1 0; 1 0 1; 0 1 0]
D: diagonal matrix with entries d1, d2, d3
Unknown: final labels/beliefs
Known: prior labels/beliefs


We know when it converges!
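Since all three methods reduce to a linear system in the adjacency matrix, they can be compared side by side on the slide's 3-node chain. The sketch below is only an illustration: it assumes D holds the node degrees, the parameter values (`c`, `a_ssl`, `a_fabp`, `c_fabp`) and the prior vectors `y` and `phi_h` are made up, and the helper names `solve`, `matvec` and `eye` are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def eye(i, j):
    return 1.0 if i == j else 0.0

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # chain graph from the slide
n = len(A)
d = [sum(row) for row in A]             # assumed: D = node degrees

c, a_ssl, a_fabp, c_fabp = 0.5, 0.1, 0.05, 0.2   # illustrative values only
y = [1.0, 0.0, 0.0]        # prior labels: node 0 is the labeled node
phi_h = [0.1, 0.0, 0.0]    # assumed zero-centered prior beliefs for FaBP

# RWR: [I - c A D^-1] x = (1 - c) y
M_rwr = [[eye(i, j) - c * A[i][j] / d[j] for j in range(n)] for i in range(n)]
# SSL: [I + a (D - A)] x = y
M_ssl = [[eye(i, j) + a_ssl * (d[i] * eye(i, j) - A[i][j]) for j in range(n)]
         for i in range(n)]
# FaBP: [I + a D - c' A] b_h = phi_h
M_fabp = [[eye(i, j) + a_fabp * d[i] * eye(i, j) - c_fabp * A[i][j]
           for j in range(n)] for i in range(n)]

x_rwr = solve(M_rwr, [(1 - c) * v for v in y])
x_ssl = solve(M_ssl, y)
b_h = solve(M_fabp, phi_h)
```

In all three cases the score decays with distance from the labeled node, which is the network effect the methods share.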


Results: Scalability

FaBP is linear in the number of edges.

[Plot: run time (min) vs. # of edges (Kronecker graphs)]

Results: Parallelism

FaBP is ~2x faster and wins or ties on accuracy.

[Plot: % accuracy vs. runtime (min)]

Conclusions for BP

• ‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects.

• FaBP: fast & accurate (and -> convergence conditions)


EigenSpokes

B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.


EigenSpokes

• Eigenvectors of adjacency matrix
  – equivalent to singular vectors (symmetric, undirected graph)


EigenSpokes

• EE plot:
  – Scatter plot of scores of u1 vs u2
• One would expect
  – Many points @ origin
  – A few scattered ~randomly

[Plot axes: u1 (1st principal component) vs. u2 (2nd principal component)]

[EE plot of u1 vs. u2, annotated with a 90° angle]

EigenSpokes - pervasiveness

• Present in mobile social graph across time and space
• Patent citation graph

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected


So what?
• Extract nodes with high scores
• High connectivity
• Good “communities”

[Spy plot of top 20 nodes]
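The explanation can be reproduced on a toy graph: two cliques joined by a single edge stand in for "near-cliques, loosely connected". The sketch below uses plain power iteration with re-orthogonalisation to get the top two eigenvectors of the adjacency matrix; the graph, the helper names and the iteration count are illustrative assumptions, not the procedure used in the EigenSpokes paper.

```python
from math import sqrt

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]

def top_eigenvectors(A, k, n_iters=500):
    """Top-k eigenvectors (by |eigenvalue|) via power iteration with deflation."""
    vecs = []
    for _ in range(k):
        v = normalize([1.0 + 0.1 * i for i in range(len(A))])
        for _ in range(n_iters):
            v = matvec(A, v)
            for u in vecs:  # keep v orthogonal to eigenvectors already found
                proj = dot(u, v)
                v = [x - proj * y for x, y in zip(v, u)]
            v = normalize(v)
        vecs.append(v)
    return vecs

def clique_edges(nodes):
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]

# Toy graph: a 6-clique (nodes 0-5) and a 5-clique (nodes 6-10), one bridge edge.
n = 11
edges = clique_edges(list(range(6))) + clique_edges(list(range(6, 11))) + [(5, 6)]
A = [[0.0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1.0

u1, u2 = top_eigenvectors(A, 2)
# "Extract nodes with high scores": the largest |u1| and |u2| entries.
spoke1 = set(sorted(range(n), key=lambda i: -abs(u1[i]))[:6])
spoke2 = set(sorted(range(n), key=lambda i: -abs(u2[i]))[:5])
```

Each eigenvector concentrates on one clique, so the nodes with the highest scores along u1 and along u2 recover the two communities.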


Bipartite Communities!

magnified bipartite community

patents from same inventor(s)

‘cut-and-paste’ bibliography!

(maybe, botnets?)

Victim IPs?

Botnet members?

Exploring it with Dr. Eric Mao (III-Taiwan)

GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries

U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos

KDD’12

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Hyperlinks & anchor text [Kolda+, 05]

[Figure: 3-mode tensor with modes URL 1 × URL 2 × Anchor Text (e.g. Java, C++, C#); nonzero entries are 1]

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base, e.g. “Barack Obama is president of U.S.”, “Eric Clapton plays guitar”

NELL (Never Ending Language Learner) data: modes of size 26M, 26M and 48M; nonzeros = 144M

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base
  – Anomaly detection in computer networks (IP-source, IP-destination, time-stamp)

Problem Definition

• How to decompose a billion-scale tensor?
  – Corresponds to SVD in 2D case

[Figure: decomposition into groups such as ‘Politicians’ and ‘Artists’]
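To make "decomposing a tensor" concrete, here is a minimal sketch of a rank-1 CP (PARAFAC-style) fit on a tiny sparse 3-mode tensor, using alternating updates of one factor vector at a time. It is an in-memory toy under stated assumptions, not GigaTensor's algorithm: the names `mode_product` and `rank1_cp`, the toy tensor and the sweep count are all hypothetical.

```python
from math import sqrt

def mode_product(X, vecs, mode, size):
    """Contract sparse tensor X with the factor vectors of the other two modes."""
    out = [0.0] * size
    for idx, val in X.items():
        weight = val
        for m in range(3):
            if m != mode:
                weight *= vecs[m][idx[m]]
        out[idx[mode]] += weight
    return out

def rank1_cp(X, shape, n_sweeps=20):
    """Fit X ~ lam * a (outer) b (outer) c by alternating one-factor updates."""
    vecs = [[1.0] * s for s in shape]
    lam = 0.0
    for _ in range(n_sweeps):
        for mode in range(3):
            v = mode_product(X, vecs, mode, shape[mode])
            lam = sqrt(sum(x * x for x in v))
            vecs[mode] = [x / lam for x in v]
    return lam, vecs

# Toy (subject, verb, object)-style tensor stored as {(i, j, k): value};
# it is exactly rank 1, built from the three vectors below.
a, b, c = [1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0]
shape = (len(a), len(b), len(c))
X = {(i, j, k): a[i] * b[j] * c[k]
     for i in range(shape[0]) for j in range(shape[1]) for k in range(shape[2])
     if a[i] * b[j] * c[k] != 0.0}

lam, (fa, fb, fc) = rank1_cp(X, shape)
max_err = max(abs(lam * fa[i] * fb[j] * fc[k] - X.get((i, j, k), 0.0))
              for i in range(shape[0]) for j in range(shape[1])
              for k in range(shape[2]))
```

Only the nonzero entries are touched in each update, which is the property that matters when the tensor is far larger than memory; a full decomposition would fit several such components rather than one.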

Q1: Dominant concepts/topics?
Q2: Find synonyms to a given noun phrase?
(and how to scale up: |data| > RAM)

Data: NELL (Never Ending Language Learner), nonzeros = 144M

Experiments

• GigaTensor solves 100x larger problem

[Plot: GigaTensor vs. Tensor Toolbox on an I × J × K tensor with number of nonzeros = I / 50; Tensor Toolbox runs out of memory, GigaTensor handles a 100x larger problem]

A1: Concept Discovery

• Concept Discovery in Knowledge Base


A2: Synonym Discovery


Rise and Fall Patterns of Information Diffusion: Model and Implications

Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU)

KDD’12, Beijing, China


Rise and fall patterns in social media

• Meme (# of mentions in blogs)
  – short phrases sourced from U.S. politics in 2008
  – e.g. “you can put lipstick on a pig”, “yes we can”

• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]
• Can we find a unifying model, which includes these patterns?
• Answer: YES!
• We can represent all patterns by a single model

Main idea - SpikeM

1. Un-informed bloggers (uninformed about rumor): time n = 0
2. External shock at time n_b (e.g., breaking news): time n = n_b
3. Infection (word-of-mouth): time n = n_b + 1

Infectiveness of a blog-post at age n:
- Strength of infection (quality of news): β
- Decay function


-1.5 slope

J. G. Oliveira & A.-L. Barabási, Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

[Plot: Prob(RT > x) (log) vs. response time (log); slope -1.5]
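The three ingredients (un-informed bloggers, an external shock, word-of-mouth infection with a power-law decay of exponent -1.5) can be put together in a short simulation. The update rule below is an assumption pieced together from the slide descriptions, without periodicity or background noise; the function name `spikem_base` and all parameter values are illustrative, and the full SpikeM equation in the KDD’12 paper should be consulted for the actual model.

```python
def spikem_base(n_steps, n_bloggers, beta, n_b, s_b):
    """Assumed base update (no periodicity, no noise):
    dB(n+1) = U(n) * sum_{t=n_b..n} (dB(t) + S(t)) * beta * (n+1-t)^(-1.5)
    U(n+1)  = U(n) - dB(n+1)
    where S(t) = s_b at t = n_b (external shock) and 0 otherwise.
    """
    uninformed = float(n_bloggers)
    delta_b = [0.0] * n_steps          # newly informed bloggers per time tick
    for n in range(n_steps - 1):
        influence = sum(
            (delta_b[t] + (s_b if t == n_b else 0.0)) * beta * (n + 1 - t) ** -1.5
            for t in range(n_b, n + 1))
        newly = min(uninformed, uninformed * influence)  # cannot exceed U(n)
        delta_b[n + 1] = newly
        uninformed -= newly
    return delta_b

# Illustrative parameters: 1000 bloggers, shock of size 10 at time tick 5.
activity = spikem_base(n_steps=100, n_bloggers=1000, beta=0.001, n_b=5, s_b=10.0)
peak_time = max(range(len(activity)), key=lambda n: activity[n])
```

Nothing happens before the shock; afterwards activity rises while many bloggers are still un-informed and then falls off as the pool is used up and old posts lose infectiveness.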


SpikeM - with periodicity

• Full equation of SpikeM (details)
• Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)

[Plot: activity vs. time n, with a peak at noon and a dip at 3am]

Details

• Analysis: exponential rise and power-law fall
  – Rise part: SI -> exponential; SpikeM -> exponential
  – Fall part: SI -> exponential; SpikeM -> power law

[Plots shown on lin-log and log-log scales]

Tail-part forecasts

• SpikeM can capture the tail part

“What-if” forecasting

e.g., given
(1) first spike,
(2) release date of two sequel movies,
(3) access volume before the release date (two weeks before release)

SpikeM can forecast upcoming spikes

Conclusions for spikes

• Exp rise; PL decay
• ‘spikeM’ captures all patterns, with a few parameters
  – And can do extrapolation
  – And forecasting

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and “NELL”
• Influence propagation and spike modeling
• Future research
• Conclusions

Challenge #1: Time evolving networks / tensors

• Periodicities? Burstiness?
• What is ‘typical’ behavior of a node, over time?
• Heterogeneous graphs (= nodes w/ attributes)

Challenge #2: ‘Connectome’ – brain wiring

• Which neurons get activated by ‘bee’
• How wiring evolves
• Modeling epilepsy

N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell

Thanks


Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab


Project info: PEGASUS

www.cs.cmu.edu/~pegasus

Results on large graphs: with Pegasus + hadoop + M45

Apache license

Code, papers, manual, video

Prof. U Kang, Prof. Polo Chau

Cast

Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tong, Hanghang
Prakash, Aditya
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis

References

• Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006)
• Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174
• Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD’12, pp. 6-14, Beijing, China, August 2012
• Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

Overall Conclusions

• G1: fraud detection
  – BP: powerful method
  – FaBP: faster; equally accurate; known convergence
• G2: botnets -> EigenSpokes
• G3: Subject-Verb-Object -> Tensors/GigaTensor
• Spikes: ‘spikeM’ (exp rise; PL drop)