Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU


Page 1: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Anomaly Detection and Virus Propagation in Large Graphs

Christos Faloutsos

CMU

Page 2: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Thank you!

• Dr. Ching-Hao (Eric) Mao

• Prof. Kenneth Pao

Taiwan'12 Faloutsos, Prakash, Chau, Koutra, Akoglu 2

Page 3: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Part 1: anomaly detection
  – OddBall (anomaly detection)
  – Belief Propagation
  – Conclusions
• Part 2: influence propagation

Page 4: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

OddBall: Spotting Anomalies in Weighted Graphs

Leman Akoglu, Mary McGlohon, Christos Faloutsos

Carnegie Mellon University School of Computer Science

PAKDD 2010, Hyderabad, India

Page 5: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Main idea

For each node:
• Extract its ‘ego-net’ (the node plus its 1-step-away neighbors)
• Extract features (#edges, total weight, etc.)
• Compare with the rest of the population

Page 6: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

What is an egonet?

[Figure: an ego node and its egonet]

Page 7: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Selected Features
• Ni: number of neighbors (degree) of ego i
• Ei: number of edges in egonet i
• Wi: total weight of egonet i
• λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
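The four egonet features can be computed with a few lines of code. The following is a minimal sketch, not the OddBall implementation: the dict-of-dicts graph layout, the helper names (`principal_eigenvalue`, `egonet`, `egonet_features`, `add_edge`) and the toy graph are all assumptions made for illustration, and the eigenvalue is approximated by a simple shifted power iteration.

```python
import math

def principal_eigenvalue(adj, iters=300):
    """Approximate the largest eigenvalue of a symmetric, non-negative
    weighted graph given as {node: {neighbor: weight}} (hypothetical layout)."""
    nodes = list(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(w * x[u] for u, w in adj[v].items()) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    # Rayleigh quotient x^T A x (x has unit length here).
    return sum(x[v] * w * x[u] for v in nodes for u, w in adj[v].items())

def egonet(graph, ego):
    """Induced subgraph on the ego and its 1-step-away neighbors."""
    members = {ego} | set(graph[ego])
    return {v: {u: w for u, w in graph[v].items() if u in members}
            for v in members}

def egonet_features(graph, ego):
    """Return (N_i, E_i, W_i, lambda_w_i) for the given ego."""
    sub = egonet(graph, ego)
    n_i = len(graph[ego])                                   # degree of the ego
    e_i = sum(len(nbrs) for nbrs in sub.values()) // 2      # each edge seen twice
    w_i = sum(w for nbrs in sub.values() for w in nbrs.values()) / 2.0
    lam_i = principal_eigenvalue(sub)
    return n_i, e_i, w_i, lam_i

def add_edge(graph, u, v, w=1.0):
    """Add an undirected weighted edge."""
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

# Toy graph (assumption): a hub with four leaves; leaf "d" also touches an outsider.
toy = {}
for leaf in ["a", "b", "c", "d"]:
    add_edge(toy, "hub", leaf)
add_edge(toy, "d", "outsider")
```

Calling `egonet_features(toy, "hub")` describes a star-shaped egonet (few edges relative to neighbors), which is the kind of pattern the Near-Clique/Star slides contrast with near-cliques.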

Page 8: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Near-Clique/Star


Page 9: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Near-Clique/Star


Page 10: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Near-Clique/Star


Page 11: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Andrew Lewis (director)

Near-Clique/Star


Page 12: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Part 1: anomaly detection
  – OddBall (anomaly detection)
  – Belief Propagation
  – Conclusions
• Part 2: influence propagation

Page 13: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [WWW’07]

Page 14: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU


E-bay Fraud detection

Page 15: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU


E-bay Fraud detection

Page 16: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU


E-bay Fraud detection - NetProbe

Page 17: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Popular press

And less desirable attention:
• E-mail from ‘Belgium police’ (‘copy of your code?’)

Page 18: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• OddBall (anomaly detection)
• Belief Propagation
  – eBay fraud
  – Symantec malware detection
  – Unification results
• Conclusions

Page 19: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Polo Chau, Machine Learning Dept
Carey Nachenberg, Vice President & Fellow
Jeffrey Wilhelm, Principal Software Engineer
Adam Wright, Software Engineer
Prof. Christos Faloutsos, Computer Science Dept

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection

PATENT PENDING

SDM 2011, Mesa, Arizona

Page 20: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Polonium: The Data

60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program
• 50+ million machines
• 900+ million executable files

Constructed a machine-file bipartite graph (0.2 TB+)
• 1 billion nodes (machines and files)
• 37 billion edges

Page 21: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Polonium: Key Ideas

• Use Belief Propagation to propagate domain knowledge in the machine-file graph to detect malware
• Use “guilt-by-association” (i.e., homophily)
  – E.g., files that appear on machines with many bad files are more likely to be bad
• Scalability: handles a 37-billion-edge graph

Page 22: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Polonium: One-Interaction Results

84.9% True Positive Rate
1% False Positive Rate

True Positive Rate: % of malware correctly identified
False Positive Rate: % of non-malware wrongly labeled as malware

[Figure: plot with the ‘Ideal’ point marked]

Page 23: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Part 1: anomaly detection
  – OddBall (anomaly detection)
  – Belief Propagation
    • Ebay fraud
    • Symantec malware detection
    • Unification results
  – Conclusions
• Part 2: influence propagation

Page 24: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms

Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos

ECML PKDD, 5-9 September 2011, Athens, Greece

Page 25: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Problem Definition: GBA techniques

Given: a graph and a few labeled nodes
Find: the labels of the rest (assuming network effects)

[Figure: a graph with a few labeled nodes; the remaining nodes are marked ‘?’]

Page 26: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Homophily and Heterophily

[Figure: Step 1 and Step 2 of label propagation, under homophily and under heterophily]

All methods handle homophily.
NOT all methods handle heterophily, BUT the proposed method does!

Page 27: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Are they related?

• RWR (Random Walk with Restarts)
  – Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning)
  – minimize the differences among neighbors
• BP (Belief propagation)
  – send messages to neighbors, on what you believe about them

Page 28: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Are they related?


YES!

Page 29: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Correspondence of Methods

Method | Matrix             | Unknown | Known
RWR    | [I - c A D^-1]   × | x     = | (1 - c) y
SSL    | [I + a (D - A)]  × | x     = | y
FaBP   | [I + a D - c’ A] × | b_h   = | φ_h

A is the adjacency matrix (the slide’s example is the 3-node chain with rows 0 1 0 / 1 0 1 / 0 1 0), D is the diagonal degree matrix (d1, d2, d3), the unknown vector holds the final labels/beliefs, and the known vector holds the prior labels/beliefs.
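All three methods reduce to solving one linear system, which the sketch below makes concrete on the 3-node chain. It is an illustration only: the parameter values for `c`, `a` and `c_prime` are arbitrary assumptions (the slides do not give them), and the small Gauss-Jordan solver and helper names are hypothetical stand-ins for whatever solver a real implementation would use.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# Adjacency matrix of the 3-node chain and its degrees.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
DEG = [sum(row) for row in A]
N = len(A)

def eye(i, j):
    return 1.0 if i == j else 0.0

def rwr_system(c, y):
    """[I - c A D^-1] x = (1 - c) y"""
    m = [[eye(i, j) - c * A[i][j] / DEG[j] for j in range(N)] for i in range(N)]
    return m, [(1 - c) * v for v in y]

def ssl_system(a, y):
    """[I + a (D - A)] x = y"""
    m = [[eye(i, j) * (1 + a * DEG[i]) - a * A[i][j] for j in range(N)]
         for i in range(N)]
    return m, list(y)

def fabp_system(a, c_prime, phi):
    """[I + a D - c' A] b = phi"""
    m = [[eye(i, j) * (1 + a * DEG[i]) - c_prime * A[i][j] for j in range(N)]
         for i in range(N)]
    return m, list(phi)

# Prior knowledge: only node 0 is labeled (illustrative values).
prior = [1.0, 0.0, 0.0]
rwr_scores = solve(*rwr_system(0.5, prior))
ssl_scores = solve(*ssl_system(0.1, prior))
fabp_beliefs = solve(*fabp_system(0.1, 0.2, [0.1, 0.0, 0.0]))
```

The three builders differ only in how the degree and adjacency terms are weighted, which is the correspondence the table summarizes.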

Page 30: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Results: Scalability

FaBP is linear in the number of edges.

[Figure: runtime (min) vs. # of edges (Kronecker graphs)]

Page 31: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Results (5): Parallelism

FaBP ~2x faster & wins/ties on accuracy.

[Figure: % accuracy vs. runtime (min)]

Page 32: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU


Conclusions

• Anomaly detection: hand-in-hand with pattern discovery (‘anomalies’ == ‘rare patterns’)

• ‘OddBall’ for large graphs

• ‘NetProbe’ and belief propagation: exploit network effects.

• FaBP: fast & accurate


Page 33: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Part 1: anomaly detection
  – OddBall (anomaly detection)
  – Belief Propagation
  – Conclusions
• Part 2: influence propagation

Page 34: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Influence propagation in large graphs: theorems and algorithms

B. Aditya Prakash
http://www.cs.cmu.edu/~badityap

Christos Faloutsos
http://www.cs.cmu.edu/~christos

Carnegie Mellon University

Page 35: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]


Page 36: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Dynamical Processes over networks are also everywhere!


Page 37: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Why do we care?
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• Social Collaboration
• ...

Page 38: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Why do we care? (1: Epidemiology)

• Dynamical Processes over networks
• Diseases over contact networks

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]

Page 39: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Why do we care? (1: Epidemiology)

• Dynamical Processes over networks

[US-MEDICARE NETWORK 2005]
• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients transferred

Problem: Given k units of disinfectant, whom to immunize?

Page 40: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Why do we care? (1: Epidemiology)

CURRENT PRACTICE vs. OUR METHOD: ~6x fewer!

[US-MEDICARE NETWORK 2005]

Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year).

Page 41: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users


Page 42: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Why do we care? (2: Online Diffusion)

• Dynamical Processes over networks

Celebrity

Buy Versace™!

Followers

Social Media Marketing

Page 43: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

High Impact – Multiple Settings

Q. How to squash rumors faster?

Q. How do opinions spread?

Q. How to market better?

epidemic out-breaks

products/viruses

transmit s/w patches


Page 44: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Research Theme

DATA: Large real-world networks & processes

ANALYSIS: Understanding

POLICY/ACTION: Managing

Page 45: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

In this talk

ANALYSIS: Understanding

Given propagation models:

Q1: Will an epidemic happen?

Page 46: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

In this talk

POLICY/ACTION: Managing

Q2: How to immunize and control outbreaks better?

Page 47: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Part 1: anomaly detection
• Part 2: influence propagation
  – Motivation
  – Epidemics: what happens? (Theory)
  – Action: Who to immunize? (Algorithms)

Page 48: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

A fundamental question

Strong Virus

Epidemic?

Page 49: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Example (static graph)

Weak Virus

Epidemic?

Page 50: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Problem Statement

Find a condition under which
– the virus will die out exponentially quickly
– regardless of the initial infection condition

[Figure: # infected vs. time, with curves above the threshold (epidemic) and below it (extinction). Separate the regimes?]

Page 51: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Threshold (static version)

Problem Statement
• Given:
  – Graph G, and
  – Virus specs (attack prob. etc.)
• Find:
  – A condition for virus extinction/invasion

Page 52: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Threshold: Why important?

• Accelerating simulations
• Forecasting (‘What-if’ scenarios)
• Design of contagion and/or topology
• A great handle to manipulate the spreading
  – Immunization
  – Maximize collaboration
  – ...

Page 53: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
  – Background
  – Result (Static Graphs)
  – Proof Ideas (Static Graphs)
  – Bonus 1: Dynamic Graphs
  – Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)

Page 54: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

“SIR” model: lifelong immunity (mumps)

• Each node in the graph is in one of three states
  – Susceptible (i.e. healthy)
  – Infected
  – Removed (i.e. can’t get infected again)

[Figure: example at t = 1, t = 2, t = 3; infection happens with prob. β, removal with prob. δ]
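The SIR dynamics can be simulated in a few lines. The sketch below uses one common discrete-time reading of the model, which is an assumption here: in each time step every infected node tries to infect each susceptible neighbor with probability β, and then becomes removed with probability δ. The function names and the 4-node path graph are hypothetical.

```python
import random

def sir_step(adj, state, beta, delta, rng):
    """Advance the epidemic by one time step and return the new state map."""
    new_state = dict(state)
    for node, status in state.items():
        if status != "I":
            continue
        # An infected node attacks each susceptible neighbor with prob. beta.
        for nbr in adj[node]:
            if state[nbr] == "S" and rng.random() < beta:
                new_state[nbr] = "I"
        # It is then removed (cannot be infected again) with prob. delta.
        if rng.random() < delta:
            new_state[node] = "R"
    return new_state

def simulate_sir(adj, seeds, beta, delta, max_steps=1000, seed=0):
    """Run until no node is infected (or max_steps); return all states visited."""
    rng = random.Random(seed)
    state = {v: ("I" if v in seeds else "S") for v in adj}
    history = [state]
    for _ in range(max_steps):
        if "I" not in state.values():
            break
        state = sir_step(adj, state, beta, delta, rng)
        history.append(state)
    return history

# Toy contact network (assumption): a path 0 - 1 - 2 - 3.
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
history = simulate_sir(PATH, seeds={0}, beta=0.5, delta=0.3)
infected_per_step = [sum(s == "I" for s in st.values()) for st in history]
```

Plotting `infected_per_step` for different β and δ gives the kind of infection profile shown later in the deck.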

Page 55: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Terminology: continued

• Other virus propagation models (“VPM”)
  – SIS: susceptible-infected-susceptible, flu-like
  – SIRS: temporary immunity, like pertussis
  – SEIR: mumps-like, with virus incubation (E = Exposed)
  – ...
• Underlying contact network: ‘who-can-infect-whom’

Page 56: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Related Work

• R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991.
• A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2010.
• F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969.
• D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM TISSEC, 10(4), 2008.
• D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
• A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM, 2005.
• Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-mat/0305549 v2, Aug. 6 2003.
• H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000.
• H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46, 1984.
• J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991.
• J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy, 1993.
• R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14, 2001.
• ...

All are about either:
• Structured topologies (cliques, block-diagonals, hierarchies, random)
• Specific virus propagation models
• Static graphs

Page 57: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
  – Background
  – Result (Static Graphs)
  – Proof Ideas (Static Graphs)
  – Bonus 1: Dynamic Graphs
  – Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)

Page 58: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

What should the answer look like?

• Answer should depend on:
  – Graph
  – Virus Propagation Model (VPM)
• But how??
  – Graph: average degree? max. degree? diameter?
  – VPM: which parameters?
  – How to combine: linear? quadratic? exponential?

[Candidate formulas on the slide combine d_avg, d_max, and the diameter in various ways]

Page 59: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Static Graphs: Our Main Result

• Informally,

For
– any arbitrary topology (adjacency matrix A)
– any virus propagation model (VPM) in standard literature

the epidemic threshold depends only
1. on λ, the first eigenvalue of A, and
2. on some constant C_VPM, determined by the virus propagation model.

No epidemic if λ · C_VPM < 1

In Prakash+ ICDM 2011 (selected among best papers). w/ Deepayan Chakrabarti

Page 60: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Our thresholds for some models

• s = effective strength
• s < 1: below threshold

Models               | Effective Strength (s)                         | Threshold (tipping point)
SIS, SIR, SIRS, SEIR | s = λ · (β / δ)                                | s = 1
SIV, SEIV            | s = λ · (constant set by the model parameters) | s = 1
SI1I2V1V2 (H.I.V.)   | s = λ · (constant set by the model parameters) | s = 1

Page 61: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Our result: Intuition for λ

“Official” definition:
• Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A - xI)].
• Doesn’t give much intuition!

“Un-official” intuition:
• λ ~ # paths in the graph
• A^k ≈ λ^k · u uᵀ (u: the principal eigenvector of A)
• A^k (i, j) = # of paths i → j of length k
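The path-counting intuition is easy to check on a tiny graph. The sketch below (helper names and the triangle example are assumptions) raises the adjacency matrix of a triangle to the k-th power; strictly speaking the entries count walks, i.e. paths that may revisit nodes. For the triangle λ = 2, and the total number of length-k walks is exactly 3 · 2^k, so it grows by a factor of λ per step.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(a, k):
    """Return a^k for a non-negative integer k (a^0 is the identity)."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result

def total_walks(a, k):
    """Total number of length-k walks, summed over all start/end pairs."""
    return sum(sum(row) for row in mat_power(a, k))

# Triangle graph: every node has degree 2, so its largest eigenvalue is 2.
TRIANGLE = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

# Ratio of consecutive walk counts; for the triangle this is exactly 2.
growth = [total_walks(TRIANGLE, k + 1) / total_walks(TRIANGLE, k) for k in range(5)]
```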

Page 62: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Largest Eigenvalue (λ)

For graphs with N nodes (values shown for N = 1000):
• Chain: λ ≈ 2
• Star: λ ≈ √N (λ = 31.67)
• Clique: λ = N - 1 (λ = 999)

Better connectivity → higher λ
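To see how λ tracks connectivity, the sketch below approximates the largest eigenvalue of a chain, a star and a clique. The graph builders, the function name `largest_eigenvalue` and the choice N = 30 are assumptions for illustration; the eigenvalue is estimated with a shifted power iteration rather than a library eigensolver.

```python
import math

def largest_eigenvalue(adj, iters=1000):
    """Approximate the largest adjacency eigenvalue of an undirected,
    unweighted graph given as {node: [neighbors]}."""
    nodes = list(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    # Rayleigh quotient x^T A x for the unit vector x.
    return sum(x[v] * x[u] for v in nodes for u in adj[v])

def chain(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def star(n):
    graph = {0: list(range(1, n))}          # node 0 is the hub
    graph.update({i: [0] for i in range(1, n)})
    return graph

def clique(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

N = 30
lam_chain = largest_eigenvalue(chain(N))    # slightly below 2
lam_star = largest_eigenvalue(star(N))      # sqrt(N - 1)
lam_clique = largest_eigenvalue(clique(N))  # N - 1
```

The ordering `lam_chain < lam_star < lam_clique` mirrors the slide: better connectivity gives a higher λ.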

Page 63: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Examples: Simulations – SIR (mumps)

[Figure: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]

PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Page 64: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Examples: Simulations – SIRS (pertussis)

[Figure: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]

PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Page 65: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

λ · C_VPM < 1
• λ: graph-based (topology and stability)
• C_VPM: model-based (general VPM structure)

See paper for full proof

Page 66: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
  – Background
  – Result (Static Graphs)
  – Proof Ideas (Static Graphs)
  – Bonus 1: Dynamic Graphs
  – Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)

Page 67: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

λ · C_VPM < 1
• λ: graph-based (topology and stability)
• C_VPM: model-based (general VPM structure)

See paper for full proof

Page 68: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
  – Background
  – Result (Static Graphs)
  – Proof Ideas (Static Graphs)
  – Bonus 1: Dynamic Graphs
  – Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)

Page 69: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Dynamic Graphs: Epidemic?

Alternating behaviors: DAY (e.g., work)

[Figure: contact graph and its 8 × 8 adjacency matrix]

Page 70: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Dynamic Graphs: Epidemic?

Alternating behaviors: NIGHT (e.g., home)

[Figure: contact graph and its 8 × 8 adjacency matrix]

Page 71: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Model Description

• SIS model
  – recovery rate δ
  – infection rate β
• Set of T arbitrary graphs: day (N × N), night (N × N), weekend, ...

[Figure: node X and its neighbors N1, N2, N3, each marked infected or healthy; infection happens with prob. β, recovery with prob. δ]

Page 72: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Our result: Dynamic Graphs Threshold

• Informally, NO epidemic if eig(S) < 1

Single number! eig(S) is the largest eigenvalue of the system matrix S.

In Prakash+, ECML-PKDD 2010
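The test "largest eigenvalue of S below 1" can be sketched in code once a form for S is fixed. The slide's own formula for S is not legible in this transcript, so the sketch below ASSUMES that S is the product, over the T alternating graphs, of per-graph matrices (1 - δ) I + β A_t. Treat that form, the helper names and the toy day/night graphs as assumptions rather than as the paper's definition.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def system_matrix(adjs, beta, delta):
    """ASSUMED form: S = product over t of ((1 - delta) I + beta A_t)."""
    n = len(adjs[0])
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a in adjs:
        s_t = [[(1.0 - delta if i == j else 0.0) + beta * a[i][j]
                for j in range(n)] for i in range(n)]
        s = mat_mul(s, s_t)
    return s

def spectral_radius(m, iters=500):
    """Power iteration for a non-negative matrix; returns the growth factor."""
    n = len(m)
    x = [1.0 / n] * n
    growth = 0.0
    for _ in range(iters):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        growth = sum(abs(v) for v in y)   # x has unit 1-norm, so this is the ratio
        if growth == 0.0:
            return 0.0
        x = [v / growth for v in y]
    return growth

def below_threshold(adjs, beta, delta):
    """True if the assumed system matrix predicts that the virus dies out."""
    return spectral_radius(system_matrix(adjs, beta, delta)) < 1.0

# Toy alternating behavior (assumption): 3 nodes, different contacts by day and night.
DAY = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
NIGHT = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
dies_out = below_threshold([DAY, NIGHT], beta=0.1, delta=0.5)
```

Under this assumed form, a single static graph (T = 1) gives the condition (1 - δ) + βλ < 1, which is the same as λ β / δ < 1.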

Page 73: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Infection-profile

[Figure: log(fraction infected) vs. time for the Synthetic and MIT Reality Mining datasets, each with curves ABOVE, AT, and BELOW the threshold]

Page 74: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

“Take-off” plots

[Figure: footprint (# infected @ “steady state”, log scale) for the Synthetic and MIT Reality datasets; our threshold separates the NO EPIDEMIC and EPIDEMIC regimes]

Page 75: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
  – Background
  – Result (Static Graphs)
  – Proof Ideas (Static Graphs)
  – Bonus 1: Dynamic Graphs
  – Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)

Page 76: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Competing Contagions

• iPhone v Android
• Blu-ray v HD-DVD
• Biological: common flu/avian flu, pneumococcal inf. etc.

Page 77: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

A simple model

• Modified flu-like
• Mutual Immunity (“pick one of the two”)
• Susceptible-Infected1-Infected2-Susceptible

[Figure: state diagram with Virus 1 and Virus 2]

Page 78: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Question: What happens in the end?

ASSUME: Virus 1 is stronger than Virus 2

[Figure: number of infections over time (green: virus 1, red: virus 2); footprint @ steady state = ?]

Page 79: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Question: What happens in the end?

ASSUME: Virus 1 is stronger than Virus 2

[Figure: number of infections over time (green: virus 1, red: virus 2); how do the footprints @ steady state relate to the two strengths?]

Page 80: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Answer: Winner-Takes-All

ASSUME: Virus 1 is stronger than Virus 2

[Figure: number of infections over time (green: virus 1, red: virus 2)]

Page 81: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Our Result: Winner-Takes-All

Given our model, and any graph, the weaker virus always dies out completely.

1. The stronger survives only if it is above threshold
2. Virus 1 is stronger than Virus 2 if: strength(Virus 1) > strength(Virus 2)
3. Strength(Virus) = λ β / δ (same as before!)

In Prakash, Beutel, + WWW 2012
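The three rules translate directly into a small decision function. This is only a sketch of the slide's statement: the function names, the (β, δ) tuple layout, and the handling of an exact tie (which the slide does not cover) are assumptions.

```python
def strength(lam, beta, delta):
    """Effective strength of a virus: lambda * beta / delta."""
    return lam * beta / delta

def predicted_outcome(lam, virus1, virus2):
    """Each virus is a (beta, delta) pair. Returns 'virus 1', 'virus 2',
    'neither', or None when the strengths tie (not covered by the slide)."""
    s1 = strength(lam, *virus1)
    s2 = strength(lam, *virus2)
    if s1 == s2:
        return None
    winner, s_max = ("virus 1", s1) if s1 > s2 else ("virus 2", s2)
    # The weaker virus always dies out; the stronger survives only above threshold.
    return winner if s_max > 1.0 else "neither"

# Illustrative numbers (assumptions): a graph with lambda = 2.
outcome = predicted_outcome(2.0, virus1=(0.6, 0.5), virus2=(0.3, 0.5))
```

Note that in this example virus 2 is also above threshold on its own (strength 1.2), yet the result says it still dies out when competing with the stronger virus 1.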

Page 82: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Real Examples

Reddit v Digg Blu-Ray v HD-DVD

[Google Search Trends data]


Page 83: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
• Action: Who to immunize? (Algorithms)

Page 84: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Full Static Immunization

Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).

[Figure: example graph with k = 2; which nodes to pick?]

Page 85: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
• Action: Who to immunize? (Algorithms)
  – Full Immunization (Static Graphs)
  – Fractional Immunization

Page 86: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Challenges

• Given a graph A, budget k,
  – Q1 (Metric): How to measure the ‘shield-value’ for a set of nodes (S)?
  – Q2 (Algorithm): How to find a set of k nodes with highest ‘shield-value’?

Page 87: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Proposed vulnerability measure λ

Increasing λ → increasing vulnerability

λ determines the epidemic threshold

[Figure: three example graphs labeled “Safe”, “Vulnerable”, “Deadly”]

Page 88: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

A1: “Eigen-Drop”: an ideal shield value

Eigen-Drop(S): Δλ = λ - λs, where λs is the largest eigenvalue of the graph after removing the node set S

[Figure: original graph vs. the graph without nodes {2, 6}]
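The eigen-drop of a node set can be computed directly by recomputing λ after the removal. The sketch below does exactly that on toy graphs; the adjacency-list layout, the helper names and the power-iteration eigenvalue estimate are assumptions for illustration.

```python
import math

def largest_eigenvalue(adj, iters=300):
    """Approximate the largest adjacency eigenvalue of {node: [neighbors]}."""
    nodes = list(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return sum(x[v] * x[u] for v in nodes for u in adj[v])

def remove_nodes(adj, removed):
    """Return a copy of the graph without the given nodes and their edges."""
    removed = set(removed)
    return {v: [u for u in nbrs if u not in removed]
            for v, nbrs in adj.items() if v not in removed}

def eigen_drop(adj, removed):
    """Delta lambda = lambda(original) - lambda(graph without `removed`)."""
    return largest_eigenvalue(adj) - largest_eigenvalue(remove_nodes(adj, removed))

# Toy graphs (assumptions): a 4-clique and a star with 4 leaves.
CLIQUE4 = {i: [j for j in range(4) if j != i] for i in range(4)}
STAR4 = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"],
         "c": ["hub"], "d": ["hub"]}

drop_clique = eigen_drop(CLIQUE4, [0])      # 3 -> 2
drop_star = eigen_drop(STAR4, ["hub"])      # 2 -> 0
```

Removing the hub of the star wipes out every edge, so the whole eigenvalue is lost, while removing one node of the clique only lowers λ by one.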

Page 89: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

(Q2) - Direct Algorithm too expensive!

• Immunize the k nodes which maximize Δλ: S = argmax Δλ
• Combinatorial!
• Complexity:
  – Example:
    • 1,000 nodes, with 10,000 edges
    • It takes 0.01 seconds to compute λ
    • It takes 2,615 years to find the 5 best nodes!
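The arithmetic behind the example can be reproduced in a few lines. The sketch assumes that the direct algorithm evaluates λ once for every possible 5-node subset and that each evaluation takes the quoted 0.01 seconds; the function name and the 365-day year are assumptions.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600

def brute_force_cost(n_nodes, k, seconds_per_eval):
    """Return (number of k-subsets, total years) for exhaustive search."""
    subsets = math.comb(n_nodes, k)           # every candidate set of k nodes
    years = subsets * seconds_per_eval / SECONDS_PER_YEAR
    return subsets, years

subsets, years = brute_force_cost(1000, 5, 0.01)
# subsets is about 8.25 trillion, and years comes out at roughly 2,600.
```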

Page 90: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

A2: Our Solution

• Part 1: Shield Value
  – Carefully approximate Eigen-drop (Δλ)
  – Matrix perturbation theory
• Part 2: Algorithm
  – Greedily pick best node at each step
  – Near-optimal due to submodularity
• NetShield (linear complexity)
  – O(nk² + m), where n = # nodes and m = # edges

In Tong, Prakash+ ICDM 2010
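The greedy idea can be illustrated with a deliberately naive stand-in. The sketch below is NOT NetShield: NetShield approximates the eigen-drop via matrix perturbation theory to reach linear complexity, whereas this version simply recomputes λ for every candidate node at every step, which is far slower and only meant to show the pick-the-best-node-each-round structure. All names and the toy graph are assumptions.

```python
import math

def largest_eigenvalue(adj, iters=300):
    """Approximate the largest adjacency eigenvalue of {node: [neighbors]}."""
    nodes = list(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return sum(x[v] * x[u] for v in nodes for u in adj[v])

def remove_nodes(adj, removed):
    """Return a copy of the graph without the given nodes and their edges."""
    removed = set(removed)
    return {v: [u for u in nbrs if u not in removed]
            for v, nbrs in adj.items() if v not in removed}

def naive_greedy_immunize(adj, k):
    """Pick k nodes one at a time, each time removing the node whose
    removal leaves the smallest largest eigenvalue (largest eigen-drop)."""
    chosen = []
    current = adj
    for _ in range(k):
        if not current:
            break
        best = min(current,
                   key=lambda v: largest_eigenvalue(remove_nodes(current, [v])))
        chosen.append(best)
        current = remove_nodes(current, [best])
    return chosen

# Toy graph (assumption): a hub with six leaves, plus a separate triangle.
GRAPH = {"h": ["l1", "l2", "l3", "l4", "l5", "l6"],
         "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
GRAPH.update({leaf: ["h"] for leaf in GRAPH["h"]})

picked = naive_greedy_immunize(GRAPH, 2)
```

On this toy graph the hub goes first, since removing it lowers λ from √6 to 2, and a triangle node goes second.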

Page 91: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Experiment: Immunization quality

[Figure: log(fraction of infected nodes) vs. time, lower is better; methods compared: NetShield, Degree, PageRank, Eigs (=HITS), Acquaintance, Betweenness (shortest path)]

Page 92: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Outline

• Motivation
• Epidemics: what happens? (Theory)
• Action: Who to immunize? (Algorithms)
  – Full Immunization (Static Graphs)
  – Fractional Immunization

Page 93: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Fractional Immunization of Networks

B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos

Under review

Page 94: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)


Page 95: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)


Page 96: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Fractional Asymmetric Immunization

Hospital Another Hospital

Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved?

Page 97: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Our Algorithm “SMART-ALLOC”

CURRENT PRACTICE vs. SMART-ALLOC: ~6x fewer!

[US-MEDICARE NETWORK 2005]
• Each circle is a hospital, ~3000 hospitals
• More than 30,000 patients transferred

Page 98: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Running Time

Wall-clock time (lower is better):
• Simulations: > 1 week
• SMART-ALLOC: 14 secs

> 30,000x speed-up!

Page 99: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Experiments

• PENN-NETWORK, K = 200: ~5x
• SECOND-LIFE, K = 2000: ~2.5x

Lower is better

Page 100: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Acknowledgements

Funding


Page 101: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

References

1. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos). In IEEE ICDM 2011, Vancouver (invited to KAIS Journal Best Papers of ICDM).
2. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos). In ECML-PKDD 2010, Barcelona, Spain.
3. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos). In IEEE NETWORKING 2011, Valencia, Spain.
4. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos). In WWW 2012, Lyon.
5. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos). In IEEE ICDM 2010, Sydney, Australia.
6. Fractional Immunization of Networks (B. Aditya Prakash, Lada Adamic, Theodore Iwashyna, Hanghang Tong, Christos Faloutsos). Under submission.
7. Rise and Fall Patterns of Information Diffusion: Model and Implications (Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos). Under submission.

http://www.cs.cmu.edu/~badityap/

Page 102: Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

Analysis Policy/Action Data

Propagation on Large Networks

B. Aditya Prakash Christos Faloutsos
