CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU


Page 1: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

CMU SCS

Mining Billion-node Graphs

Christos Faloutsos

CMU

Page 2: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Related Tasks for this presentation
• E2.1
• E3.1
• I3.1

INARC CUNY'10 C. Faloutsos (CMU) 3

Page 3: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Big picture: large graph mining
• Patterns / anomalies
  – Static graphs
  – Dynamic graphs
  – Weighted graphs
  – ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
  – Kronecker; Random Typing; Tensors
• Influence/Virus Propagation

Page 4: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Big picture: large graph mining
• Patterns / anomalies
  – Static graphs
  – Dynamic graphs
  – Weighted graphs
  – ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
  – Kronecker; Random Typing; Tensors
• Influence/Virus Propagation

Duration of phonecalls (E2.1)

(I3.1)

Page 5: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms

ECML-PKDD 2010, Barcelona, Spain

B. Aditya Prakash*, Hanghang Tong*^, Nicholas Valler+, Michalis Faloutsos+, Christos Faloutsos*

* Carnegie Mellon University, Pittsburgh USA
+ University of California – Riverside USA
^ IBM Research, Hawthorne USA

PKDD, 2010 demo

INARC and IRC

Page 6: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: threshold?

Strong Virus

Page 7: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: threshold?

Strong Virus

Epidemic!

Page 8: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: threshold?

Weak Virus

Page 9: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: threshold?

Weak Virus

Small infection

Page 10: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q2: Immunization?

Which nodes to immunize?

Page 11: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Our Framework

Standard, static graph setting:
• Simple stochastic framework (‘flu-like’, SIS)
• FIXED underlying contact-network: ‘who-can-infect-whom’

• OUR CASE:
  – Changes in time: alternating behaviors!
  – E.g., day vs night

Page 12: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Reminder: ‘Flu-like’ (SIS)
• ‘S’ Susceptible (= healthy); ‘I’ Infected
• No immunity (cured nodes -> ‘S’)

[State diagram: Susceptible -> Infected (infected by neighbor); Infected -> Susceptible (cured internally)]

Page 13: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

SIS model (continued)
• Virus birth rate β
• Host cure rate δ

[Figure: node X with neighbors N1, N2, N3 (legend: Infected / Healthy); labels: Prob. β, Prob. β, Prob. δ]

Page 14: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Alternating Behaviors

DAY (e.g., work)

[Figure: 8 × 8 adjacency matrix]

Page 15: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Alternating Behaviors

NIGHT (e.g., home)

[Figure: 8 × 8 adjacency matrix]

Page 16: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Outline
√ Our Framework
  √ SIS epidemic model
  √ Time varying graphs
• Problem Descriptions
• Epidemic Threshold
• Immunization
• Conclusion

Page 17: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Formally, given
• SIS model
  – cure rate δ
  – infection rate β
• Set of T arbitrary graphs

[Figure: T adjacency matrices, each N × N: day, night, …, weekend, …]

[Figure: node X with neighbors N1, N2, N3 (legend: Infected / Healthy); labels: Prob. β, Prob. β, Prob. δ]
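The setting above (SIS with cure rate δ, infection rate β, and a repeating set of T graphs) can be simulated with a short sketch like the one below. It is only an illustration under stated assumptions: time is discrete, each infected neighbor makes one independent infection attempt per step, and the names `sis_step` and `simulate` are hypothetical, not taken from the slides.

```python
import random


def sis_step(adjacency, infected, beta, delta, rng):
    """One discrete SIS step on a single graph (assumed update rule).

    adjacency: list of rows, adjacency[u][v] is truthy if u and v are in contact.
    infected:  set of currently infected node ids.
    """
    next_infected = set()
    for node in range(len(adjacency)):
        if node in infected:
            # An infected node is cured with probability delta and
            # becomes susceptible again (no immunity).
            if rng.random() >= delta:
                next_infected.add(node)
        else:
            # Each infected neighbor infects this node with probability beta.
            for neighbor in range(len(adjacency)):
                if (adjacency[node][neighbor] and neighbor in infected
                        and rng.random() < beta):
                    next_infected.add(node)
                    break
    return next_infected


def simulate(graphs, initially_infected, beta, delta, steps, seed=0):
    """Cycle through the T graphs (e.g., day, night, day, ...) and
    return the number of infected nodes after each step."""
    rng = random.Random(seed)
    infected = set(initially_infected)
    history = [len(infected)]
    for t in range(steps):
        infected = sis_step(graphs[t % len(graphs)], infected, beta, delta, rng)
        history.append(len(infected))
    return history
```

Calling `simulate([day, night], {0}, beta, delta, steps)` with two adjacency matrices of the same size alternates between them, which matches the day/night picture on the earlier slides.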

Page 18: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Find…

Q1: Epidemic Threshold: Fast die-out?

Q2: Immunization: best k?

[Plot: I vs. t, with curves labeled ‘above’ and ‘below’]

Page 19: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: Threshold - Main result
• NO epidemic if

$\mathrm{eig}(S) = \lambda_S < 1$

Page 20: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: Threshold - Main result
• NO epidemic if

$\mathrm{eig}(S) = \lambda_S < 1$

Single number! Largest eigenvalue of the “system matrix”

Page 21: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Details

NO epidemic if $\mathrm{eig}(S) < 1$

$S = \prod_i S_i$

[Figure: callouts labeling the cure rate, the infection rate, and the N × N adjacency matrix of each period (day, night, …) in the definition of $S_i$]
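A minimal sketch of the threshold test in plain Python follows. The extracted slide names the ingredients of each $S_i$ (cure rate, infection rate, adjacency matrix) but not the formula itself, so the sketch assumes $S_i = (1-\delta) I + \beta A_i$, the form given in the accompanying ECML-PKDD 2010 paper. The function names are hypothetical, and the largest eigenvalue is estimated by power iteration, which is a convenience choice here and not something the slides prescribe.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def system_matrix(graphs, beta, delta):
    """S = S_1 S_2 ... S_T with S_i = (1 - delta) I + beta A_i (assumed form)."""
    n = len(graphs[0])
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for A in graphs:
        S_i = [[(1.0 - delta if i == j else 0.0) + beta * A[i][j]
                for j in range(n)] for i in range(n)]
        S = matmul(S, S_i)
    return S


def leading_eigenvalue(M, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a nonnegative matrix by power
    iteration (assumes a dominant eigenvalue, e.g. delta < 1)."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(abs(x) for x in w)
        if new_lam == 0.0:
            return 0.0
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def no_epidemic(graphs, beta, delta):
    """Threshold test from the slide: no epidemic if eig(S) < 1."""
    return leading_eigenvalue(system_matrix(graphs, beta, delta)) < 1.0
```

For example, `no_epidemic([day, night], beta=0.1, delta=0.5)` checks a two-period schedule. The whole test reduces to one number, which is the point the "Single number!" slide makes.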

Page 22: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q1: Simulation experiments
• Synthetic
  – 100 nodes
  – Clique; Chain
• MIT Reality Mining
  – 104 mobile devices
  – September 2004 – June 2005
  – 12-hr adjacency matrices

Page 23: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

‘Take-off’ plots

[Plots, one for Synthetic and one for MIT Reality Mining: footprint (# infected @ steady state, log scale); in each, ‘Our threshold’ separates the NO EPIDEMIC region from the EPIDEMIC region]

Page 24: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Time-plots

[Plots, one for Synthetic and one for MIT Reality Mining: log(# infected) vs. Time, with curves labeled < threshold, @ threshold, and > threshold]

Page 25: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Outline
√ Motivation
√ Our Framework
  √ SIS epidemic model
  √ Time varying graphs
√ Problem Descriptions
√ Epidemic Threshold
• Immunization

Page 26: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Q2: Immunization
• Our solution
  – reduce $\lambda_{\prod_i S_i}$ ($= \lambda$)
  – goal: max ‘eigendrop’ $\Delta\lambda$
• Comparison - But: No competing policy
• We propose and evaluate many policies

$\Delta\lambda = \lambda_{\text{before}} - \lambda_{\text{after}}$
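The eigendrop goal can be illustrated with a generic greedy sketch: repeatedly immunize the node whose removal lowers $\lambda$ the most. This is not claimed to be any of the policies named in the deck (their definitions are not in the slide text). It assumes that immunizing a node means deleting it from every graph and that $S_i = (1-\delta) I + \beta A_i$. All function names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def leading_eigenvalue(M, iterations=1000, tol=1e-12):
    """Power-iteration estimate of the largest eigenvalue (nonnegative M)."""
    n = len(M)
    if n == 0:
        return 0.0
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(abs(x) for x in w)
        if new_lam == 0.0:
            return 0.0
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def system_lambda(graphs, beta, delta, removed=()):
    """Largest eigenvalue of prod_i S_i after deleting the removed nodes.
    Assumes S_i = (1 - delta) I + beta A_i on the remaining nodes."""
    keep = [u for u in range(len(graphs[0])) if u not in removed]
    S = [[1.0 if i == j else 0.0 for j in range(len(keep))]
         for i in range(len(keep))]
    for A in graphs:
        S_i = [[(1.0 - delta if u == v else 0.0) + beta * A[u][v]
                for v in keep] for u in keep]
        S = matmul(S, S_i)
    return leading_eigenvalue(S)


def eigendrop(graphs, beta, delta, removed):
    """Delta-lambda = lambda_before - lambda_after for a set of removed nodes."""
    return (system_lambda(graphs, beta, delta)
            - system_lambda(graphs, beta, delta, removed))


def greedy_immunize(graphs, beta, delta, k):
    """Pick k nodes one at a time, each minimizing the resulting lambda."""
    removed = []
    for _ in range(k):
        candidates = [u for u in range(len(graphs[0])) if u not in removed]
        best = min(candidates,
                   key=lambda u: system_lambda(graphs, beta, delta, removed + [u]))
        removed.append(best)
    return removed
```

On a star graph used for both day and night, this sketch picks the hub first, which is the intuitive choice. Each greedy step recomputes $\lambda$ for every candidate, so the sketch only suits small graphs.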

Page 27: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

[Plot: comparison of immunization policies Optimal, Greedy-S, and Greedy-DavgA; lower is better]

Page 28: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Conclusion
• Time-varying Graphs
• SIS (flu-like) propagation model
√ Q1: Epidemic Threshold: $\lambda < 1$
  – Only first eigenvalue of system matrix!
√ Q2: Immunization Policies: max. $\Delta\lambda$
  – Optimal
  – Greedy-S
  – Greedy-DavgA
  – etc.

Page 29: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Goal: large graph mining
• Patterns / anomalies
  – Static graphs
  – Dynamic graphs
  – Weighted graphs
  – ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
  – Kronecker; Random Typing; Tensors
• Influence/Virus Propagation

Duration of phonecalls (E2.1)

(I3.1)

Page 30: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Duration of phonecalls

Surprising Patterns for the Call Duration Distribution of Mobile Phone Users

Pedro O. S. Vaz de Melo, Leman Akoglu, Christos Faloutsos, Antonio A. F. Loureiro

PKDD 2010

Page 31: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Probably, power law (?)

Page 32: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

No Power Law!

Page 33: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

‘TLaC: Lazy Contractor’
• The longer a task (phonecall) has taken,
• The even longer it will take

Odds ratio = Casualties(<x) : Survivors(>=x) == power law
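The odds ratio on this slide can be computed directly from a list of call durations, and a straight line on log-log axes is what "== power law" would look like. The sketch below is a minimal illustration. The function names and the least-squares slope fit are assumptions made for the example, not the procedure used in the paper.

```python
import math


def odds_ratio(durations, x):
    """Casualties(<x) : Survivors(>=x) for a list of call durations."""
    casualties = sum(1 for d in durations if d < x)
    survivors = len(durations) - casualties
    if survivors == 0:
        return float("inf")
    return casualties / survivors


def loglog_slope(xs, ratios):
    """Least-squares slope of log(ratio) against log(x).

    A roughly constant slope over the range of x suggests the odds
    ratio follows a power law. Points with zero or infinite ratio
    are skipped because their logarithm is undefined.
    """
    points = [(math.log(x), math.log(r)) for x, r in zip(xs, ratios)
              if x > 0 and 0 < r < math.inf]
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    return sxy / sxx
```

A typical use is to evaluate `odds_ratio(durations, x)` on a grid of thresholds `x` for one user and pass the results to `loglog_slope`.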

Page 34: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Data Description
• Data from a private mobile operator of a large city
  – 4 months of data
  – 3.1 million users
  – more than 1 billion phone records
• Among users with >30 phonecalls
  – 96% followed TLaC
  – Rest: anomalies (too many 1h phonecalls)

Page 35: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Goal: large graph mining
• Patterns / anomalies
  – Static graphs
  – Dynamic graphs
  – Weighted graphs
  – ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
  – Kronecker; Random Typing; Tensors
• Influence/Virus Propagation

Duration of phonecalls (E2.1)

(I3.1)

Page 36: CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU

Project info: www.cs.cmu.edu/~pegasus

Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tsourakakis, Babis
Tong, Hanghang
Prakash, Aditya

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP