CMU SCS

Multimedia and Graph mining

Christos Faloutsos

CMU

IC '06 C. Faloutsos 2

CONGRATULATIONS!

Outline

• Problem definition / Motivation

• Biological image mining

• Graphs and power laws

• Streams and forecasting

• Conclusions

Motivation

• Data mining: ~ find patterns (rules, outliers)

• How do detached cat retinas evolve?

• What do real graphs look like?

• What do (numerical) streams look like?

ViVo: cat retina mining

• with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB)

• Jia-Yu Tim Pan, HJ Yang (CMU)

Detachment Development

Normal

1 day after detachment

3 days after detachment

7 days after detachment

28 days after detachment

3 months after detachment

Data and Problem

• (Problem) What happens in the retina after detachment?
  – What tissues (regions) are involved?
  – How do they change over time?

• How will a program convey this info?

• More than classification: “we want to learn what the classifier learned”

Main idea

• extract characteristic visual ‘words’

• Equivalent to characteristic keywords, in a collection of text documents

Visual vocabulary?

news: president, minister, economic

sports: baseball, score, penalty

Visual Vocabulary (ViVo) generation

Step 1: Tile image (8x12 tiles)

Step 2: Extract tile features

Step 3: ViVo generation

[Figure: tiles plotted in feature space (axes: Feature 1, Feature 2), with the visual vocabulary V1, V2]
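Steps 1 and 2 can be sketched in a few lines. This is only an illustration, not the ViVo implementation: it assumes "8x12 tiles" means a grid of 8 rows by 12 columns, it uses per-tile mean and standard deviation as stand-in features (the slides do not name the real features), and `tile_features` is a hypothetical helper. Step 3, deriving the vocabulary from the feature vectors, is not specified on the slides and is left out.

```python
from statistics import mean, pstdev

def tile_features(image, rows=8, cols=12):
    """Split a 2-D grayscale image (list of equal-length rows) into a
    rows x cols grid and return one (mean, std) pair per tile, row-major.
    Assumption: the image is at least rows x cols pixels."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // rows, width // cols  # leftover border pixels are ignored
    features = []
    for r in range(rows):
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * tile_h, (r + 1) * tile_h)
                      for x in range(c * tile_w, (c + 1) * tile_w)]
            features.append((mean(pixels), pstdev(pixels)))
    return features

# Toy 16 x 24 image whose intensity depends only on the tile column.
toy_image = [[x // 2 for x in range(24)] for _ in range(16)]
toy_features = tile_features(toy_image)
```

Each tile becomes one point in feature space, which is the input that the vocabulary-generation step would work on.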

Biological interpretation

ID | ViVo description | Condition
V1 | GFAP in inner retina (Müller cells) | Healthy
V10 | Healthy outer segments of rod photoreceptors | Healthy
V8 | Redistribution of rod opsin into cell bodies of rod photoreceptors | Detached
V11 | Co-occurring processes: Müller cell hypertrophy and rod opsin redistribution | Detached

Which tissue is significant at 7 days after detachment?

FEMine: Mining Fly Embryos

With

• Eric Xing (CMU CS)

• Bob Murphy (CMU – Bio)

• Tim Pan (CMU -> Google)

• Andre Balan (U. Sao Paulo)

Outline

• Problem definition / Motivation

• Biological image mining

• Graphs and power laws

• Streams and forecasting

• Conclusions

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

Joint work with

• Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Problem: network and graph mining

• What does the Internet look like?

• What does the web look like?

• What constitutes a ‘normal’ social network?

• What is ‘normal’/‘abnormal’?

• Which patterns/laws hold?

Graph mining

• Are real graphs random?

Laws and patterns

NO!!

• Diameter

• in- and out- degree distributions

• other (surprising) patterns

Laws – degree distributions

• Q: avg degree is ~3 - what is the most probable degree?

[Figure: two sketches of count vs. degree, each with degree 3 marked on the axis; the first is labeled ‘??’]

Solution:

The plot is linear in log-log scale [FFF’99]

freq ∝ degree^(-2.15)

Exponent = slope: O = -2.15

[Figure: log-log plot of frequency vs. outdegree, Nov’97; slope -2.15]
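A minimal sketch of how such an exponent can be read off as a slope: fit a straight line to log(frequency) against log(degree) by ordinary least squares. The function name `loglog_slope` and the synthetic data are illustrative assumptions, not taken from the talk, and a plain least-squares fit is only a rough estimator on real, noisy degree counts.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs with x, y > 0."""
    logs = [(math.log(x), math.log(y)) for x, y in points]
    n = len(logs)
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    return sxy / sxx

# Synthetic counts that follow freq = C * degree^(-2.15) exactly (C = 1000 is arbitrary).
synthetic = [(d, 1000.0 * d ** -2.15) for d in range(1, 50)]
print(round(loglog_slope(synthetic), 2))  # -2.15
```

On exact power-law data the fitted slope recovers the exponent; the constant C only shifts the line up or down.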

But:

• Q1: How about graphs from other domains?

• Q2: How about temporal evolution?

The Peer-to-Peer Topology

• Frequency versus degree

• Number of adjacent peers follows a power law [Jovanovic+]

More power laws:

citation counts: (citeseer.nj.nec.com 6/2001)

[Figure: log(count) vs. log(#citations); ‘Ullman’ marked]

Swedish sex-web

Nodes: people (Females; Males)

Links: sexual relationships

Liljeros et al. Nature 2001

4781 Swedes; 18-74; 59% response rate.

Albert Laszlo Barabasi

http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt

More power laws:

• web hit counts [w/ A. Montgomery]

[Figure: Web Site Traffic; log(count) vs. log(in-degree), Zipf; users and sites; ‘ebay’ marked]

epinions.com

• who-trusts-whom [Richardson + Domingos, KDD 2001]

[Figure: count vs. (out) degree; marked point: the trusts-2000-people user]

A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

• similarly, in many other languages; for customers and sales volume; city populations, etc.

[Figure: log(freq) vs. log(rank)]
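The rank-vs-frequency data behind such a plot is simple to build: count each word, sort the counts in decreasing order, and pair each count with its rank. The sketch below uses a made-up sentence rather than the Bible text from the slide, and `rank_frequency` is an illustrative name.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs; rank 1 is the most frequent word."""
    counts = Counter(words)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))

text = "the cat and the dog and the bird"
pairs = rank_frequency(text.split())
print(pairs)  # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
```

Plotting log(frequency) against log(rank) for a large corpus is what produces the near-straight line associated with Zipf's law.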

Olympic medals (Sydney ’00, Athens ’04):

[Figure: log(#medals) vs. log(rank), with one series each for Athens and Sydney]

More power laws: areas – Korcak’s law

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(count( >= area))

log(area)

But:

• Q1: How about graphs from other domains?

• Q2: How about temporal evolution?

Time evolution

• with Jure Leskovec (CMU)

• and Jon Kleinberg (Cornell)

(‘best paper’ KDD05)

Evolution of the Diameter

• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
  – diameter ~ O(log log N)

• What is happening in real data?

• Diameter shrinks over time
  – As the network grows, the distances between nodes slowly decrease
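To make "diameter" concrete, here is a small sketch that computes it by breadth-first search from every node. The slides do not say which definition of diameter the measurements use, so the longest shortest path is an assumption here, and the toy graph is invented to show that adding a node can shrink the diameter.

```python
from collections import deque

def diameter(adjacency):
    """Longest shortest-path length (in hops) over all connected node pairs
    of an undirected graph given as {node: set_of_neighbours}."""
    longest = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        longest = max(longest, max(dist.values()))
    return longest

def add_edge(adjacency, u, v):
    """Insert an undirected edge, creating the endpoints if needed."""
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

graph = {}
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:   # a 5-node path
    add_edge(graph, u, v)
before = diameter(graph)

for v in (0, 2, 4):                              # new node 5 links to three old nodes
    add_edge(graph, 5, v)
after = diameter(graph)
print(before, after)  # 4 3
```

The all-pairs search costs one BFS per node, so it only suits small graphs; graphs with millions of nodes need sampling or approximation.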

Diameter – ArXiv citation graph

• Citations among physics papers

• 1992–2003

• One graph per year

time [years]

diameter

Diameter – “Autonomous Systems”

• Graph of Internet

• One graph per day

• 1997 – 2000

number of nodes

diameter

Diameter – “Affiliation Network”

• Graph of collaborations in physics – authors linked to papers

• 10 years of data

time [years]

diameter

Diameter – “Patents”

• Patent citation network

• 25 years of data

time [years]

diameter

Temporal Evolution of the Graphs

• N(t) … nodes at time t

• E(t) … edges at time t

• Suppose that N(t+1) = 2 * N(t)

• Q: what is your guess for E(t+1) =? 2 * E(t)

• A: over-doubled!
  – But obeying the “Densification Power Law”

Densification – Physics Citations

• Citations among physics papers

• 2003:
  – 29,555 papers,
  – 352,807 citations

[Figure: E(t) vs. N(t); fitted slope 1.69. Reference slopes: 1 for a tree, 2 for a clique]
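The two reference slopes and the "over-doubled" answer can be checked numerically. The sketch below assumes the densification law has the form E(t) ∝ N(t)^a and fits a as a least-squares slope on log-log axes; `densification_exponent` and the synthetic snapshots are illustrative, not the data from the talk.

```python
import math

def densification_exponent(snapshots):
    """Least-squares slope a of log E against log N over (N, E) snapshots,
    under the assumed model E ~ N^a."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

sizes = [10, 100, 1000, 10000]
trees = [(n, n - 1) for n in sizes]               # a tree on n nodes has n - 1 edges
cliques = [(n, n * (n - 1) // 2) for n in sizes]  # a clique has n(n-1)/2 edges

tree_a = densification_exponent(trees)       # close to 1
clique_a = densification_exponent(cliques)   # close to 2

# With exponent 1.69, doubling the nodes multiplies the edges by 2^1.69.
growth = 2 ** 1.69
print(round(growth, 2))  # 3.23
```

An exponent of 1.69 sits between the two extremes: doubling the nodes roughly triples the edges rather than doubling them.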

Densification – Patent Citations

• Citations among patents granted

• 1999
  – 2.9 million nodes
  – 16.5 million edges

• Each year is a datapoint

[Figure: E(t) vs. N(t); slope 1.66]

Densification – Autonomous Systems

• Graph of Internet

• 2000
  – 6,000 nodes
  – 26,000 edges

• One graph per day

[Figure: E(t) vs. N(t); slope 1.18]

Densification – Affiliation Network

• Authors linked to their publications

• 2002
  – 60,000 nodes
    • 20,000 authors
    • 38,000 papers
  – 133,000 edges

[Figure: E(t) vs. N(t); slope 1.15]

Graphs - Conclusions

• Real graphs obey some surprising patterns
  – which can help us spot anomalies / outliers

• A lot of interest from web searching companies
  – recommendation systems
  – link spamming
  – trust propagation

• HUGE graphs (Millions and Billions of nodes)

Outline

• Problem definition / Motivation

• Biological image mining

• Graphs and power laws

• Streams and forecasting

• Conclusions

Why care about streams?

• Sensor devices
  – Temperature, weather measurements
  – Road traffic data
  – Geological observations
  – Patient physiological data
  – sensor-Andrew project

• Embedded devices
  – Network routers

Co-evolving time sequences

Joint work with

• Jimeng Sun (CMU)

• Dr. Spiros Papadimitriou (CMU/IBM)

• Dr. Yasushi Sakurai (NTT)

• Prof. Jeanne VanBriesen (CMU/CEE)

Motivation

water distribution network

[Figure: chlorine concentrations over Phase 1, Phase 2 and Phase 3, from normal operation to a major leak, shown for sensors near the leak and sensors away from the leak]

Motivation

spot: “hidden (latent) variables”

[Figure: actual measurements (n streams of chlorine concentrations) alongside the k hidden variable(s), shown in three builds]

• Phase 1: k = 1

• Phases 1 and 2: k = 2

• Phases 1 to 3: k = 1
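One way to picture a hidden variable is as a single weighted combination of the n streams that is updated as each new measurement arrives. The sketch below tracks one such variable with an online, energy-normalised update. It is a simplified illustration under stated assumptions, not the algorithm behind the figures: the slides do not spell out the update rule, and the forgetting factor 0.96, the starting values and the two-sensor data are all made up.

```python
import math

def track_hidden_variable(stream, forgetting=0.96):
    """Track one hidden variable over a stream of n-dimensional measurements.
    Returns the final weights w and the hidden values y_t = w . x_t."""
    n = len(stream[0])
    w = [1.0] + [0.0] * (n - 1)   # arbitrary starting direction (assumption)
    energy = 0.01                  # small positive start avoids division by zero
    hidden = []
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))        # project onto current weights
        energy = forgetting * energy + y * y             # exponentially weighted energy
        error = [xi - y * wi for xi, wi in zip(x, w)]    # what the projection misses
        w = [wi + (y / energy) * ei for wi, ei in zip(w, error)]  # correct the weights
        hidden.append(y)
    return w, hidden

# Two hypothetical sensors; the second always reads twice the first,
# so a single hidden variable explains both streams.
stream = [(s, 2.0 * s) for s in (1.0 + 0.5 * math.sin(0.3 * t) for t in range(300))]
weights, hidden = track_hidden_variable(stream)
```

After enough samples the two weights settle into the 1:2 ratio that links the sensors, and `hidden` holds the single summary stream. Tracking k > 1 variables would repeat the same update on the part of each measurement that the earlier variables leave unexplained.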

SPIRIT / InteMon

• http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp

• http://localhost:8080/demo/graphs.jsp

• self-* storage system (PDL/CMU)
  – 1 PetaByte storage
  – self-monitoring, self-healing: self-*

• with Jimeng Sun (CMU/CS)

• Evan Hoke (CMU/CS-ug)

• Prof. Greg Ganger (CMU/CS/ECE)

• John Strunk (CMU/ECE)

Related project

• Anomaly detection in network traffic (Zhang, Xie)

Conclusions

• Biological images, graphs & streams pose fascinating problems

• self-similarity, fractals and power laws work, when other methods fail!

Books

• Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

Contact info

• [email protected]

• www.cs.cmu.edu/~christos

• Wean Hall 7107

• Ph#: x8.1457

• and, again WELCOME!