CMU SCS
Multimedia and Graph mining
Christos Faloutsos
CMU
IC '06 C. Faloutsos 2
CONGRATULATIONS!
Outline
• Problem definition / Motivation
• Biological image mining
• Graphs and power laws
• Streams and forecasting
• Conclusions
Motivation
• Data mining: ~ find patterns (rules, outliers)
• How do detached cat retinas evolve?
• What do real graphs look like?
• What do (numerical) streams look like?
ViVo: cat retina mining
• with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB)
• Jia-Yu Tim Pan, HJ Yang (CMU)
Detachment Development
Normal
1 day after detachment
3 days after detachment
7 days after detachment
28 days after detachment
3 months after detachment
Data and Problem
• (Problem) What happens in the retina after detachment?
– What tissues (regions) are involved?
– How do they change over time?
• How will a program convey this info?
• More than classification
“we want to learn what the classifier learned”
Main idea
• extract characteristic visual ‘words’
• Equivalent to characteristic keywords, in a collection of text documents
Visual vocabulary?
news: president, minister, economic
sports: baseball, score, penalty
Visual Vocabulary (ViVo) generation
Step 1: Tile image
Step 2: Extract tile features
Step 3: ViVo generation
[Figure: 8x12 tiles; tiles plotted on axes Feature 1 and Feature 2; visual vocabulary: V1, V2]
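Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the ViVo implementation: the slides do not say which tile features are used, so the per-tile mean and standard deviation here are assumptions, and `tile_features` is a hypothetical helper name. Step 3 (deriving the vocabulary from the feature vectors) is not covered.

```python
import math

def tile_features(image, tile_rows, tile_cols):
    """Split a grayscale image (list of rows of numbers) into a
    tile_rows x tile_cols grid and return one (mean, std) pair per tile,
    in row-major order. The feature choice is an assumption."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // tile_rows, width // tile_cols
    features = []
    for r in range(tile_rows):
        for c in range(tile_cols):
            # Collect the pixels that fall inside tile (r, c).
            pixels = [image[y][x]
                      for y in range(r * tile_h, (r + 1) * tile_h)
                      for x in range(c * tile_w, (c + 1) * tile_w)]
            mean = sum(pixels) / len(pixels)
            var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            features.append((mean, math.sqrt(var)))
    return features

# Toy 4x4 "image" cut into a 2x2 grid of tiles.
toy_image = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [1, 3, 8, 8],
    [1, 3, 8, 8],
]
toy_features = tile_features(toy_image, 2, 2)
```

For the 8x12 grid mentioned on the slide, the call would be `tile_features(image, 8, 12)`, giving 96 feature vectors per image.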
Biological interpretation
ID  | Description | Condition
V1  | GFAP in inner retina (Müller cells) | Healthy
V10 | Healthy outer segments of rod photoreceptors | Healthy
V8  | Redistribution of rod opsin into cell bodies of rod photoreceptors | Detached
V11 | Co-occurring processes: Müller cell hypertrophy and rod opsin redistribution | Detached
Which tissue is significant on 7-day?
FEMine: Mining Fly Embryos
With
• Eric Xing (CMU CS)
• Bob Murphy (CMU – Bio)
• Tim Pan (CMU -> Google)
• Andre Balan (U. Sao Paulo)
Outline
• Problem definition / Motivation
• Biological image mining
• Graphs and power laws
• Streams and forecasting
• Conclusions
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Protein Interactions [genomebiology.com]
Friendship Network [Moody ’01]
Joint work with
• Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
Problem: network and graph mining
• What does the Internet look like?
• What does the web look like?
• What constitutes a ‘normal’ social network?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?
Graph mining
• Are real graphs random?
Laws and patterns
NO!!
• Diameter
• in- and out- degree distributions
• other (surprising) patterns
Laws – degree distributions
• Q: avg degree is ~3 - what is the most probable degree?
[Figure: two plots of count vs. degree, each with degree 3 marked; the first plot is labeled “??”]
Solution:
The plot is linear in log-log scale [FFF’99]
freq = degree^{-2.15}
Exponent = slope: O = -2.15
[Figure: Frequency vs. Outdegree, Nov’97; fitted slope -2.15]
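Reading the exponent off as the slope of the log-log plot can be sketched with an ordinary least-squares line fit on the logarithms. This is only an illustration on synthetic data: `fit_loglog_slope` is a hypothetical helper, and the slides do not say how the reported slope was actually estimated.

```python
import math

def fit_loglog_slope(xs, ys):
    """Least-squares fit of log10(y) = slope * log10(x) + intercept.
    Returns (slope, intercept). All xs and ys must be positive."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic degree distribution that follows freq = 1000 * degree^(-2.15).
degrees = list(range(1, 51))
freqs = [1000.0 * d ** -2.15 for d in degrees]
slope, intercept = fit_loglog_slope(degrees, freqs)
```

On this noise-free input the fitted slope recovers the exponent -2.15; real degree counts are noisy in the tail, so a plain line fit is only a first approximation there.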
But:
• Q1: How about graphs from other domains?
• Q2: How about temporal evolution?
The Peer-to-Peer Topology
• Frequency versus degree
• Number of adjacent peers follows a power-law [Jovanovic+]
More power laws:
citation counts: (citeseer.nj.nec.com 6/2001)
[Figure: log(count) vs. log(#citations); labeled point: Ullman]
Swedish sex-web
Nodes: people (Females; Males)
Links: sexual relationships
Liljeros et al. Nature 2001
4781 Swedes; 18-74; 59% response rate.
Albert Laszlo Barabasi
http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
More power laws:
• web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic; log(count) vs. log(in-degree); annotations: Zipf, users, sites, “ebay”]
epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. (out) degree; labeled point: trusts-2000-people user]
A famous power law: Zipf’s law
• Bible - rank vs frequency (log-log)
• similarly, in many other languages; for customers and sales volume; city populations, etc.
[Figure: log(freq) vs. log(rank)]
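The rank vs. frequency data behind such a plot can be produced as sketched below. `rank_frequency` is a hypothetical helper, and the toy word list merely stands in for a real corpus.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = Counter(words)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

# Tiny stand-in for a corpus: "a" occurs 4 times, "b" twice, "c" once.
pairs = rank_frequency("a a a a b b c".split())
```

Plotting the logarithm of each frequency against the logarithm of its rank gives the log-log view named on the slide; under Zipf's law the points fall close to a straight line.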
Olympic medals (Sydney’00, Athens’04):
[Figure: log(#medals) vs. log(rank); series: Athens, Sydney]
More power laws: areas – Korcak’s law
Scandinavian lakes area vs complementary cumulative count (log-log axes)
[Figure: log(count(>= area)) vs. log(area)]
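The complementary cumulative count on the vertical axis, count(>= area), can be computed as in this small sketch. `ccdf_counts` is a hypothetical helper and the areas are made-up numbers, not lake data.

```python
def ccdf_counts(values):
    """For each distinct value v (ascending), return (v, number of items >= v)."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for i, v in enumerate(ordered):
        # Index i of the first occurrence of v: n - i items are >= v.
        if i == 0 or v != ordered[i - 1]:
            result.append((v, n - i))
    return result

# Made-up areas, for illustration only.
counts = ccdf_counts([1, 1, 2, 5, 5, 10])
```

Taking logarithms of both coordinates gives the log-log axes of the slide. The cumulative form is smoother than a raw histogram because no binning is needed.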
But:
• Q1: How about graphs from other domains?
• Q2: How about temporal evolution?
Time evolution
• with Jure Leskovec (CMU)
• and Jon Kleinberg (Cornell)
(‘best paper’ KDD05)
Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
– As the network grows, the distances between nodes slowly decrease
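A toy illustration of a graph that grows while its diameter shrinks: the sketch below computes the exact diameter by breadth-first search from every node. It assumes a small, connected, undirected graph; the helper names are hypothetical, and the slides do not say how the diameter was measured on the real datasets.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest shortest-path distance (in hops) from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(edges):
    """Exact diameter of a connected undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return max(eccentricity(adj, s) for s in adj)

# A 5-node path, then the same graph after a well-connected node 5 joins.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
grown = path + [(5, 0), (5, 2), (5, 4)]
d_before, d_after = diameter(path), diameter(grown)
```

Here the graph grows from 5 to 6 nodes while the diameter drops from 4 to 3. All-pairs BFS costs O(N * E), so this is practical only for small graphs.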
Diameter – ArXiv citation graph
• Citations among physics papers
• 1992 – 2003
• One graph per year
[Figure: diameter vs. time [years]]
Diameter – “Autonomous Systems”
• Graph of Internet
• One graph per day
• 1997 – 2000
[Figure: diameter vs. number of nodes]
Diameter – “Affiliation Network”
• Graph of collaborations in physics – authors linked to papers
• 10 years of data
[Figure: diameter vs. time [years]]
Diameter – “Patents”
• Patent citation network
• 25 years of data
[Figure: diameter vs. time [years]]
Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1) =? 2 * E(t)
• A: over-doubled!
– But obeying the “Densification Power Law”
Densification – Physics Citations
• Citations among physics papers
• 2003:
– 29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); fitted slope 1.69; reference slopes: 1 (tree), 2 (clique)]
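Assuming the densification law has the form E(t) ∝ N(t)^a, with a being the slope shown in the plot, the "over-doubling" can be worked out directly. The sketch below uses hypothetical helper names, and the two-snapshot estimate is only a simplification of fitting a line through all yearly points.

```python
import math

def edge_growth_factor(node_growth_factor, exponent):
    """Factor by which E grows when N grows by node_growth_factor,
    assuming E is proportional to N ** exponent."""
    return node_growth_factor ** exponent

def densification_exponent(n_old, e_old, n_new, e_new):
    """Exponent implied by two (N, E) snapshots under the same assumption."""
    return math.log(e_new / e_old) / math.log(n_new / n_old)

# Doubling the nodes with the physics-citation slope of 1.69:
factor = edge_growth_factor(2, 1.69)
# Reference cases from the plot: slope 1 (tree) and slope 2 (clique).
tree_factor = edge_growth_factor(2, 1.0)
clique_factor = edge_growth_factor(2, 2.0)
```

With a = 1.69, doubling the nodes multiplies the edges by roughly 3.2, between the tree case (exactly 2) and the clique case (exactly 4).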
Densification – Patent Citations
• Citations among patents granted
• 1999
– 2.9 million nodes
– 16.5 million edges
• Each year is a datapoint
[Figure: E(t) vs. N(t); fitted slope 1.66]
Densification – Autonomous Systems
• Graph of Internet
• 2000
– 6,000 nodes
– 26,000 edges
• One graph per day
[Figure: E(t) vs. N(t); fitted slope 1.18]
Densification – Affiliation Network
• Authors linked to their publications
• 2002
– 60,000 nodes
  • 20,000 authors
  • 38,000 papers
– 133,000 edges
[Figure: E(t) vs. N(t); fitted slope 1.15]
Graphs - Conclusions
• Real graphs obey some surprising patterns
– which can help us spot anomalies / outliers
• A lot of interest from web searching companies
– recommendation systems
– link spamming
– trust propagation
• HUGE graphs (Millions and Billions of nodes)
Outline
• Problem definition / Motivation
• Biological image mining
• Graphs and power laws
• Streams and forecasting
• Conclusions
Why care about streams?
• Sensor devices
– Temperature, weather measurements
– Road traffic data
– Geological observations
– Patient physiological data
– sensor-Andrew project
• Embedded devices
– Network routers
Co-evolving time sequences
Joint work with
• Jimeng Sun (CMU)
• Dr. Spiros Papadimitriou (CMU/IBM)
• Dr. Yasushi Sakurai (NTT)
• Prof. Jeanne VanBriesen (CMU/CEE)
Motivation
water distribution network
[Figure: chlorine concentrations over Phase 1, Phase 2, Phase 3; annotations: normal operation, major leak; series: sensors near leak, sensors away from leak]
Motivation
spot: “hidden (latent) variables”
[Figure: chlorine concentrations; actual measurements (n streams) and k hidden variable(s); builds shown: up to Phase 1 with k = 1, up to Phase 2 with k = 2, up to Phase 3 with k = 1]
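One way to picture a hidden variable is as a single series that summarizes n correlated streams. The sketch below finds the first such variable with batch principal component analysis via power iteration. This is a stand-in chosen for illustration, not the method behind the slides (which is not described here), and `first_hidden_variable` is a hypothetical name.

```python
import math

def first_hidden_variable(streams, iterations=100):
    """streams: list of n equal-length lists, one per sensor.
    Returns (weights, hidden): unit-length participation weights and the
    hidden series obtained by projecting the centered streams onto them."""
    n, length = len(streams), len(streams[0])
    means = [sum(s) / length for s in streams]
    centered = [[x - m for x in s] for s, m in zip(streams, means)]
    # n x n covariance matrix of the streams.
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / length
            for j in range(n)] for i in range(n)]
    # Power iteration; the all-equal start vector would fail only if it were
    # exactly orthogonal to the dominant direction.
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        v = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        w = [x / norm for x in v]
    hidden = [sum(w[i] * centered[i][t] for i in range(n))
              for t in range(length)]
    return w, hidden

# Three synthetic sensors that all follow one underlying signal (k = 1).
base = [math.sin(0.1 * t) for t in range(200)]
sensors = [[1.0 * b for b in base],
           [2.0 * b for b in base],
           [2.0 * b for b in base]]
weights, hidden = first_hidden_variable(sensors)
```

For these sensors the weights come out proportional to (1, 2, 2), so one hidden variable reproduces all three streams. A streaming setting would need an incremental update rather than this batch computation.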
SPIRIT / InteMon
• http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp
• http://localhost:8080/demo/graphs.jsp
• self-* storage system (PDL/CMU)
– 1 PetaByte storage
– self-monitoring, self-healing: self-*
• with Jimeng Sun (CMU/CS)
• Evan Hoke (CMU/CS-ug)
• Prof. Greg Ganger (CMU/CS/ECE)
• John Strunk (CMU/ECE)
Related project
• Anomaly detection in network traffic (Zhang, Xie)
Conclusions
• Biological images, graphs & streams pose fascinating problems
• self-similarity, fractals and power laws work, when other methods fail!
Books
• Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)
Contact info
• [email protected]
• www.cs.cmu.edu/~christos
• Wean Hall 7107
• Ph#: x8.1457
• and, again WELCOME!