Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
MSU, 2005 C. Faloutsos 1
School of Computer Science
Carnegie Mellon
Data Mining using Fractals and
Power laws
Christos Faloutsos
Carnegie Mellon University
MSU, 2005 C. Faloutsos 2
School of Computer Science
Carnegie Mellon
THANK YOU!
• Prof. Rong Jin
• Prof. Pang-Ning Tan
MSU, 2005 C. Faloutsos 3
School of Computer Science
Carnegie Mellon
Overview
• Goals/ motivation: find patterns in large
datasets:
– (A) Sensor data
– (B) network/graph data
• Solutions: self-similarity and power laws
• Discussion
MSU, 2005 C. Faloutsos 4
School of Computer Science
Carnegie Mellon
Applications of sensors/streams
• ‘Smart house’: monitoring temperature,
humidity etc
• Financial, sales, economic series
10/21/2005 23:09
MSU, 2005 C. Faloutsos 5
School of Computer Science
Carnegie Mellon
Motivation - Applications
• Medical: ECGs +; blood
pressure etc monitoring
• Scientific data: seismological;
astronomical; environment /
anti-pollution; meteorologicalSunspot Data
0
50
100
150
200
250
300
MSU, 2005 C. Faloutsos 6
School of Computer Science
Carnegie Mellon
Motivation - Applications
(cont’d)
• civil/automobile infrastructure
– bridge vibrations [Oppenheim+02]
– road conditions / traffic monitoring
Automobile traffic
0200400600800
100012001400160018002000
time
# cars
MSU, 2005 C. Faloutsos 7
School of Computer Science
Carnegie Mellon
Motivation - Applications
(cont’d)
• Computer systems
– web servers (buffering, prefetching)
– network traffic monitoring
– ...
http://repository.cs.vt.edu/lbl-conn-7.tar.Z
MSU, 2005 C. Faloutsos 8
School of Computer Science
Carnegie Mellon
Web traffic
• [Crovella Bestavros, SIGMETRICS’96]
10/21/2005 23:09
MSU, 2005 C. Faloutsos 9
School of Computer Science
Carnegie Mellon
...
survivable,
self-managing storage
infrastructure
...
a storage brick
(0.5–5 TB)~1 PB
� “self-*” = self-managing, self-tuning, self-healing, …
� Goal: 1 petabyte (PB) for CMU researchers
� www.pdl.cmu.edu/SelfStar
Self-* Storage (Ganger+)
MSU, 2005 C. Faloutsos 10
School of Computer Science
Carnegie Mellon
Problem definition
• Given: one or more sequences
x1 , x2 , … , xt , …; (y1, y2, … , yt, …)
• Find
– patterns; clusters; outliers; forecasts;
MSU, 2005 C. Faloutsos 11
School of Computer Science
Carnegie Mellon
Problem #1
• Find patterns, in large
datasets
time
# bytes
MSU, 2005 C. Faloutsos 12
School of Computer Science
Carnegie Mellon
Problem #1
• Find patterns, in large
datasets
time
# bytes
Poisson
indep.,
ident. distr
10/21/2005 23:09
MSU, 2005 C. Faloutsos 13
School of Computer Science
Carnegie Mellon
Problem #1
• Find patterns, in large
datasets
time
# bytes
Poisson
indep.,
ident. distr
MSU, 2005 C. Faloutsos 14
School of Computer Science
Carnegie Mellon
Problem #1
• Find patterns, in large
datasets
time
# bytes
Poisson
indep.,
ident. distr
Q: Then, how to generate
such bursty traffic?
MSU, 2005 C. Faloutsos 15
School of Computer Science
Carnegie Mellon
Overview
• Goals/ motivation: find patterns in large datasets:
– (A) Sensor data
– (B) network/graph data
• Solutions: self-similarity and power laws
• Discussion
MSU, 2005 C. Faloutsos 16
School of Computer Science
Carnegie Mellon
Problem #2 - network and
graph mining
• How does the Internet look like?
• How does the web look like?
• What constitutes a ‘normal’ social
network?
• What is the ‘network value’ of a
customer?
• which gene/species affects the others
the most?
10/21/2005 23:09
MSU, 2005 C. Faloutsos 17
School of Computer Science
Carnegie Mellon
Network and graph mining
Food Web
[Martinez ’91]
Protein Interactions
[genomebiology.com]Friendship Network
[Moody ’01]
Graphs are everywhere!
MSU, 2005 C. Faloutsos 18
School of Computer Science
Carnegie Mellon
Problem#2
Given a graph:
• which node to market-to /
defend / immunize first?
• Are there un-natural sub-
graphs? (eg., criminals’ rings)?
[from Lumeta: ISPs 6/1999]
MSU, 2005 C. Faloutsos 19
School of Computer Science
Carnegie Mellon
Solutions
• New tools: power laws, self-similarity and
‘fractals’ work, where traditional
assumptions fail
• Let’s see the details:
MSU, 2005 C. Faloutsos 20
School of Computer Science
Carnegie Mellon
Overview
• Goals/ motivation: find patterns in large
datasets:
– (A) Sensor data
– (B) network/graph data
• Solutions: self-similarity and power laws
• Discussion
10/21/2005 23:09
MSU, 2005 C. Faloutsos 21
School of Computer Science
Carnegie Mellon
What is a fractal?
= self-similar point set, e.g., Sierpinski triangle:
...zero area: (3/4)^inf
infinite length!
(4/3)^inf
Q: What is its dimensionality??
MSU, 2005 C. Faloutsos 22
School of Computer Science
Carnegie Mellon
What is a fractal?
= self-similar point set, e.g., Sierpinski triangle:
...zero area: (3/4)^inf
infinite length!
(4/3)^inf
Q: What is its dimensionality??
A: log3 / log2 = 1.58 (!?!)
MSU, 2005 C. Faloutsos 23
School of Computer Science
Carnegie Mellon
Intrinsic (‘fractal’) dimension
• Q: fractal dimension
of a line?
• Q: fd of a plane?
MSU, 2005 C. Faloutsos 24
School of Computer Science
Carnegie Mellon
Intrinsic (‘fractal’) dimension
• Q: fractal dimension of a line?
• A: nn ( <= r ) ~ r^1
(‘power law’: y=x^a)
• Q: fd of a plane?
• A: nn ( <= r ) ~ r^2
fd== slope of (log(nn) vs.. log(r) )
10/21/2005 23:09
MSU, 2005 C. Faloutsos 25
School of Computer Science
Carnegie Mellon
Sierpinsky triangle
log( r )
log(#pairs
within <=r )
1.58
== ‘correlation integral’
= CDF of pairwise distances
MSU, 2005 C. Faloutsos 26
School of Computer Science
Carnegie Mellon
Observations: Fractals <->
power laws
Closely related:
• fractals <=>
• self-similarity <=>
• scale-free <=>
• power laws ( y= xa ;
F=K r-2)
• (vs y=e-ax or y=xa+b)log( r )
log(#pairs
within <=r )
1.58
MSU, 2005 C. Faloutsos 27
School of Computer Science
Carnegie Mellon
Outline
• Problems
• Self-similarity and power laws
• Solutions to posed problems
• Discussion
MSU, 2005 C. Faloutsos 28
School of Computer Science
Carnegie Mellon
time
#bytes
Solution #1: traffic
• disk traces: self-similar: (also: [Leland+94])
• How to generate such traffic?
10/21/2005 23:09
MSU, 2005 C. Faloutsos 29
School of Computer Science
Carnegie Mellon
Solution #1: traffic
• disk traces (80-20 ‘law’) – ‘multifractals’
time
#bytes
20% 80%
MSU, 2005 C. Faloutsos 30
School of Computer Science
Carnegie Mellon
80-20 / multifractals20 80
MSU, 2005 C. Faloutsos 31
School of Computer Science
Carnegie Mellon
80-20 / multifractals
20• p ; (1-p) in general
• yes, there are
dependencies
80
MSU, 2005 C. Faloutsos 32
School of Computer Science
Carnegie Mellon
More on 80/20: PQRS
• Part of ‘self-* storage’ project
time
cylinder#
10/21/2005 23:09
MSU, 2005 C. Faloutsos 33
School of Computer Science
Carnegie Mellon
More on 80/20: PQRS
• Part of ‘self-* storage’ project
p q
r s
q
r s
MSU, 2005 C. Faloutsos 34
School of Computer Science
Carnegie Mellon
Overview
• Goals/ motivation: find patterns in large datasets:
– (A) Sensor data
– (B) network/graph data
• Solutions: self-similarity and power laws
– sensor/traffic data
– network/graph data
• Discussion
MSU, 2005 C. Faloutsos 35
School of Computer Science
Carnegie Mellon
Problem #2 - topology
How does the Internet look like? Any rules?
MSU, 2005 C. Faloutsos 36
School of Computer Science
Carnegie Mellon
Patterns?
• avg degree is, say 3.3
• pick a node at random
– guess its degree,
exactly (-> “mode”)
degree
count
avg: 3.3
10/21/2005 23:09
MSU, 2005 C. Faloutsos 37
School of Computer Science
Carnegie Mellon
Patterns?
• avg degree is, say 3.3
• pick a node at random
– guess its degree,
exactly (-> “mode”)
• A: 1!!
degree
count
avg: 3.3MSU, 2005 C. Faloutsos 38
School of Computer Science
Carnegie Mellon
Patterns?
• avg degree is, say 3.3
• pick a node at random - what is the degree you expect it to have?
• A: 1!!
• A’: very skewed distr.
• Corollary: the mean is meaningless!
• (and std -> infinity (!))
degree
count
avg: 3.3
MSU, 2005 C. Faloutsos 39
School of Computer Science
Carnegie Mellon
Solution#2: Rank exponent R• A1: Power law in the degree distribution
[SIGCOMM99]
internet domains
log(rank)
log(degree)
-0.82
att.com
ibm.com
MSU, 2005 C. Faloutsos 40
School of Computer Science
Carnegie Mellon
Solution#2’: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
E = -0.48
Exponent = slope
Eigenvalue
Rank of decreasing eigenvalue
May 2001
10/21/2005 23:09
MSU, 2005 C. Faloutsos 41
School of Computer Science
Carnegie Mellon
Power laws - discussion
• do they hold, over time?
• do they hold on other graphs/domains?
MSU, 2005 C. Faloutsos 42
School of Computer Science
Carnegie Mellon
Power laws - discussion
• do they hold, over time?
• Yes! for multiple years [Siganos+]
• do they hold on other graphs/domains?
• Yes!
– web sites and links [Tomkins+], [Barabasi+]
– peer-to-peer graphs (gnutella-style)
– who-trusts-whom (epinions.com)
MSU, 2005 C. Faloutsos 43
School of Computer Science
Carnegie Mellon
Time Evolution: rank R
-1
-0.9
-0.8
-0.7
-0.6
-0.5
0 200 400 600 800
Instances in time: Nov'97 and on
Ra
nk
ex
po
ne
nt
• The rank exponent has not changed!
[Siganos+]
Domain
level
log(rank)
log(degree)
-0.82
att.com
ibm.com
MSU, 2005 C. Faloutsos 44
School of Computer Science
Carnegie Mellon
The Peer-to-Peer Topology
• Number of immediate peers (= degree), follows a
power-law
[Jovanovic+]
degree
count
10/21/2005 23:09
MSU, 2005 C. Faloutsos 45
School of Computer Science
Carnegie Mellon
epinions.com
• who-trusts-whom
[Richardson +
Domingos, KDD
2001]
(out) degree
count
MSU, 2005 C. Faloutsos 46
School of Computer Science
Carnegie Mellon
Why care about these patterns?
• better graph generators [BRITE, INET]
– for simulations
– extrapolations
• ‘abnormal’ graph and subgraph detection
MSU, 2005 C. Faloutsos 47
School of Computer Science
Carnegie Mellon
Recent discoveries [KDD’05]
• How do graphs evolve?
• degree-exponent seems constant - anything
else?
MSU, 2005 C. Faloutsos 48
School of Computer Science
Carnegie Mellon
Evolution of diameter?
• Prior analysis, on power-law-like graphs,
hints that
diameter ~ O(log(N)) or
diameter ~ O( log(log(N)))
• i.e.., slowly increasing with network size
• Q: What is happening, in reality?
10/21/2005 23:09
MSU, 2005 C. Faloutsos 49
School of Computer Science
Carnegie Mellon
Evolution of diameter?
• Prior analysis, on power-law-like graphs,
hints that
diameter ~ O(log(N)) or
diameter ~ O( log(log(N)))
• i.e.., slowly increasing with network size
• Q: What is happening, in reality?
• A: It shrinks(!!), towards a constant value
MSU, 2005 C. Faloutsos 50
School of Computer Science
Carnegie Mellon
Shrinking diameter
ArXiv physics papers
and their citations
[Leskovec+05a]
MSU, 2005 C. Faloutsos 51
School of Computer Science
Carnegie Mellon
Shrinking diameter
ArXiv: who wrote what
MSU, 2005 C. Faloutsos 52
School of Computer Science
Carnegie Mellon
Shrinking diameter
U.S. patents citing each
other
10/21/2005 23:09
MSU, 2005 C. Faloutsos 53
School of Computer Science
Carnegie Mellon
Shrinking diameter
Autonomous systems
MSU, 2005 C. Faloutsos 54
School of Computer Science
Carnegie Mellon
Temporal evolution of graphs
• N(t) nodes; E(t) edges at time t
• suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
MSU, 2005 C. Faloutsos 55
School of Computer Science
Carnegie Mellon
Temporal evolution of graphs
• N(t) nodes; E(t) edges at time t
• suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled!
x
MSU, 2005 C. Faloutsos 56
School of Computer Science
Carnegie Mellon
Temporal evolution of graphs
• A: over-doubled - but obeying:
E(t) ~ N(t)a for all t
where 1<a<2
a=1: constant avg degree
a=2: ~full clique
• Real graphs densify over time [Leskovec+05]
10/21/2005 23:09
MSU, 2005 C. Faloutsos 57
School of Computer Science
Carnegie Mellon
Temporal evolution of graphs
• A: over-doubled - but obeying:
E(t) ~ N(t)a for all t
• Identically:
log(E(t)) / log(N(t)) = constant for all t
MSU, 2005 C. Faloutsos 58
School of Computer Science
Carnegie Mellon
Densification Power Law
ArXiv: Physics papers
and their citations
1.69
N(t)
E(t)
MSU, 2005 C. Faloutsos 59
School of Computer Science
Carnegie Mellon
Densification Power Law
U.S. Patents, citing each
other
1.66
N(t)
E(t)
MSU, 2005 C. Faloutsos 60
School of Computer Science
Carnegie Mellon
Densification Power Law
Autonomous Systems
1.18
N(t)
E(t)
10/21/2005 23:09
MSU, 2005 C. Faloutsos 61
School of Computer Science
Carnegie Mellon
Densification Power Law
ArXiv: who wrote what
1.15
N(t)
E(t)
MSU, 2005 C. Faloutsos 62
School of Computer Science
Carnegie Mellon
Outline
• problems
• Fractals
• Solutions
• Discussion
– what else can they solve?
– how frequent are fractals?
MSU, 2005 C. Faloutsos 63
School of Computer Science
Carnegie Mellon
What else can they solve?
• separability [KDD’02]
• forecasting [CIKM’02]
• dimensionality reduction [SBBD’00]
• non-linear axis scaling [KDD’02]
• disk trace modeling [PEVA’02]
• selectivity of spatial/multimedia queries
[PODS’94, VLDB’95, ICDE’00]
• ...
MSU, 2005 C. Faloutsos 64
School of Computer Science
Carnegie Mellon
Storyboard
• Search results
(ranked)
Collage with maps,
common phrases,
named entities and
dynamic query sliders
• Query (6TB of
data)
Full Content Indexing, Search and Retrieval from Digital Video Archives
www.informedia.cs.cmu.edu
10/21/2005 23:09
MSU, 2005 C. Faloutsos 65
School of Computer Science
Carnegie Mellon
Index trees
• OMNI trees [ICDE’01, w/ Roberto Filho,
Caetano and Agma Traina]
• Slim-trees [EDBT’00, w/ C+A Traina,
Seeger]
• ...
MSU, 2005 C. Faloutsos 66
School of Computer Science
Carnegie Mellon
What else can they solve?
• separability [KDD’02]
• forecasting [CIKM’02]
• dimensionality reduction [SBBD’00]
• non-linear axis scaling [KDD’02]
• disk trace modeling [PEVA’02]
• selectivity of spatial/multimedia queries
[PODS’94, VLDB’95, ICDE’00]
• ...
MSU, 2005 C. Faloutsos 67
School of Computer Science
Carnegie Mellon
Problem #3 - spatial d.m.
Galaxies (Sloan Digital Sky Survey w/ B.
Nichol) - ‘spiral’ and ‘elliptical’
galaxies
- patterns? (not Gaussian; not
uniform)
-attraction/repulsion?
- separability??
MSU, 2005 C. Faloutsos 68
School of Computer Science
Carnegie Mellon
Solution#3: spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
- repulsion!
CORRELATION INTEGRAL!
10/21/2005 23:09
MSU, 2005 C. Faloutsos 69
School of Computer Science
Carnegie Mellon
Solution#3: spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
- repulsion!
[w/ Seeger, Traina, Traina, SIGMOD00]
MSU, 2005 C. Faloutsos 70
School of Computer Science
Carnegie Mellon
Solution#3: spatial d.m.
r1r2
r1
r2
Heuristic on choosing # of
clusters
MSU, 2005 C. Faloutsos 71
School of Computer Science
Carnegie Mellon
Solution#3: spatial d.m.
log(r)
log(#pairs within <=r )
spi-spi
spi-ell
ell-ell
- 1.8 slope
- plateau!
- repulsion!
MSU, 2005 C. Faloutsos 72
School of Computer Science
Carnegie Mellon
What else can they solve?
• separability [KDD’02]
• forecasting [CIKM’02]
• dimensionality reduction [SBBD’00]
• non-linear axis scaling [KDD’02]
• disk trace modeling [PEVA’02]
• selectivity of spatial/multimedia queries
[PODS’94, VLDB’95, ICDE’00]
• ...
10/21/2005 23:09
MSU, 2005 C. Faloutsos 73
School of Computer Science
Carnegie Mellon
Problem#4: dim. reduction
• given attributes x1, ... xn
– possibly, non-linearly correlated
• drop the useless ones mpg
cc
MSU, 2005 C. Faloutsos 74
School of Computer Science
Carnegie Mellon
Problem#4: dim. reduction
• given attributes x1, ... xn
– possibly, non-linearly correlated
• drop the useless ones
(Q: why?
A: to avoid the ‘dimensionality curse’)
Solution: keep on dropping attributes, until the f.d. changes! [w/ Traina+, SBBD’00]
mpg
cc
MSU, 2005 C. Faloutsos 75
School of Computer Science
Carnegie Mellon
Outline
• problems
• Fractals
• Solutions
• Discussion
– what else can they solve?
– how frequent are fractals?
MSU, 2005 C. Faloutsos 76
School of Computer Science
Carnegie Mellon
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
• <and many-many more! see [Mandelbrot]>
10/21/2005 23:09
MSU, 2005 C. Faloutsos 77
School of Computer Science
Carnegie Mellon
Fractals: Brain scans
• brain-scans
octree levels
Log(#octants)
2.63 =
fd
MSU, 2005 C. Faloutsos 78
School of Computer Science
Carnegie Mellon
fMRI brain scans
• Center for Cognitive Brain Imaging @ CMU
• Tom Mitchell, Marcel Just, ++
fMRI Goal: human brain function
Which voxels are active,
for a given cognitive task?
MSU, 2005 C. Faloutsos 79
School of Computer Science
Carnegie Mellon
More fractals
• periphery of malignant tumors: ~1.5
• benign: ~1.3
• [Burdet+]
MSU, 2005 C. Faloutsos 80
School of Computer Science
Carnegie Mellon
More fractals:
• cardiovascular system: 3 (!) lungs: ~2.9
10/21/2005 23:09
MSU, 2005 C. Faloutsos 81
School of Computer Science
Carnegie Mellon
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
MSU, 2005 C. Faloutsos 82
School of Computer Science
Carnegie Mellon
More fractals:
• Coastlines: 1.2-1.58
1 1.1
1.3
MSU, 2005 C. Faloutsos 83
School of Computer Science
Carnegie Mellon
MSU, 2005 C. Faloutsos 84
School of Computer Science
Carnegie Mellon
More fractals:
• the fractal dimension for the Amazon river
is 1.85 (Nile: 1.4)[ems.gphys.unc.edu/nonlinear/fractals/examples.html]
10/21/2005 23:09
MSU, 2005 C. Faloutsos 85
School of Computer Science
Carnegie Mellon
More fractals:
• the fractal dimension for the Amazon river
is 1.85 (Nile: 1.4)[ems.gphys.unc.edu/nonlinear/fractals/examples.html]
MSU, 2005 C. Faloutsos 86
School of Computer Science
Carnegie Mellon
Cross-roads of
Montgomery county:
•any rules?
GIS points
MSU, 2005 C. Faloutsos 87
School of Computer Science
Carnegie Mellon
GIS
A: self-similarity:
• intrinsic dim. = 1.51
log( r )
log(#pairs(within <= r))
1.51
MSU, 2005 C. Faloutsos 88
School of Computer Science
Carnegie Mellon
Examples:LB county
• Long Beach county of CA (road end-points)
1.7
log(r)
log(#pairs)
10/21/2005 23:09
MSU, 2005 C. Faloutsos 89
School of Computer Science
Carnegie Mellon
More power laws: areas –
Korcak’s law
Scandinavian lakes
Any pattern?
MSU, 2005 C. Faloutsos 90
School of Computer Science
Carnegie Mellon
More power laws: areas –
Korcak’s law
Scandinavian lakes
area vs
complementary
cumulative count
(log-log axes)
log(count( >= area))
log(area)
MSU, 2005 C. Faloutsos 91
School of Computer Science
Carnegie Mellon
More power laws: Korcak
Japan islands;
area vs cumulative
count (log-log axes) log(area)
log(count( >= area))
MSU, 2005 C. Faloutsos 92
School of Computer Science
Carnegie Mellon
More power laws
• Energy of earthquakes (Gutenberg-Richter
law) [simscience.org]
log(count)
Magnitude = log(energy)day
Energy
released
10/21/2005 23:09
MSU, 2005 C. Faloutsos 93
School of Computer Science
Carnegie Mellon
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
MSU, 2005 C. Faloutsos 94
School of Computer Science
Carnegie Mellon
A famous power law: Zipf’s
law
• Bible - rank vs.
frequency (log-log)
log(rank)
log(freq)
“a”
“the”
“Rank/frequency plot”
MSU, 2005 C. Faloutsos 95
School of Computer Science
Carnegie Mellon
TELCO data
# of service units
count of
customers
‘best customer’
MSU, 2005 C. Faloutsos 96
School of Computer Science
Carnegie Mellon
SALES data – store#96
# units sold
count of
products
“aspirin”
10/21/2005 23:09
MSU, 2005 C. Faloutsos 97
School of Computer Science
Carnegie Mellon
Olympic medals (Sidney’00,
Athens’04):
log( rank)
log(#medals)
0
0.5
1
1.5
2
2.5
0 0.5 1 1.5 2
athens
sidney
MSU, 2005 C. Faloutsos 98
School of Computer Science
Carnegie Mellon
Olympic medals (Sidney’00,
Athens’04):
log( rank)
log(#medals)
0
0.5
1
1.5
2
2.5
0 0.5 1 1.5 2
athens
sidney
MSU, 2005 C. Faloutsos 99
School of Computer Science
Carnegie Mellon
Even more power laws:
• Income distribution (Pareto’s law)
• size of firms
• publication counts (Lotka’s law)
MSU, 2005 C. Faloutsos 100
School of Computer Science
Carnegie Mellon
Even more power laws:
library science (Lotka’s law of publication
count); and citation counts:
(citeseer.nj.nec.com 6/2001)
log(#citations)
log(count)
Ullman
10/21/2005 23:09
MSU, 2005 C. Faloutsos 101
School of Computer Science
Carnegie Mellon
Even more power laws:
• web hit counts [w/ A. Montgomery]
Web Site Traffic
log(freq)
log(count)
Zipf
“yahoo.com”
MSU, 2005 C. Faloutsos 102
School of Computer Science
Carnegie Mellon
Fractals & power laws:
appear in numerous settings:
• medical
• geographical / geological
• social
• computer-system related
MSU, 2005 C. Faloutsos 103
School of Computer Science
Carnegie Mellon
Power laws, cont’d
• In- and out-degree distribution of web sites
[Barabasi], [IBM-CLEVER]
log indegree
- log(freq)
from [Ravi Kumar,
Prabhakar Raghavan,
Sridhar Rajagopalan,
Andrew Tomkins ]
MSU, 2005 C. Faloutsos 104
School of Computer Science
Carnegie Mellon
Power laws, cont’d
• In- and out-degree distribution of web sites
[Barabasi], [IBM-CLEVER]
log indegree
log(freq)
from [Ravi Kumar,
Prabhakar Raghavan,
Sridhar Rajagopalan,
Andrew Tomkins ]
10/21/2005 23:09
MSU, 2005 C. Faloutsos 105
School of Computer Science
Carnegie Mellon
Power laws, cont’d
• In- and out-degree distribution of web sites
[Barabasi], [IBM-CLEVER]
• length of file transfers [Crovella+Bestavros
‘96]
• duration of UNIX jobs
MSU, 2005 C. Faloutsos 106
School of Computer Science
Carnegie Mellon
Conclusions
• Fascinating problems in Data Mining: find
patterns in
– sensors/streams
– graphs/networks
MSU, 2005 C. Faloutsos 108
School of Computer Science
Carnegie Mellon
Resources
• Manfred Schroeder “Chaos, Fractals and
Power Laws”, 1991
10/21/2005 23:09
MSU, 2005 C. Faloutsos 109
School of Computer Science
Carnegie Mellon
References
• [vldb95] Alberto Belussi and Christos Faloutsos,
Estimating the Selectivity of Spatial Queries Using
the `Correlation' Fractal Dimension Proc. of
VLDB, p. 299-310, 1995
• M. Crovella and A. Bestavros, Self similarity in
World wide web traffic: Evidence and possible
causes , SIGMETRICS ’96.
MSU, 2005 C. Faloutsos 110
School of Computer Science
Carnegie Mellon
References
• J. Considine, F. Li, G. Kollios and J. Byers,
Approximate Aggregation Techniques for Sensor
Databases (ICDE’04, best paper award).
• [pods94] Christos Faloutsos and Ibrahim Kamel,
Beyond Uniformity and Independence: Analysis of
R-trees Using the Concept of Fractal Dimension,
PODS, Minneapolis, MN, May 24-26, 1994, pp.
4-13
MSU, 2005 C. Faloutsos 111
School of Computer Science
Carnegie Mellon
References
• [vldb96] Christos Faloutsos, Yossi Matias and Avi
Silberschatz, Modeling Skewed Distributions
Using Multifractals and the `80-20 Law’ Conf. on
Very Large Data Bases (VLDB), Bombay, India,
Sept. 1996.
• [sigmod2000] Christos Faloutsos, Bernhard
Seeger, Agma J. M. Traina and Caetano Traina Jr.,
Spatial Join Selectivity Using Power Laws,
SIGMOD 2000
MSU, 2005 C. Faloutsos 112
School of Computer Science
Carnegie Mellon
References
• [vldb96] Christos Faloutsos and Volker Gaede
Analysis of the Z-Ordering Method Using the
Hausdorff Fractal Dimension VLD, Bombay,
India, Sept. 1996
• [sigcomm99] Michalis Faloutsos, Petros Faloutsos
and Christos Faloutsos, What does the Internet
look like? Empirical Laws of the Internet
Topology, SIGCOMM 1999
10/21/2005 23:09
MSU, 2005 C. Faloutsos 113
School of Computer Science
Carnegie Mellon
References
• [Leskovec 05] Jure Leskovec, Jon M. Kleinberg,
Christos Faloutsos: Graphs over time:
densification laws, shrinking diameters and
possible explanations. KDD 2005: 177-187
MSU, 2005 C. Faloutsos 114
School of Computer Science
Carnegie Mellon
References
• [ieeeTN94] W. E. Leland, M.S. Taqqu, W.
Willinger, D.V. Wilson, On the Self-Similar
Nature of Ethernet Traffic, IEEE Transactions on
Networking, 2, 1, pp 1-15, Feb. 1994.
• [brite] Alberto Medina, Anukool Lakhina, Ibrahim
Matta, and John Byers. BRITE: An Approach to
Universal Topology Generation. MASCOTS '01
MSU, 2005 C. Faloutsos 115
School of Computer Science
Carnegie Mellon
References
• [icde99] Guido Proietti and Christos Faloutsos,
I/O complexity for range queries on region data
stored using an R-tree (ICDE’99)
• Stan Sclaroff, Leonid Taycher and Marco La
Cascia , "ImageRover: A content-based image
browser for the world wide web" Proc. IEEE
Workshop on Content-based Access of Image and
Video Libraries, pp 2-9, 1997.
MSU, 2005 C. Faloutsos 116
School of Computer Science
Carnegie Mellon
References
• [kdd2001] Agma J. M. Traina, Caetano Traina Jr.,
Spiros Papadimitriou and Christos Faloutsos: Tri-
plots: Scalable Tools for Multidimensional Data
Mining, KDD 2001, San Francisco, CA.
10/21/2005 23:09
MSU, 2005 C. Faloutsos 117
School of Computer Science
Carnegie Mellon
Thank you!
Contact info:
christos <at> cs.cmu.edu
www. cs.cmu.edu /~christos
(w/ papers, datasets, code for fractal dimension
estimation, etc)
10/21/2005 23:09