Special Topics on Graph Algorithms

Finding the Diameter in Real-World Graphs: Experimentally Turning a Lower Bound into an Upper Bound

F96943167 施信瑋, F97943070 方劭云, R98943086 莊舜翔, R98943090 曹蕙芳, R98943088 周邦彥, R98921072 金蘊, R99921040 林國偉, R99942061 葉書豪

Outline

Introduction (R98943086 莊舜翔)
Previous Work (R98943090 曹蕙芳, R98943088 周邦彥)
Finding the Diameter in Real-World Graphs (R99921040 林國偉, R99942061 葉書豪, R98943086 莊舜翔)
Other Related Topics (F96943167 施信瑋, F97943070 方劭云)
Conclusion and Future Work (R98921072 金蘊)

Diameter
The length of the "longest shortest path" between any two vertices in a graph or a tree
Given a connected graph G = (V, E) with n = |V| vertices and m = |E| edges, the diameter is D = max d(u, v) over all u, v in V, where d(u, v) denotes the distance between vertices u and v

[Figure: two weighted examples, a tree with D = 13 and a graph with D = 9]

Diameter of a Tree
The diameter of a tree can be computed by applying the double-sweep algorithm:
1. Choose a random vertex r, run a BFS from r, and find a vertex a farthest from r
2. Run a BFS from a and find a vertex b farthest from a
3. Return D = d(a, b)
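The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming an unweighted tree stored as an adjacency dict; the names `bfs_distances` and `double_sweep` and the small example tree are hypothetical and not taken from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return a dict of hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def double_sweep(adj, r):
    """BFS from r finds a farthest vertex a; BFS from a finds a farthest b."""
    dist_r = bfs_distances(adj, r)
    a = max(dist_r, key=dist_r.get)
    dist_a = bfs_distances(adj, a)
    b = max(dist_a, key=dist_a.get)
    return a, b, dist_a[b]

# Hypothetical tree: the path 0-1-2-3 with an extra leaf 4 attached to vertex 1.
tree = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
a, b, d = double_sweep(tree, r=2)
print(a, b, d)  # prints: 0 3 3
```

On a tree the returned distance is the diameter whichever start vertex r is used; only the endpoints a and b may change.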

[Figure: double sweep on the example tree; the BFS from r finds a, and the BFS from a finds b with d(a, b) = 13]

Diameter of a Graph
The double-sweep algorithm might not compute the diameter of a general graph correctly
It provides a lower bound instead

[Figure: double sweep on the example graph, whose actual diameter is D = 9]


Naïve Algorithm
Perform a breadth-first search (BFS) from each of the n vertices to obtain the distance matrix of the graph: $\Theta(n(n+m))$ time and $\Theta(m)$ space
By using matrix multiplication, the distance matrix can be computed in $O(M(n)\log n)$ time and $\Theta(n^2)$ space [Seidel, ACM STOC’92]
M(n): the complexity of matrix multiplication involving small integers only ($O(n^{2.376})$)
Too slow for massive graphs, with a prohibitive space cost
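The BFS-from-every-vertex approach can be sketched as below. This is only an illustration for small unweighted graphs, assuming an adjacency dict; `eccentricity` and `naive_diameter` are hypothetical names. It keeps one distance table at a time instead of the full matrix.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest BFS distance from source to any vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())

def naive_diameter(adj):
    """One BFS per vertex: n BFSes in total, so n * (n + m) work."""
    return max(eccentricity(adj, v) for v in adj)

# Hypothetical example: a cycle on 6 vertices.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(naive_diameter(cycle6))  # prints: 3
```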

All Pairs Shortest Path
Compute the distances between all pairs of vertices without resorting to matrix products
[Feder, ACM STOC’91]: $\Theta(n^3/\log n)$ time and $O(n^2)$ space
[Chan, ACM-SIAM’06]: $O(n^2(\log\log n)^2/\log n)$ time and $O(n^2)$ space
Still too slow and space consuming for massive graphs

All Pairs Almost Shortest Path (1/2)
Compute almost shortest paths between all pairs of vertices [Dor, ECCC’97]
Additive error 2
Treat high-degree vertices and low-degree vertices separately

All Pairs Almost Shortest Path (2/2)
Additive error 2: apasp2
$O(\min(n^{3/2}m^{1/2}, n^{7/3})\log n)$ time and $\Theta(n^2)$ space
Still too expensive
[Figure: vertices u and v connected through intermediate vertices w and w’]

Self-checking Heuristics
It is too expensive to obtain the exact value or accurate estimations of the diameter for massive graphs
Empirically establish some lower and upper bounds by executing a suitably small number of BFSes: L ≤ D ≤ U
When L = U, the actual value of D for G is obtained, which is why these heuristics are called self-checking

Self-checking Heuristics
No guarantee of success for every feasible input, BUT
1) It requires few BFSes in practice, and thus its running time is linear in practice [Magnien, JEA’09]
2) An empirical upper bound is possible
3) Large graphs can be analyzed, since BFS has a good external-memory implementation [Mayer, AESA’02] and works on graphs stored in compressed format [Vigna, IWWWC’04]

A Work for Comparison
“Fast Computation of Empirically Tight Bounds for the Diameter of Massive Graphs” [Magnien, JEA’09]
Various bounds to confine the solution range: trivial bounds, double sweep lower bound, tree upper bound
Iterative algorithm to obtain the actual diameter

Trivial Bounds
The eccentricity of any vertex v gives trivial bounds on the diameter: ecc(v) ≤ D ≤ 2·ecc(v)
Trivial bounds can be computed in Θ(m) space and time, where m is the number of edges in the graph
Proof of D ≤ 2·ecc(v): suppose D > 2·ecc(v), and let a, b be the endpoints of a diameter, so d(a, b) = D. Since d(a, v) ≤ ecc(v) and d(v, b) ≤ ecc(v), going through v gives d(a, b) ≤ d(a, v) + d(v, b) ≤ 2·ecc(v), which contradicts the assumption. Therefore, D ≤ 2·ecc(v)
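A short sketch of the trivial bounds, assuming an unweighted connected graph given as an adjacency dict. The function names and the 5-vertex path are illustrative assumptions only.

```python
from collections import deque

def eccentricity(adj, v):
    """Height of a BFS tree rooted at v, i.e. ecc(v)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())

def trivial_bounds(adj, v):
    """Return (lower, upper) with lower = ecc(v) <= D <= 2 * ecc(v) = upper."""
    ecc = eccentricity(adj, v)
    return ecc, 2 * ecc

# Hypothetical example: the path 0-1-2-3-4, whose diameter is 4.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(trivial_bounds(path5, 2))  # prints: (2, 4)  upper bound is tight
print(trivial_bounds(path5, 0))  # prints: (4, 8)  lower bound is tight
```

The two calls show that either side of the inequality can be tight, depending on where v sits in the graph.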

Double Sweep Lower Bound
On chordal graphs, AT-free graphs, and tree graphs, if a vertex v is chosen such that d(u, v) = ecc(u) for a vertex u (i.e. v is among the vertices at maximal distance from u), then D = ecc(v) [Corneil’01, Handler’73]
On such graphs the diameter may therefore be computed by a BFS from any node u followed by a BFS from a node at maximal distance from u, thus in Θ(m) space and time, where m is the number of edges
In general, the value obtained in this way may differ from the diameter, but it is still at least as good as the trivial lower bound

Double Sweep Lower Bound: An Example
[Figure: a graph on which double sweep reports D = 2 while the actual diameter is D = 4]

Tree Upper Bound
The diameter of any spanning connected subgraph of G is larger than or equal to the diameter of G
The diameter of a tree can be obtained in Θ(m) time and space [Handler’73], where m is the number of edges in G
Spanning trees of G are therefore good candidates for obtaining an upper bound
A tree upper bound is the diameter of a BFS tree from a vertex
It is never worse than the corresponding trivial upper bound
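A possible sketch of the tree upper bound: build the BFS tree from a vertex, then measure that tree with a double sweep (exact on trees). The helper names and the example graphs are assumptions made for illustration.

```python
from collections import deque

def bfs(adj, source):
    """Return (dist, parent) dictionaries for a BFS from source."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                queue.append(y)
    return dist, parent

def tree_upper_bound(adj, v):
    """Diameter of the BFS tree rooted at v, an upper bound on diam(G)."""
    _, parent = bfs(adj, v)
    tree = {x: [] for x in parent}
    for x, p in parent.items():
        if p is not None:
            tree[x].append(p)
            tree[p].append(x)
    dist_v, _ = bfs(tree, v)
    a = max(dist_v, key=dist_v.get)   # farthest vertex from v in the tree
    dist_a, _ = bfs(tree, a)
    return max(dist_a.values())       # double sweep is exact on a tree

# Hypothetical examples: a 5-cycle (diameter 2) and a 5-vertex path (diameter 4).
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(tree_upper_bound(cycle5, 0))  # prints: 4
print(tree_upper_bound(path5, 0))   # prints: 4
```

On the path, the tree bound from vertex 0 is 4 while the trivial upper bound 2·ecc(0) is 8; on the 5-cycle both equal 4, so the tree bound is no worse in either case.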

Tree Upper Bound: An Example
[Figure: a BFS tree with diameter D’ = 5 on a graph whose actual diameter is D = 4]

Tighten the Bounds
Iteratively choose different initial vertices to obtain tighter bounds (for tree upper bounds)
Random tree upper bound (rtub): iterate the tree upper bound from random vertices
Highest degree tree upper bound (hdtub): consider vertices in decreasing order of degree when iterating the algorithm

The Iterative Algorithm
Iterate the double sweep lower bound and the highest degree tree upper bound until the difference between the best bounds obtained is lower than or equal to a given threshold value
Multiple choices for this threshold value: it can depend on the graph considered or on the desired quality of the bounds, or it can be set to a given precision (e.g. (D’ − D)/D < p)
Each of these heuristics takes Θ(m) time and Θ(m+n) space per iteration
Does the tree upper bound eventually converge to the exact diameter?
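The iteration can be sketched as follows. This is a simplified illustration, not the implementation from [Magnien, JEA’09]: it assumes an unweighted connected graph as an adjacency dict, and for determinism it starts both the double sweep and the tree upper bound from the same vertices in decreasing degree order. All function names are hypothetical.

```python
from collections import deque

def bfs(adj, source):
    """Return (dist, parent) dictionaries for a BFS from source."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                queue.append(y)
    return dist, parent

def double_sweep_lb(adj, r):
    """Eccentricity of a vertex farthest from r: a lower bound on D."""
    dist_r, _ = bfs(adj, r)
    a = max(dist_r, key=dist_r.get)
    dist_a, _ = bfs(adj, a)
    return max(dist_a.values())

def tree_ub(adj, v):
    """Diameter of the BFS tree rooted at v: an upper bound on D."""
    _, parent = bfs(adj, v)
    tree = {x: [] for x in parent}
    for x, p in parent.items():
        if p is not None:
            tree[x].append(p)
            tree[p].append(x)
    return double_sweep_lb(tree, v)  # double sweep is exact on a tree

def iterate_bounds(adj, threshold=0):
    """Tighten (lower, upper) until upper - lower <= threshold or vertices run out."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    lower, upper = 0, float("inf")
    for v in order:
        lower = max(lower, double_sweep_lb(adj, v))
        upper = min(upper, tree_ub(adj, v))
        if upper - lower <= threshold:
            break
    return lower, upper

path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(iterate_bounds(path5))   # prints: (4, 4)
print(iterate_bounds(cycle6))  # prints: (3, 5)
```

On the path the bounds meet after one vertex, so the diameter is certified. On the 6-cycle every BFS tree is a path of 6 vertices, so the upper bound stays at 5 no matter how many vertices are tried.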

Possibly Unmatching Upper Bound
There is no guarantee of obtaining the exact diameter, as all the tree upper bounds may be strictly larger than D
E.g. if G is a cycle of n vertices, its diameter is ⌊n/2⌋ and the tree upper bound is n−1 whichever vertex one starts from
Is there an algorithm that provides more closely matching upper bounds?
[Figure: a cycle with D = 3 and tree upper bound D’ = 5]


The Fringe Algorithm
The fringe method is used to improve the upper bound U and possibly match the lower bound L obtained by the double sweep method

The Fringe Algorithm
G = (V, E) is an unweighted, undirected and connected graph
For any vertex u ∈ V:
T_u denotes an unordered BFS-tree rooted at u
The eccentricity ecc(u) is the height of T_u
=> 2·ecc(u) ≥ diam(G)

The Fringe Algorithm
Proof of 2·ecc(u) ≥ diam(G), i.e. ecc(u) ≥ diam(G)/2
Suppose ecc(u) < diam(G)/2, and let a, b be vertices with diam(G) = d(a, b)
Then d(u, v) < diam(G)/2 for all v ∈ V, so d(u, a) < diam(G)/2 and d(u, b) < diam(G)/2
=> d(u, a) + d(u, b) < d(a, b), which contradicts the triangle inequality
∴ 2·ecc(u) ≥ diam(G)
[Figure: a diameter path between a and b, and a vertex u]

The Fringe Algorithm
T_u denotes an unordered BFS-tree
T_u is a subgraph of G that spans all vertices, so distances in T_u are never shorter than distances in G
=> diam(G) ≤ diam(T_u)

The Fringe Algorithm
The fringe of u, denoted F(u), is the set of vertices v ∈ V such that d(u, v) = ecc(u)
[Figure: a BFS tree rooted at u with |F(u)| = 3]


[Figure: a BFS tree rooted at u with fringe vertices A, B and C]
BFS(A) => ecc(A)
BFS(B) => ecc(B)
BFS(C) => ecc(C)
B(u) = max {ecc(A), ecc(B), ecc(C)}

The Fringe Algorithm
The fringe of u, denoted F(u), is the set of vertices v ∈ V such that d(u, v) = ecc(u)
Lemma. U(u) ≥ D, where D is the diameter of G

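The Fringe Algorithm
Case 1: |F(u)| = 1 => U(u) = diam(T_u)
Case 2: |F(u)| > 1, B(u) = 2ecc(u) => U(u) = diam(T_u)
Case 3: |F(u)| > 1, B(u) = 2ecc(u)−1 => U(u) = 2ecc(u)−1
Case 4: |F(u)| > 1, B(u) < 2ecc(u)−1 => U(u) = 2ecc(u)−2

The Fringe Algorithm
Case 1: |F(u)| = 1
[Figure: a BFS tree rooted at u with a single fringe vertex]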

The Fringe Algorithm
Case 2: |F(u)| > 1, B(u) = 2ecc(u)
Example: ecc(u) = 3, diam(T_u) = 6
diam(T_u) gives the upper bound 6 on the diameter, and B(u) provides a lower bound
=> if B(u) = 2·ecc(u), the two bounds meet
∴ diameter = diam(T_u)
[Figure: the BFS tree T_u for this case]

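The Fringe Algorithm
Case 3: |F(u)| > 1, B(u) = 2ecc(u)−1
Non-leaf nodes a, b (not in the fringe): d(a, u) ≤ ecc(u)−1 and d(b, u) ≤ ecc(u)−1, so the upper bound on d(a, b) is 2ecc(u)−2
Leaf nodes (in the fringe): the upper bound is 2ecc(u), and no distance from a fringe vertex exceeds B(u)
If B(u) = 2ecc(u)−1 => diameter = 2ecc(u)−1
[Figure: the BFS tree T_u with non-leaf nodes a and b]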

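The Fringe Algorithm
Case 4: |F(u)| > 1, B(u) < 2ecc(u)−1
Non-leaf nodes a, b (not in the fringe): d(a, u) ≤ ecc(u)−1 and d(b, u) ≤ ecc(u)−1, so the upper bound on d(a, b) is 2ecc(u)−2
Leaf nodes (in the fringe): the upper bound is 2ecc(u), and no distance from a fringe vertex exceeds B(u)
If B(u) < 2ecc(u)−1 => diameter ≤ 2ecc(u)−2
[Figure: the BFS tree T_u with non-leaf nodes a and b]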

The Fringe Algorithm
The fringe algorithm correctly computes an upper bound for the diameter of the input graph G, using at most |F(u)|+3 BFSes

The Fringe Algorithm
1. Let r, a, and b be the vertices identified by double sweep (using two BFSes)
2. Find the vertex u that is halfway along the path connecting a and b inside the BFS-tree T_a
3. Compute the BFS-tree T_u and its eccentricity ecc(u)
4. If |F(u)| > 1, find the BFS-tree T_z for each z ∈ F(u) and compute B(u)
   If B(u) = 2ecc(u)−1, return 2ecc(u)−1
   If B(u) < 2ecc(u)−1, return 2ecc(u)−2
5. Return diam(T_u)
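The steps of the fringe algorithm can be sketched in Python as below. This is an illustrative reading of the slides, not the authors' C implementation: it assumes an unweighted connected graph as an adjacency dict, takes F(u) to be the vertices at distance ecc(u) from u, and picks the midpoint by walking up from b in T_a. The function names are hypothetical.

```python
from collections import deque

def bfs(adj, source):
    """Return (dist, parent) dictionaries for a BFS from source."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                queue.append(y)
    return dist, parent

def tree_diameter(parent, root):
    """Diameter of the BFS tree described by parent, via double sweep."""
    tree = {x: [] for x in parent}
    for x, p in parent.items():
        if p is not None:
            tree[x].append(p)
            tree[p].append(x)
    dist_root, _ = bfs(tree, root)
    far = max(dist_root, key=dist_root.get)
    dist_far, _ = bfs(tree, far)
    return max(dist_far.values())

def fringe_upper_bound(adj, r):
    # Double sweep from r gives a and b.
    dist_r, _ = bfs(adj, r)
    a = max(dist_r, key=dist_r.get)
    dist_a, parent_a = bfs(adj, a)
    b = max(dist_a, key=dist_a.get)
    # Walk halfway from b towards a inside the BFS tree T_a.
    u = b
    for _ in range(dist_a[b] // 2):
        u = parent_a[u]
    # BFS from u gives ecc(u) and the fringe F(u).
    dist_u, parent_u = bfs(adj, u)
    ecc_u = max(dist_u.values())
    fringe = [z for z, d in dist_u.items() if d == ecc_u]
    if len(fringe) > 1:
        # B(u) is the largest eccentricity among fringe vertices.
        big_b = max(max(bfs(adj, z)[0].values()) for z in fringe)
        if big_b == 2 * ecc_u - 1:
            return 2 * ecc_u - 1
        if big_b < 2 * ecc_u - 1:
            return 2 * ecc_u - 2
    return tree_diameter(parent_u, u)

# Hypothetical examples: the complete graph K4, a 5-vertex path, and a 6-cycle.
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(fringe_upper_bound(k4, 0))      # prints: 1
print(fringe_upper_bound(path5, 0))   # prints: 4
print(fringe_upper_bound(cycle6, 0))  # prints: 5
```

On K4 the BFS-tree diameter would be 2, while the fringe case B(u) = 2ecc(u)−1 returns the exact diameter 1. On the 6-cycle the fringe has a single vertex, so the bound falls back to diam(T_u) = 5 although the diameter is 3.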

Example (1/2)
[Figure: a grid-like graph with row = 3 and column = 6, containing vertices x1 … xp, y1, A and B; diameter = 6]
Double sweep (DS) when the number p is large, choosing x1 as r:
x1->A = 3, x1->B = 3, x1->y1 = 4, so choose y1 as a
y1->A = 4, y1->B = 4, y1->x1 = 4, so choose A, B, or x1 as b
DS reports diameter = 4: wrong!

Example (2/2)
[Figure: the same graph with row = 3 and column = 6, vertices x1 … xp and y1, and the midpoint u]
Fringe
I. Use DS to find a and b: x1 as a, y1 as b
II. Find a vertex u that is halfway along the path connecting a and b
Case 1:
III. ecc(u) = 4, |F(u)| > 1, B(u) = 6
IV. B(u) < 2ecc(u)−1 since 6 < (2·4)−1, so return 2ecc(u)−2: diameter = 6
Case 2:
III. ecc(u) = 3, |F(u)| > 1, B(u) = 6
IV. B(u) = 2ecc(u) since 6 = 2·3, so return 2ecc(u): diameter = 6

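A Bad Case for Fringe
[Figure: an example graph shown in three steps: r and a; then a, b and the midpoint u; then u and its fringe F(u)]
ecc(u) = 3, B(u) = 3
B(u) < 2ecc(u) − 1 (= 5), so return 2ecc(u) − 2 (= 4)
Real diameter = 3 ∴ the returned upper bound does not match the real diameter: fringe fails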

Experimental Results (1/2)

Results (44 graphs in total)
Approach   Matches   Failures
fub        37        7
mtub       13        31
hdtub      10        34
rtub       7         37

Implemented in C on a 2.93 GHz Linux workstation with 24 GB memory
44 real-world graphs are tested, each with 4,000 to 50 million nodes and 20,000 to 3,000 million edges
The real diameter is found by exhaustive search to check the obtained upper bounds

Experimental Results (2/2)

Benchmark   D      fub    mtub   hdtub   rtub
CAH2        18     20     20     20      20
CITP        26     28     30     29      31
DBLP        22     24     24     24      25
P2PG        11     14     15     14      15
ROA1        865    987    987    1047    988
ROA2        794    803    803    873     832
ROA3        1064   1079   1079   1166    1128

For the 7 mismatches, the proposed method (fub) generates an upper bound at least as tight as those of the approaches in previous work


Finding the Diameter on Weighted Graphs
Consider a large complete graph in which every edge has weight 1 except for one edge
The eccentricities of most vertices are 1
However, the diameter of the graph is larger than 1
The fringe algorithm may not efficiently find tight diameter bounds for weighted graphs
[Figure: a complete graph with unit edge weights except one edge of weight 1.5]
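The weighted example can be checked with a small Dijkstra sketch. The graph size n = 6 and the choice of (0, 1) as the heavier edge are assumptions for illustration; the weight 1.5 follows the figure.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from source; adj[x] is a list of (y, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return dist

# Complete graph on n vertices: every edge weighs 1 except edge (0, 1), which weighs 1.5.
n = 6
adj = {
    i: [(j, 1.5 if {i, j} == {0, 1} else 1.0) for j in range(n) if j != i]
    for i in range(n)
}
ecc = {v: max(dijkstra(adj, v).values()) for v in adj}
print(ecc)                # vertices 0 and 1 have eccentricity 1.5, all others 1.0
print(max(ecc.values()))  # prints: 1.5
```

Only the two endpoints of the heavier edge witness the diameter, so a heuristic that starts from any other vertex sees eccentricity 1 and learns little about where the diameter is attained.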

Minimum Diameter Spanning Trees
Minimum diameter spanning tree (MDST) problem
Given a graph G = (V, E) with edge weights $w(e) \in \mathbb{R}$, $e \in E$
Find a spanning tree T for G such that $\max_{\text{simple path } p \in T} \sum_{e \in p} w(e)$ is minimized
[Figure: a weighted graph with a spanning tree of diameter 5 and its MDST of diameter 3]

Outline: Other Related Topics
Geometric MDST (F96943167 施信瑋)
MDST (F97943070 方劭云)

Geometric MDST
Geometric MDST (GMDST): given a set of n points in the Euclidean space, find a spanning tree connecting these points so that the length of its diameter is minimum
GMDST corresponds to finding an MDST on a complete graph whose edge weights are the Euclidean distances between the points

Monopolar and Dipolar
A spanning tree is said to be monopolar if there exists a point (called the monopole) s.t. all remaining points are connected to it
A spanning tree is said to be dipolar if there exist two points (called the dipole) s.t. all remaining points are connected to one of the two points in the dipole
[Figure: a monopolar spanning tree with its monopole, and a dipolar spanning tree with its dipole]

GMDST with a Simple Topology
Theorem: there exists a GMDST of a set S of n points which is either monopolar (n ≥ 3) or dipolar (n ≥ 4)
[Figure: all monopolar spanning trees of the 4 points, and 4 dipolar spanning trees of the 4 points]

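Center Edge
An edge (a_{i−1}, a_i) is a center edge of a path P = (a_0, a_1, …, a_k) if $\max\{\mathrm{dist}_P(a_0, a_{i-1}),\ \mathrm{dist}_P(a_i, a_k)\}$ is minimized
[Figure: the path a_0, a_1, …, a_{i−1}, a_i, …, a_{k−1}, a_k with dist(a_0, a_{i−1}) and dist(a_i, a_k) marked on either side of the center edge]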
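The center-edge definition can be illustrated with a short linear scan. This sketch assumes the path is given as a list of edge lengths, where `weights[j]` is the length of edge (a_j, a_{j+1}); the function name and the sample lengths are hypothetical.

```python
def center_edge(weights):
    """Return (i, value) where edge (a_{i-1}, a_i) minimizes
    max(dist(a_0, a_{i-1}), dist(a_i, a_k)) and value is that minimum."""
    total = sum(weights)
    best_i, best_val = None, float("inf")
    prefix = 0.0  # dist_P(a_0, a_{i-1})
    for i in range(1, len(weights) + 1):
        suffix = total - prefix - weights[i - 1]  # dist_P(a_i, a_k)
        val = max(prefix, suffix)
        if val < best_val:
            best_i, best_val = i, val
        prefix += weights[i - 1]
    return best_i, best_val

# Hypothetical path a_0 .. a_4 with edge lengths 2, 1, 4, 3.
print(center_edge([2, 1, 4, 3]))  # prints: (3, 3.0)
```

For this path the center edge is (a_2, a_3): both sides have length 3, and removing the edge from either side leaves the shorter part, dist(a_0, a_2) = 3 ≤ dist(a_2, a_4) = 7 and dist(a_3, a_4) = 3 ≤ dist(a_0, a_3) = 7.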

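Lemma
Let (a_{i−1}, a_i) be a center edge of a path P = (a_0, a_1, …, a_k), then:
(1) $\mathrm{dist}_P(a_0, a_{i-1}) \le \mathrm{dist}_P(a_{i-1}, a_k)$ and
(2) $\mathrm{dist}_P(a_i, a_k) \le \mathrm{dist}_P(a_0, a_i)$
Otherwise, the center edge is not (a_{i−1}, a_i): with A = dist(a_0, a_{i−1}), B = dist(a_{i−1}, a_k), and e_{i−2}, e_{i−1} the lengths of the edges (a_{i−2}, a_{i−1}) and (a_{i−1}, a_i), if A > B then max{A, B − e_{i−1}} > max{A − e_{i−2}, B}
[Figure: the path a_0, a_1, …, a_{i−2}, a_{i−1}, a_i, …, a_{k−1}, a_k with the lengths A, B, e_{i−2} and e_{i−1} marked]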

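Proof of the Theorem
Theorem: there exists a GMDST of a set S of n points which is either monopolar or dipolar
Proof, Case 1: given any optimal GMDST T with a diameter composed of only two edges, i.e., D(T) = (a_0, a_1, a_2) of size D_T, a monopolar spanning tree T’ can be constructed with the same diameter
For any two points u and v, where |x, y| denotes the Euclidean distance between x and y:
$$\mathrm{dist}_{T'}(u,v) = |u,a_1| + |a_1,v| \le \mathrm{dist}_T(u,a_1) + \mathrm{dist}_T(a_1,v) \le |a_0,a_1| + |a_1,a_2| = D_T$$
[Figure: the optimal tree T with diameter path a_0, a_1, a_2 and the monopolar tree T’ in which u and v are connected directly to a_1]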

Proof of the Theorem (cont’d)
Case 2: given any optimal GMDST T with diameter D(T) = (a_0, a_1, …, a_k) of size D_T, k ≥ 3, a dipolar spanning tree T’’ can be constructed with the same diameter
Let (a_{i−1}, a_i) be the center edge of D(T)
Connect all points in the subtree T_{i−1} to a_{i−1}, and connect all points in the subtree T_i to a_i
[Figure: the optimal tree T with its center edge (a_{i−1}, a_i) and subtrees T_{i−1} and T_i, and the dipolar tree T’’]

Proof of the Theorem (cont’d)
For any point pair u and v, if the two points are in different subtrees, their distance is clearly no more than D_T
If u and v are in the same subtree, e.g. T_i:
$$\mathrm{dist}_{T''}(u,v) = |u,a_i| + |a_i,v| \le \mathrm{dist}_T(u,a_i) + \mathrm{dist}_T(a_i,v) \le \mathrm{dist}_T(a_i,a_k) + \mathrm{dist}_T(a_i,a_k) \le \mathrm{dist}_T(a_0,a_i) + \mathrm{dist}_T(a_i,a_k) = D_T$$
[Figure: the optimal tree T and the dipolar tree T’’, with u and v in the same subtree]

Finding a Geometric MDST
Theorem: there exists a GMDST of a set S of n points which is either monopolar or dipolar
By enumerating all monopolar and dipolar spanning trees of a set of given points, an optimal GMDST can be found
The enumeration process can be done in $\Theta(n^3)$ time
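To make the enumeration idea concrete, here is a brute-force sketch for very small point sets. It is not the $\Theta(n^3)$ procedure mentioned above: it tries every way of attaching the remaining points to the two poles, which is exponential in n, and is meant only to illustrate the theorem. The function names and sample points are hypothetical.

```python
from itertools import combinations, product
from math import dist

def top_two_sum(values):
    """Sum of the two largest values (or fewer if not enough values)."""
    return sum(sorted(values, reverse=True)[:2])

def monopolar_diameter(points, c):
    """Diameter of the star centered at points[c]."""
    return top_two_sum(dist(points[c], p) for i, p in enumerate(points) if i != c)

def dipolar_diameter(points, p, q, side):
    """side[i] is p or q: the pole that point i is attached to."""
    to_p = [dist(points[p], points[i]) for i, s in side.items() if s == p]
    to_q = [dist(points[q], points[i]) for i, s in side.items() if s == q]
    through = max(to_p, default=0.0) + dist(points[p], points[q]) + max(to_q, default=0.0)
    return max(through, top_two_sum(to_p), top_two_sum(to_q))

def best_simple_tree_diameter(points):
    """Smallest diameter over all monopolar and dipolar spanning trees."""
    n = len(points)
    best = min(monopolar_diameter(points, c) for c in range(n))
    for p, q in combinations(range(n), 2):
        others = [i for i in range(n) if i not in (p, q)]
        for choice in product((p, q), repeat=len(others)):
            side = dict(zip(others, choice))
            best = min(best, dipolar_diameter(points, p, q, side))
    return best

line = [(0, 0), (1, 0), (2, 0)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(best_simple_tree_diameter(line))    # prints: 2.0
print(best_simple_tree_diameter(square))  # about 2.414, i.e. 1 + sqrt(2)
```

For the three collinear points the star at the middle point is optimal; for the unit square the best tree found is a star at a corner, with diameter 1 + √2.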


Introduction
[Ho et al., SIAMJ’91] solved the geometric MDST in $O(n^3)$
Actually, the general MDST problem is identical to the absolute 1-center problem (A1CP)
The absolute 1-center problem: for G = (V, E) with |V| = n and |E| = m, let $F(x) = \max_{v \in V} d_G(x, v)$; find x = x* such that F(x) is minimized

Equivalence of A1CP and MDST
Theorem: x* is an absolute 1-center (continuum set) => SPT(x*) is a minimum diameter spanning tree

The Proof of Equivalence
Proof idea: consider the metric-space solution over the continuum set, SPT(y*); the diameter of SPT(x*) equals that of SPT(y*); as SPT(y*) is minimum, the MDST is solved
Continuum set: let the graph be rectifiable, and refer to interior points on an edge by their distances from the two end nodes

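Property of Continuum Set
For a given tree T, the diameter D(T) equals 2·F_T(y*), where y* is the absolute 1-center of T
[Figure: a weighted tree (edge weights 10, 3, 7, 5, 5) with its diameter marked; the absolute 1-center lies at equal distance from both ends of the diameter]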

Proof
Assume that z* is the absolute 1-center of G
By following the property of continuum set, $D(SPT(z^*)) \le 2\max_{v \in V} d_{SPT(z^*)}(z^*, v)$
Since the tree is the shortest path tree rooted at z*, $2\max_{v \in V} d_{SPT(z^*)}(z^*, v) = 2\max_{v \in V} d_G(z^*, v)$
Since z* is the absolute 1-center of G, $\forall u:\ 2\max_{v \in V} d_G(z^*, v) \le 2\max_{v \in V} d_G(u, v)$
For any tree T_i rooted at u, $2\max_{v \in V} d_G(u, v) \le 2\max_{v \in V} d_{T_i(u)}(u, v)$
It implies that, for any spanning tree T_j, $2\max_{v \in V} d_G(z^*, v) \le D(T_j)$

Conclusion
The concepts of monopolar and dipolar [Ho et al., SIAMJ’91] are exactly the same as the proved result
By using all pairs shortest distance [Fredman and Tarjan, JACM’87], the A1CP can be solved in $O(mn + n^2 \log n)$
[Figure: a monopole and a dipole, each marked with its absolute 1-center]


Conclusion
In today’s presentation, we have
Introduced the difference between finding the diameter of a tree and finding the diameter of a general graph
Given some naïve algorithms for finding the diameter of a graph
Presented the double sweep algorithm introduced in the previous work
Presented the fringe algorithm, which extends the double sweep algorithm
Compared the double sweep algorithm and the fringe algorithm

Conclusion (cont’d)
Besides, we have further
Identified the difference between finding the diameter of an unweighted graph and finding the diameter of a weighted graph
Presented two algorithms that find minimum diameter spanning trees on weighted graphs

Future Work
Another interesting topic is the design methodology for directed graphs with minimum diameter
“Design to minimize diameter on building-block network”, Makoto Imase and Masaki Itoh
“A design for directed graphs with minimum diameter”, Makoto Imase and Masaki Itoh
Given the number of nodes and the upper bounds on in- and out-degree, design a directed graph s.t. the diameter is minimized
[Figure: an example directed graph with n = 9, d = 2]

Future Work (cont’d)
How to find the diameter (or tight upper and lower bounds) of a weighted graph is still an open problem
[Figure: the weighted complete graph example with diameter = 1.5, and the weighted graph example with diameter = 9]
