
Minimum Spanning Trees

Algorithmique, Fall semester 2011/12

Acknowledgment: Slides modeled after the course CO226 at Princeton

Given: Undirected, connected graph G with positive edge weights.

Definition: A spanning tree of G is a connected acyclic subgraph T of G with the same set of vertices as G.

Goal: Find a minimum weight spanning tree of G (minimum spanning tree, or MST).
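The definition can be checked mechanically. The following is a minimal sketch, not part of the original slides: it assumes the vertices are numbered 0,...,n-1 with n at least 1, and the name is_spanning_tree is purely illustrative. It relies on the standard fact that a connected graph on n vertices with exactly n-1 edges is acyclic.

```python
def is_spanning_tree(num_vertices, tree_edges):
    """Return True iff tree_edges is a connected acyclic subgraph
    touching all vertices 0..num_vertices-1 (assumes num_vertices >= 1)."""
    # A tree on n vertices has exactly n - 1 edges.
    if len(tree_edges) != num_vertices - 1:
        return False
    adjacency = {v: [] for v in range(num_vertices)}
    for u, v in tree_edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Depth-first search from vertex 0. With n - 1 edges,
    # reaching every vertex implies that there is no cycle.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == num_vertices
```

For instance, on four vertices the path 0-1-2-3 passes the check, while the triangle 0-1-2 (which leaves vertex 3 uncovered) does not.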

[Figures: the example graph G with its edge weights, followed by four subgraphs of G:
• Acyclic, but not spanning
• Spanning, acyclic, but not connected
• Spanning, connected, but not acyclic
• Spanning tree of cost 4+7+11+10+4+7+5 = 48]

Examples

Example 1: Communication Networks

A multinational company wants to lease communication lines between its various locations.






Each communication line comes with its own price tag. The company wants to spend the least amount of money while keeping all its branches connected.

The solution is given by an MST of the graph. (Why?)


Example 2: Clustering

Find “clusters” of nodes. Edge values equal the distances between nodes.

Possible solution: Find an MST. Eliminate “fat” edges.

Note: this is a “heuristic” algorithm. Needs analysis.
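One possible reading of this heuristic is sketched below. It is an assumption, not the slides' definition, that “fat” means heaviest and that the desired number of clusters is given in advance; the name mst_clusters is illustrative. Building the MST greedily from the lightest edge upwards and stopping once the requested number of components remains is the same as building the full MST and then deleting its heaviest edges.

```python
def mst_clusters(num_nodes, edges, num_clusters):
    """edges: list of (distance, u, v) with nodes 0..num_nodes-1.
    Returns a list mapping each node to a cluster label."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers up to the representative of x.
        while parent[x] != x:
            x = parent[x]
        return x

    components = num_nodes
    for distance, u, v in sorted(edges):
        if components == num_clusters:
            break  # the remaining, heavier MST edges are the ones eliminated
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return [find(x) for x in range(num_nodes)]


# Nodes 0,1,2 are close to each other, as are nodes 3,4.
labels = mst_clusters(
    5, [(1, 0, 1), (1, 1, 2), (10, 2, 3), (1, 3, 4), (12, 0, 4)], 2
)
```

Here nodes 0, 1, 2 end up with one label and nodes 3, 4 with another. Whether such clusters are meaningful for a given data set is exactly the analysis the note above asks for.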

Example 3: Dendritic Structures in the Brain

http://cvlab.epfl.ch/research/medical/neurons/

Problem: Reconstruct shape of neurons from noisy microscopy data via automatic tools




Dendrite tracking in microscopic images using minimum spanning trees and localized EM -- by Fleuret and Fua



Example 4: Phylogenetic Trees

Genetic variability and population structure of endangered Panax ginseng in the Russian Primorye -- by Zhuravlev et al

Algorithms

Cuts

Cut: A cut (A,B) in a graph G=(V,E) is a partition of V into two nonempty sets A and B.

Crossing edge: Any edge connecting a vertex in A to a vertex in B.

[Figure: a cut (A,B) with its crossing edges highlighted]
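As a small illustration of these two definitions, the sketch below lists the crossing edges of a cut and picks one of minimum weight. The edge representation (weight, u, v), the function name crossing_edges and the toy graph are assumptions made for this example only.

```python
def crossing_edges(edges, side_a):
    """edges: list of (weight, u, v); side_a: the vertex set A of the cut.
    B is implicitly the set of all remaining vertices."""
    # An edge crosses the cut iff exactly one endpoint lies in A.
    return [(w, u, v) for (w, u, v) in edges if (u in side_a) != (v in side_a)]


edges = [(4, 0, 1), (7, 1, 2), (2, 2, 3), (9, 0, 3)]
cut = crossing_edges(edges, side_a={0, 1})
lightest = min(cut)  # crossing edge of minimum weight
```

With A = {0, 1} the crossing edges are (7, 1, 2) and (9, 0, 3), and the lightest one is (7, 1, 2).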

Cut Property

Let C=(A,B) be a cut, T a tree on A which is part of an MST, and e a crossing edge of minimum weight. Then there is an MST M containing e and T.

Proof:
• Take an MST containing T. If it already contains e, we are done.
• Otherwise, add the crossing edge e of minimum weight to the MST. This creates a cycle.
• The cycle has at least one other crossing edge f. Since T lies entirely within A, f is not an edge of T.
• The weight of f is at least equal to that of e.
• Replace f by e in the MST. The result is a spanning tree containing e and T whose weight is no larger, so it is a new MST.
• This means that the weights of e and f must have been equal.

[Figure: the MST with the tree T on A, the crossing edge e of minimum weight, and the crossing edge f that is exchanged for e]

Prim’s Algorithm

http://www.ithistory.org/honor_roll/fame-detail.php?recordID=882

http://inserv.math.muni.cz/biografie/vojtech_jarnik.html

http://en.wikipedia.org/wiki/Edsger_W._Dijkstra

Vojtěch Jarník, 1897-1970

Robert Prim, 1921 -

Edsger Dijkstra, 1930-2002

Prim’s Algorithm

Start with any vertex v and set the tree T to the singleton v. Greedily grow the tree T: at each step, add to T a minimum weight edge with exactly one endpoint in T.

[Figure: the tree grown from v, showing tree nodes, tree edges, and the edges connected to exactly one tree node]

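Why does it work?

T is always a subtree of an MST. Proof by induction on the number of nodes in T. The final T is an MST by this result.

Start: trivial. The singleton v is part of an MST.

Step: use the cut property. The current T is in an MST by hypothesis. The edge added is a crossing edge of minimum weight for the cut that separates the nodes of T from the remaining nodes, so T together with this edge is in an MST by the cut property.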

[Figure sequence: step-by-step run of Prim’s algorithm on an example weighted graph]

Implementation Challenge

How do we find the minimum crossing edge at every iteration?

Check all the outgoing edges: O(|E|) comparisons at every iteration, so O(|E| |V|) running time in total.

A more clever data structure:

• For every node w, keep a value dist(w) that measures the “distance” of w from the current tree, i.e. the smallest weight of an edge connecting w to the tree.

• At the start, dist(v) = 0, and dist(w) = infinity for all other w.

• When a new node u is added to the tree, check whether the neighbors of u decrease their distance to the tree; if so, decrease the distance.

Maintain a min-priority queue for the nodes and their distances.

Implementation

(1) dist(v) = 0, dist(w) = ∞ for w ≠ v, pred(v) = NULL

(2) Create min-priority queue Q for V with respect to dist
    [Q contains all nodes that are not yet covered by the MST]

(3) While Q is not empty do

    (a) u = deleteMin(Q)    [u is the node with smallest distance to the current tree]

    (b) if (u is not marked) then    [multiple copies of w may be present in Q, but only one gets marked]

        (i) Mark u    [u is now covered]

        (ii) For all neighbors w of u do    [update distance of neighbors of u]

            a. if (dist(w) > weight of edge (u,w) and w not marked) then

                i. dist(w) = weight of edge (u,w)

                ii. pred(w) = u    [predecessor of w in the tree]

                iii. Sift up w in Q

(4) Output tree {(pred(w),w) | w in V, w ≠ v}
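The pseudocode above can be sketched in Python with the standard heapq module. This is one possible rendering, not the slides' reference implementation: heapq has no sift-up for an element already in the heap, so the sketch pushes a fresh copy of w whenever dist(w) decreases and skips stale copies when they are popped, in line with the remark that multiple copies of w may be present in Q. The graph format (a dict of adjacency lists) and the name prim_mst are assumptions; vertices are taken to be integers so that heap entries compare cleanly.

```python
import heapq


def prim_mst(graph, start):
    """graph: dict mapping vertex -> list of (neighbor, weight) pairs,
    undirected and connected. Returns tree edges as (pred, vertex, weight)."""
    dist = {v: float("inf") for v in graph}
    pred = {v: None for v in graph}
    dist[start] = 0
    marked = set()
    queue = [(0, start)]  # entries are (dist, vertex)
    while queue:
        _, u = heapq.heappop(queue)  # u = deleteMin(Q)
        if u in marked:
            continue  # stale copy of an already covered node
        marked.add(u)
        for w, weight in graph[u]:
            if w not in marked and weight < dist[w]:
                dist[w] = weight
                pred[w] = u
                heapq.heappush(queue, (weight, w))  # stands in for "sift up"
    return [(pred[v], v, dist[v]) for v in graph if pred[v] is not None]


graph = {
    0: [(1, 4), (3, 9)],
    1: [(0, 4), (2, 7)],
    2: [(1, 7), (3, 2)],
    3: [(2, 2), (0, 9)],
}
tree = prim_mst(graph, 0)
```

On this four-node cycle the heaviest edge (weight 9) is left out, and the resulting tree has total weight 4 + 7 + 2 = 13.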

Analysis

• Steps (1) and (2), initialization and creation of Q: O(|V|)

• deleteMin in step (3)(a): O(log(|Q|)) = O(log(|E|)) each time, executed O(|E|) times (each vertex gets added to Q at most its degree times)

• Distance check in step (ii)a.: executed O(|E|) times in total

• Sift up in step iii.: O(log(|Q|)) = O(log(|E|)) each time

Total: O(|E| log(|E|)) for connected graphs.

Kruskal’s Algorithm

http://www.voteview.com/ideal_point_Non_Metric_MDS.htm

Joseph B. Kruskal, 1928-2010

Maintains a forest which will become an MST at the end.

(1) Start from an empty forest T (no edges).

(2) Consider the edges in ascending order of cost. Add the next edge in the list to T if it doesn’t create a cycle.

Example: a graph on the vertices 0,...,7 with the following edges, sorted by weight.

u  v  weight
3  5  0.18
1  7  0.21
6  7  0.25
0  2  0.29
0  7  0.31
0  1  0.32
3  4  0.34
4  5  0.40
4  7  0.46
0  6  0.51
4  6  0.51
0  5  0.60

[Figure sequence: the forest T as the edges 3-5, 1-7, 6-7, 0-2, 0-7, 3-4 and 4-7 are added in this order; the intermediate forests have several components]


Why does it work?

T is always a sub-forest of an MST. Proof by induction on the number of edges in T. The final T is an MST by this result.

Start: trivial. T is a union of singleton vertices.

Step: by hypothesis, the current T is a sub-forest of an MST M. Let e be the edge of minimum weight, among the edges not yet added, that doesn’t create a cycle in T.

• If e is in M, there is nothing to show. Otherwise, adding e to M creates a cycle. Since T together with e is acyclic, this cycle contains an edge f of M that has not been added to T yet.

• The weight of e is smaller than or equal to the weight of f: f doesn’t create a cycle in T either (T and f both lie in M), and e has minimum weight among such edges.

• Exchanging f with e creates another spanning tree which contains T as a sub-forest, and also contains e, hence contains the new T.

• The cost of the new spanning tree is smaller than or equal to the cost of the old MST, since the weight of e is smaller than or equal to the weight of f.

• Hence the new spanning tree is an MST, and T and e belong to an MST.

[Figure sequence: step-by-step run of Kruskal’s algorithm on a larger example graph. Captions of the last two frames: “All nodes covered now.” and “Minimum spanning tree”.]


Implementation Challenge

How do we check whether the addition of a new edge creates a cycle?

Note: if G=(V,E) is the original graph, and T is the set of edges already added, then we are looking for a cycle in the graph F = (V,T) (not in G).

e=(u,v) creates a cycle iff connected component of u = connected component of v.

We could check whether the connected components are equal by doing a DFS starting from u and checking whether we reach v.

Implementation: 1st Version

(1) Create min-priority queue Q for E with respect to weight    [sort the edges]

(2) For all v in V set pred(v)=NULL

(3) Set T to the empty set    [initial forest doesn’t have edges]

(4) While Q is not empty do    [continue until all edges processed (can stop when |T|=|V|-1)]

    (a) (u,v) = deleteMin(Q)    [take smallest edge off the queue]

    (b) Run DFS starting from u on T    [check whether edge creates cycle]

    (c) If v is not in the connected component of u then

        (i) if pred(u)=NULL set pred(u)=v else pred(v)=u

        (ii) Add edge (u,v) to T

(5) Output T
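A minimal Python sketch of this first version follows. It makes a few simplifying assumptions that are not in the slides: vertices are numbered 0,...,n-1, edges are given as (weight, u, v) tuples, the edges are sorted up front instead of being kept in a priority queue, and the pred bookkeeping is dropped because the tree is returned as an edge list. The name kruskal_dfs is illustrative.

```python
def kruskal_dfs(num_vertices, edges):
    """edges: list of (weight, u, v). Returns the tree edges in the
    order in which they were added."""
    adjacency = {v: [] for v in range(num_vertices)}  # the forest F = (V, T)
    tree = []

    def reachable(source, target):
        # DFS on the current forest only, not on the original graph.
        seen = {source}
        stack = [source]
        while stack:
            x = stack.pop()
            if x == target:
                return True
            for y in adjacency[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    for weight, u, v in sorted(edges):
        if len(tree) == num_vertices - 1:
            break  # spanning tree complete
        if not reachable(u, v):  # edge joins two different components
            adjacency[u].append(v)
            adjacency[v].append(u)
            tree.append((weight, u, v))
    return tree


small_tree = kruskal_dfs(4, [(4, 0, 1), (7, 1, 2), (2, 2, 3), (9, 0, 3)])
```

Each call to reachable may visit the whole forest, which is where the O(|V|) cost per edge comes from.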

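Implementation: 1st Version (Analysis)

• Creating the priority queue Q: O(|E|)

• deleteMin: O(log(|E|)) each time

• Main loop: executed at most |E| times

• DFS on T: O(|T|) = O(|V|) each time

O(|E| |V|) total.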

Better?

Just sort the edges with respect to weight. No need for Q.

• Instead of deleteMin: just take the next edge off the list.

• Stopping condition: check whether |T|=|V|-1.

• What should we do with the DFS that checks for cycles?

Still O(|E| |V|) total.

Data Structure

Need a good data structure to check whether components are equal.

This is done via the Union-Find data structure.

Union-Find

Dynamic Graph Model

We have a graph G on n vertices 0,1,....,n-1.

Edges are revealed one-by-one.

Want to keep track of the connected components of the graph as edges are revealed.

Introduce data structure UF on the set of vertices which keeps track of the components.

Operations on UF:

• Union(a,b): join the components of a and b.

• Connected(a,b): returns true iff a and b are in the same component.

Example

[Figure: the vertices 0,...,7 drawn in two rows (0 1 2 3 and 4 5 6 7), before and after union(2,5). In the example, Connected(0,6)=false and Connected(2,3)=true.]

Quick Find

Data Structure:
• Integer array id[] of size n
• Interpretation: p and q are connected iff they have the same id

i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 9 9 9 6 6 7 8 9

5 and 6 are connected. 2, 3, 4, and 9 are connected.

Connected(p,q): true iff p and q have the same id. For example, id[3]=id[9]=9, so Connected(3,9)=true.

Union(p,q): to merge the components of p and q, change all entries whose id[] equals id[p] to id[q].

After union(2,5):

i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 6 6 6 6 6 7 8 6

Problem: many values can change.

Example

[Figure: trace of a sequence of union() calls. When id[p] and id[q] differ, union() changes the entries equal to id[p] to id[q] (shown in red). When id[p] and id[q] match, there is no change.]
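A minimal Python sketch of quick-find, under the assumption that objects are the integers 0,...,n-1 (the class and method names are illustrative). It replays union(2,5) on the id[] array from the example above. Note that id[p] is read once before the loop, since id[p] itself is overwritten while the array is scanned.

```python
class QuickFind:
    def __init__(self, n):
        self.id = list(range(n))  # every object starts in its own component

    def connected(self, p, q):
        return self.id[p] == self.id[q]

    def union(self, p, q):
        pid, qid = self.id[p], self.id[q]
        if pid == qid:
            return  # already in the same component, no change
        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid  # relabel the whole component of p


uf = QuickFind(10)
uf.id = [0, 1, 9, 9, 9, 6, 6, 7, 8, 9]  # state from the slide
was_connected = uf.connected(3, 9)
uf.union(2, 5)
```

After the call, uf.id equals [0, 1, 6, 6, 6, 6, 6, 7, 8, 6]: four entries changed for a single union.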

Too Slow

Count number of array accesses

algorithm    init  union  connected
Quick-find   n     n      1

Defect: unions are too expensive

Quick-Union

Data Structure:
• Integer array id[] of size n
• Interpretation: id[i] is the parent of i
• Root of i is id[id[id[...id[i]...]]] (keep going until no change)

i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 9 4 9 6 6 7 8 9

[Figure: the forest encoded by id[], with p=3 and q=5. 3’s root is 9, 5’s root is 6.]

Connected(p,q): check whether p and q have the same root. Here Connected(3,5)=false.

Union(p,q): to merge the components of p and q, set the id of p’s root to the id of q’s root.

After union(3,5):

i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 9 4 9 6 6 7 8 6

Only one value changes.

Example

[Figure: trace of a sequence of union() calls with quick-union]
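The same example can be replayed with a minimal quick-union sketch in Python. As before, the class and method names are illustrative and objects are assumed to be the integers 0,...,n-1.

```python
class QuickUnion:
    def __init__(self, n):
        self.id = list(range(n))  # every object is its own root

    def find_root(self, i):
        # Follow parent pointers until a node points to itself.
        while self.id[i] != i:
            i = self.id[i]
        return i

    def connected(self, p, q):
        return self.find_root(p) == self.find_root(q)

    def union(self, p, q):
        # Hang p's root below q's root: a single array entry changes.
        self.id[self.find_root(p)] = self.find_root(q)


uf = QuickUnion(10)
uf.id = [0, 1, 9, 4, 9, 6, 6, 7, 8, 9]  # state from the slide
roots = (uf.find_root(3), uf.find_root(5))
before = uf.connected(3, 5)
uf.union(3, 5)
```

Here roots is (9, 6), before is False, and after the union only id[9] has changed, to 6.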

Also Too Slow

Count number of array accesses (worst case)

algorithm    init  union  connected
Quick-find   n     n      1
Quick-union  n     n      n

Quick-find defect: unions are too expensive

Quick-union defects:
• Trees can get tall
• Connected() too expensive (could be n array accesses)

Union-Find Data Structure

Data Structure:
• Integer array id[] of size n, plus an integer array size[] of size n
• Interpretation: id[i] is the parent of i
• Root of i is id[id[id[...id[i]...]]]
• size[j] is the size of the connected component of j (if j is a root)

i        0 1 2 3 4 5 6 7 8 9
id[i]    0 1 9 4 9 6 6 7 8 9
size[i]  1 1 x x x x 2 1 1 4

Size may not be accurate for non-roots (marked x).

Connected(p,q): Check whether p and q have the same root.

CompSize(p): to find the size of the component of p, find p’s root r and return size[r].

Union(p,q): to merge the components of p and q, set the id of p’s root to the id of q’s root if CompSize(p) is smaller than CompSize(q). Otherwise, set the id of q’s root to the id of p’s root.

Avoiding Tall Trees

[Figures: two examples comparing the trees built by quick-union with the flatter trees built by union-find]

UF.init(n)    [initializes the data structure]

    for i from 0 to n-1 do set id[i]=i, size[i]=1

UF.FindRoot(i)    [finds the root of i]

    while id[i] is not equal to i do set i = id[i]
    return i

UF.union(i,j)    [union()]

    Set p=UF.FindRoot(i), q = UF.FindRoot(j)
    if p is not equal to q then
        if size[p] < size[q] then
            Set id[p] = q and size[q] = size[p]+size[q]
        else
            Set id[q] = p and size[p] = size[p]+size[q]

UF.Connected(i,j)    [Connected()]

    if UF.FindRoot(i) = UF.FindRoot(j) then return true
    else return false
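The four routines translate almost line by line into Python. The sketch below is one possible rendering with illustrative names; it follows the convention that the size entry that matters after a union is the one of the new root.

```python
class UnionFind:
    def __init__(self, n):
        self.id = list(range(n))  # id[i] is the parent of i
        self.size = [1] * n       # size[r] is meaningful for roots r only

    def find_root(self, i):
        while self.id[i] != i:
            i = self.id[i]
        return i

    def connected(self, i, j):
        return self.find_root(i) == self.find_root(j)

    def union(self, i, j):
        p, q = self.find_root(i), self.find_root(j)
        if p == q:
            return
        if self.size[p] < self.size[q]:
            self.id[p] = q                  # smaller tree hangs below q
            self.size[q] += self.size[p]
        else:
            self.id[q] = p                  # smaller (or equal) tree hangs below p
            self.size[p] += self.size[q]


uf = UnionFind(8)
for k in range(7):
    uf.union(k, k + 1)  # chain of unions that makes quick-union trees tall
```

With plain quick-union this chain of unions produces a path of height 7; with the size rule every node ends up directly below root 0.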

Analysis

• UF.init(n): O(n)

• UF.FindRoot(i): order of the height of the tree representing the component of i

• UF.union(i,j): same complexity as FindRoot

• UF.Connected(i,j): same complexity as FindRoot

Analysis

What is the height of the trees formed in terms of their size?

For Quick-union the trees can have height linear in their size. This is responsible for the bad performance of root-finding.

Theorem: for the union-find data structure the height of the tree representing a component is at most log2(s), where s is the number of nodes in the tree.

Count of array accesses (worst case)

algorithm    init  union   connected
Quick-find   n     n       1
Quick-union  n     n       n
Union-find   n     log(n)  log(n)

Proof

Induction on the size of the tree. Height changes only during the union operation. Let T be the union of the trees T1 and T2, so that

| T | = | T1 | + | T2 |.

Assume | T1 | ≤ | T2 |, and let h1 = height(T1), h2 = height(T2), h = height(T). The root of T1 is attached below the root of T2, so h = max{h1+1, h2}.

Case 1: h1 < h2. Then h = h2 ≤ log2(| T2 |) < log2(| T |), where the first inequality is the induction hypothesis.

Case 2: h2 ≤ h1. Then h = h1+1 ≤ log2(| T1 |) + log2(2) = log2(2 | T1 |) ≤ log2(| T1 | + | T2 |) = log2(| T |), where the first inequality is the induction hypothesis.

[Figures: T1 attached below the root of T2, once with h = h2 and once with h = 1+h1]

Improvement

Path Compression: Immediately after computing the root of p, set the id of each examined node to point to the root.

[Figure: a tree on the nodes 0,...,12 before and after UF.FindRoot(9), with p=9]

Example

[Figure: trace of unions with path compression. Annotations: 1 linked to 6 because of path compression; 7 linked to 6 because of path compression.]
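Path compression can be added to root-finding with a second pass over the same path. The following is a minimal sketch operating directly on a parent array; the function name find_root_compress and the small chain used as input are assumptions for illustration and do not reproduce the tree from the figure.

```python
def find_root_compress(id_array, i):
    """Return the root of i and make every node examined on the way
    point directly to that root."""
    # First pass: locate the root.
    root = i
    while id_array[root] != root:
        root = id_array[root]
    # Second pass: relink each node on the path to the root.
    while id_array[i] != root:
        parent = id_array[i]
        id_array[i] = root
        i = parent
    return root


id_array = [0, 0, 1, 2, 3]  # a chain 4 -> 3 -> 2 -> 1 -> 0
root = find_root_compress(id_array, 4)
```

After the call the chain has collapsed: id_array is [0, 0, 0, 0, 0], so a later lookup of any of these nodes takes a single step.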

Does it Work?

Theorem: Starting from an empty data structure, any sequence of M union-find operations on N objects makes a number of array accesses at most proportional to N + M log*(N).

Without proof.

log*(n) = iterated logarithm of n

n                  log*(n)
(1, 2]             1
(2, 4]             2
(4, 16]            3
(16, 2^16]         4
(2^16, 2^65536]    5
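The table can be reproduced with a few lines of Python. This sketch assumes the convention implied by the table, namely that log*(n) counts how often log2 must be applied until the value drops to 1 or below; the name log_star is illustrative.

```python
import math


def log_star(n):
    """Number of times log2 has to be applied to n until the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


table = [log_star(n) for n in (2, 4, 16, 2**16, 2**65536)]
```

The list table holds the values at the right end of each interval, [1, 2, 3, 4, 5]. For every input size that fits into a computer, log*(N) is therefore at most 5.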

Back to MST’s

Upgraded Version of Kruskal’s Algorithm

(1) Create sorted ascending list L of edges with respect to weight

(2) UF.Init(|V|)

(3) Set T to the empty set

(4) for i from 0 to |E|-1 do

    (a) (u,v) = L[i]

    (b) If not UF.Connected(u,v) then

        (i) UF.union(u,v)

        (ii) Add edge (u,v) to T

(5) Output T

Steps (b) and (i) each call the UF.FindRoot routine twice. This can be made more efficient by letting UF.union return a boolean value which is true iff u and v were already connected.
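Putting the pieces together, the upgraded version might look as follows in Python. This is a sketch under stated assumptions rather than the course's implementation: edges are (u, v, weight) tuples on the vertices 0,...,n-1, the union-find arrays live inside the function, and union returns True when it actually merged two components (the opposite polarity of the boolean suggested above, chosen so that the return value can be used directly as the test). The usage example runs it on the edge table from the earlier Kruskal example.

```python
def kruskal(num_vertices, edges):
    """edges: list of (u, v, weight). Returns the MST edges in the
    order in which they are added."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices

    def find_root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    def union(i, j):
        # Returns False if i and j were already connected.
        p, q = find_root(i), find_root(j)
        if p == q:
            return False
        if size[p] < size[q]:
            p, q = q, p  # make p the root of the larger tree
        parent[q] = p
        size[p] += size[q]
        return True

    tree = []
    for u, v, weight in sorted(edges, key=lambda e: e[2]):
        if union(u, v):  # one pair of root lookups per edge
            tree.append((u, v, weight))
    return tree


edges = [
    (3, 5, 0.18), (1, 7, 0.21), (6, 7, 0.25), (0, 2, 0.29),
    (0, 7, 0.31), (0, 1, 0.32), (3, 4, 0.34), (4, 5, 0.40),
    (4, 7, 0.46), (0, 6, 0.51), (4, 6, 0.51), (0, 5, 0.60),
]
mst = [(u, v) for u, v, _ in kruskal(8, edges)]
```

The resulting mst is [(3, 5), (1, 7), (6, 7), (0, 2), (0, 7), (3, 4), (4, 7)], the same sequence of edges as in the example run.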

Upgraded Version: Analysis

• Creating the sorted list L: O(|E| log(|E|))

• UF.Init(|V|): O(|V|)

• UF.Connected(u,v): O(log(|V|)) per edge

• UF.union(u,v): O(log(|V|)) per edge

• Adding the edge (u,v) to T: O(1)

O(|E| log(|E|) + |V| + |E| log(|V|)) = O(|E| log(|E|)) total.
