Tarjan's Lowest Common Ancestor Algorithm
Maria Mahbub
Algorithms - COSC 581
04/20/2021


Test Questions
1. What is the difference between static and off-line LCA finding problems?
2. Which data structure is used in Tarjan’s lowest common ancestor algorithm?
3. What is the overall time complexity of Tarjan's lowest common ancestor algorithm?

About me
● PhD Student
○ Computer Science Major -- EECS, UTK
○ Joined the Data Science Program @UTK in Fall 2018
○ Migrated to Computer Science in Fall 2020
● Research interest: Natural Language Processing, more specifically vulnerability assessment of NLP models
● Research Collaborator -- Oak Ridge National Laboratory (NSSD)
○ Currently working on the REACHVET project, focusing on suicide prevention
● Advisors:
○ Dr. Gregory Peterson (primary advisor at UTK)
○ Dr. Edmon Begoli (co-advisor at ORNL)

Hometown & Interests
● Home country: Bangladesh
● Home town: Dhaka, the capital of Bangladesh
● BS in Mathematics & MS in Applied Mathematics
● My undergrad institution: University of Dhaka (the oldest, largest, and one of the most prestigious universities in our country)
● Love to travel with friends and family
● LOVE Biryani (my comfort food!)
[Map: 8,354 miles]

Some Pictures!

Outline
● Overview
● Motivation
● History
● Some Developments since Tarjan’s Algorithm
● Algorithm
● Implementation
● Comparison with Another Frequently Used Algorithm: RMQ-LCA
● Applications
● References
● Discussion

Overview
● Lowest Common Ancestor (LCA)
○ Consider two nodes x and y in a rooted tree
○ LCA(x,y) is the lowest (deepest) node in the tree that has both x and y as descendants
○ A node can be a descendant of itself
○ Example: LCA(z,y) = z, since y is a direct child of z and z counts as a descendant of itself
[Figure: tree containing the nodes z, x, and y]
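The definition above can be checked with a brute-force sketch that is not part of the original slides. It assumes the tree is given as a hypothetical `parent` dictionary (the root maps to `None`), and the three-node tree at the bottom is invented for illustration only.

```python
def naive_lca(parent, x, y):
    """Return the lowest common ancestor of x and y.

    `parent` maps each node to its parent; the root maps to None.
    """
    # Collect x and all of its ancestors (a node counts as its own ancestor).
    ancestors_of_x = set()
    node = x
    while node is not None:
        ancestors_of_x.add(node)
        node = parent[node]
    # Walk up from y; the first node already seen is the lowest common one.
    node = y
    while node not in ancestors_of_x:
        node = parent[node]
    return node


# Hypothetical tree: z is the root, x and y are its children.
parent = {"z": None, "x": "z", "y": "z"}
lca_zy = naive_lca(parent, "z", "y")
lca_xy = naive_lca(parent, "x", "y")
```

Each call walks at most two root paths, so one query costs time proportional to the tree height. This is the baseline that the faster algorithms in the rest of the deck improve on.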

Overview
● Problem definition: If we have a rooted tree, how fast can we find answers for Lowest Common Ancestor (LCA) queries for any pair of nodes?
● The LCA problem was first formulated by Aho, Hopcroft, and Ullman in 1973
● Applications:
○ In object-oriented programming, to find the common superclass in an inheritance hierarchy
○ In compiler design, to facilitate some common basic computation for two basic blocks through their ancestors

Motivation
● Aho, Hopcroft, and Ullman introduced 3 versions of the LCA problem:
○ Online LCA: find the LCA of each node pair as the queries are made
○ Static LCA: answers are required on-line, but all tree-merging instructions precede the queries
○ Off-line LCA: first read all queries, then find the LCA of each node pair
■ Time complexity: O(n log* n) in the 1973 version
■ In 1976, Aho, Hopcroft, and Ullman used the set union algorithm by way of an intermediate problem, called the off-line min problem
■ Time complexity becomes O(nα(n))
● The motivation behind Tarjan’s LCA algorithm was a cleaner approach that uses the set union algorithm directly

History
● Developed in 1979 by Robert Endre Tarjan
● Published in the paper “Applications of path compression on balanced trees”
● Uses the disjoint-set/union-find data structure
● Time complexity:
○ Barely slower than linear
○ For a rooted tree with N nodes and Q queries, total runtime is O(Nα(N)+Q)
○ α is the inverse Ackermann function
[Photo: Robert Endre Tarjan]

Chronological Developments
● 1983: Gabow and Tarjan
○ off-line
○ for a special case of the disjoint set union problem, time complexity is O(n)
○ slight, but theoretically significant improvement
● 1984: Harel and Tarjan
○ off-line
○ uses the observation that on complete binary trees the LCA can be found in O(1) time by direct calculation
○ O(n) to preprocess the tree
○ the theory was too complicated to implement effectively
● 1988: Schieber and Vishkin
○ off-line
○ uses EREW PRAM
○ q queries take O(log n) time to process on (n+q)/log n processors
○ read conflicts are allowed
○ also not easily implementable

Chronological Developments
● 1989: Berkman, Breslauer, Galil, Schieber, and Vishkin
○ static
○ based on the observation by Gabow, Bentley, and Tarjan that computing the minimum over any interval can be reduced to answering an LCA query in a Cartesian tree, and that the RMQ problem can be solved serially in linear time
○ proposed an algorithm using CRCW PRAM that takes O(α(n)) to preprocess and answer LCA queries
○ complexity arises due to the PRAM
● 1998: Buchsbaum, Kaplan, Rogers, and Westbrook
○ off-line
○ pointer-machine implementation
○ uses pointer-based radix sort
○ time complexity: O(n+q)
● 2000: Bender and Farach-Colton
○ static
○ simplification of the previous approach
○ implemented without the PRAM
○ sequential approach
○ uses RMQ
○ O(n) for pre-processing and O(1) per query

Algorithm Basics
● Disjoint-set/Union-find Data Structure
○ Keeps track of a set of elements partitioned into several disjoint subsets
○ Supports two basic operations:
■ Find: determines which subset a particular element is in
■ Union: merges two subsets into a single subset
○ Represented by rooted trees: each node is a member, each tree is a set
○ A member points only to its parent. The root contains the representative and is its own parent.
● Time complexity: O(mα(n)) for a sequence of m union or find operations on a disjoint-set forest with n nodes, where α(n) is the extremely slow-growing inverse Ackermann function
[Figure: disjoint-set trees over the elements a, b, c, d, e, f]
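The slides do not define α. For reference, one common textbook formulation is given below. This is an added note rather than the deck's own material, and other sources use slightly different but asymptotically equivalent definitions.

```latex
% For integers k >= 0 and j >= 1, define
A_k(j) =
\begin{cases}
  j + 1                 & \text{if } k = 0, \\
  A_{k-1}^{(j+1)}(j)    & \text{if } k \ge 1,
\end{cases}
\qquad
\alpha(n) = \min \{\, k : A_k(1) \ge n \,\}.
% Here A_{k-1}^{(j+1)} means A_{k-1} applied j+1 times.
% A_0(1) = 2, A_1(1) = 3, A_2(1) = 7, A_3(1) = 2047, and A_4(1) is far larger than 10^{80},
% so \alpha(n) \le 4 for every input size that can arise in practice.
```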

Algorithm Basics
● Disjoint-set union implementation with path compression & union by rank:
○ MAKE-SET: creates a tree with just one node. Time complexity: O(1)
○ FIND-SET: follows parent pointers until the root of the tree is found
■ path compression: points all the nodes on the search path directly to the root
○ UNION: causes the root of one tree to point to the root of the other
■ union by rank: attaches the shorter tree to the root of the taller tree
[Figure: two disjoint-set trees before and after union(e, g); afterwards find(e) = find(g)]
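The three operations above can be sketched in Python as follows. This is a minimal illustration rather than the presenter's code. The class name `DisjointSet` and the method names are assumptions chosen to mirror MAKE-SET, FIND-SET, and UNION, and the element names at the bottom are arbitrary.

```python
class DisjointSet:
    """Disjoint-set forest with path compression and union by rank."""

    def __init__(self):
        self.parent = {}  # parent pointer of every element
        self.rank = {}    # upper bound on the height of each root's tree

    def make_set(self, x):
        # A new element is the root (and only member) of its own tree.
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # First pass: follow parent pointers up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        root_x, root_y = self.find_set(x), self.find_set(y)
        if root_x == root_y:
            return root_x
        # Union by rank: hang the lower-rank root under the higher-rank root.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return root_x


ds = DisjointSet()
for name in "abcdefg":
    ds.make_set(name)
ds.union("e", "g")
same_set = ds.find_set("e") == ds.find_set("g")
```

After `union("e", "g")` both elements report the same representative, which is the `find(e) = find(g)` situation shown on the slide.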

Algorithm
LCA(u)
    MAKE-SET(u)
    FIND-SET(u).ancestor = u
    for each child v of u in T
        LCA(v)
        UNION(u,v)
        FIND-SET(u).ancestor = u
    u.color = BLACK
    for each node v such that {u,v} ∈ P
        if v.color = BLACK
            print “The least common ancestor of” u “and” v “is” FIND-SET(v).ancestor
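A runnable Python sketch of the pseudocode above follows. It is an illustration under stated assumptions, not the presenter's implementation: the function name `tarjan_offline_lca`, the `children` dictionary representation, and the use of path halving as the compression step are all choices made here. The tree and the three queries are the ones used in the deck's worked example.

```python
from collections import defaultdict


def tarjan_offline_lca(children, root, queries):
    """Answer all LCA queries in one depth-first traversal of the tree."""
    parent, rank = {}, {}  # disjoint-set forest
    ancestor = {}          # ancestor[r] for each set representative r
    black = set()          # nodes whose whole subtree has been processed
    answers = {}

    # Index the queries by endpoint so each node can look up its partners.
    partners = defaultdict(list)
    for u, v in queries:
        partners[u].append(v)
        partners[v].append(u)

    def find_set(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving, a form of path compression
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find_set(x), find_set(y)
        if rx == ry:
            return
        if rank[rx] < rank[ry]:            # union by rank
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    def lca(u):
        parent[u], rank[u] = u, 0          # MAKE-SET(u)
        ancestor[u] = u                    # FIND-SET(u).ancestor = u
        for v in children.get(u, []):
            lca(v)
            union(u, v)
            ancestor[find_set(u)] = u
        black.add(u)                       # u.color = BLACK
        for v in partners[u]:
            if v in black:
                answers[frozenset((u, v))] = ancestor[find_set(v)]

    lca(root)
    return [answers[frozenset(q)] for q in queries]


# Example tree from the slides: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7).
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
results = tarjan_offline_lca(children, 1, [(2, 5), (6, 7), (5, 6)])
```

Because `lca` is recursive, very deep trees would need an explicit stack or a raised recursion limit. That detail is left out here to stay close to the pseudocode.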

Implementation
● Find answers to the queries LCA(2,5), LCA(6,7), LCA(5,6) in this tree
[Figure: tree rooted at 1; node 1 has children 2 and 3, node 2 has children 4 and 5, node 3 has children 6 and 7]

Implementation
● LCA walk from 1 towards its left child 2
➢ create disjoint set for node 1
➢ ancestor[1] = 1
● LCA walk from 2 towards its left child 4
➢ create disjoint set for node 2
➢ ancestor[2] = 2
[Figure: disjoint sets {1}, {2}, {4}]

Implementation
● Return from 4 to 2 and color 4 BLACK
➢ return disjoint set for node 4
➢ UNION(2,4)
➢ ancestor[4] = 2
● LCA walk from 2 towards its right child 5
[Figure: disjoint sets {2, 4}, {1}, {5}]

Implementation
● Return from 5 to 2 and color 5 BLACK
➢ return disjoint set for node 5
➢ UNION(2,5)
➢ ancestor[5] = 2
● Return from 2 to 1 and color 2 BLACK
➢ LCA(2,5) = FIND-SET(5).ancestor = ancestor[FIND(5)] = ancestor[2] = 2
[Figure: disjoint sets {2, 4, 5}, {1}]

Implementation
➢ return disjoint set for node 2
➢ UNION(1,2)
➢ ancestor[2] = 1
● LCA walk from 1 towards its right child 3
● LCA walk from 3 towards its left child 6
[Figure: disjoint sets {1, 2, 4, 5}, {3}, {6}]

Implementation
● Return from 6 to 3 and color 6 BLACK
➢ return disjoint set for node 6
➢ UNION(3,6)
➢ ancestor[6] = 3
● LCA walk from 3 towards its right child 7
[Figure: disjoint sets {1, 2, 4, 5}, {3, 6}, {7}]

Implementation
● Return from 7 to 3 and color 7 BLACK
➢ return disjoint set for node 7
➢ UNION(3,7)
➢ ancestor[7] = 3
● Return from 3 to 1 and color 3 BLACK
➢ LCA(6,7) = FIND-SET(7).ancestor = ancestor[FIND(7)] = ancestor[3] = 3
[Figure: disjoint sets {1, 2, 4, 5}, {3, 6, 7}]

Implementation
➢ UNION(1,3)
➢ LCA(5,6) = FIND-SET(6).ancestor = ancestor[FIND(6)] = ancestor[3] = 1
● color 1 BLACK
❏ LCA(2,5) = 2
❏ LCA(6,7) = 3
❏ LCA(5,6) = 1
[Figure: a single disjoint set containing nodes 1 through 7]

Tarjan’s LCA vs. RMQ-LCA
● Find answers to the queries LCA(2,5), LCA(6,7), LCA(5,6) in T
[Figure: tree T rooted at 1; node 1 has children 2 and 3, node 2 has children 4 and 5, node 3 has children 6 and 7]

RMQ-LCA Algorithm
● Uses Range Minimum Query (RMQ) & an Euler tour to find the LCA on a static tree
● For the range query in LCA(u,v), the range must be ordered: u must first appear in the Euler tour no later than v (swap u and v otherwise)
● Range Minimum Query: finds the position of the element with the minimum value between two specified indices in an array
● Euler Tour: a way of traversing the tree that starts from the root and returns to the root after visiting all vertices, without lifting the pencil
● Time Complexity:
○ Preprocessing: O(n)
○ RMQ query with a segment tree: O(log n)
[Figure: tree rooted at 1; node 1 has children 2 and 3, node 2 has children 4 and 5, node 3 has children 6 and 7]
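The segment-tree option mentioned above could look roughly like the sketch below. It is not taken from the slides: the class name `MinIndexSegmentTree` and the sample array are hypothetical, and the sketch assumes a non-empty array and a valid range 0 ≤ left ≤ right < n.

```python
class MinIndexSegmentTree:
    """Segment tree that returns the index of the minimum of values[left..right]."""

    def __init__(self, values):
        self.values = values
        self.n = len(values)
        # tree[node] holds the index (into values) of the minimum in that node's range.
        self.tree = [0] * (4 * self.n)
        self._build(1, 0, self.n - 1)

    def _better(self, i, j):
        # Keep the index with the smaller value; ties go to the left index i.
        return i if self.values[i] <= self.values[j] else j

    def _build(self, node, lo, hi):
        if lo == hi:
            self.tree[node] = lo
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self.tree[node] = self._better(self.tree[2 * node], self.tree[2 * node + 1])

    def query(self, left, right, node=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if left <= lo and hi <= right:   # node range lies fully inside the query
            return self.tree[node]
        mid = (lo + hi) // 2
        if right <= mid:                 # query lies entirely in the left half
            return self.query(left, right, 2 * node, lo, mid)
        if left > mid:                   # query lies entirely in the right half
            return self.query(left, right, 2 * node + 1, mid + 1, hi)
        return self._better(
            self.query(left, right, 2 * node, lo, mid),
            self.query(left, right, 2 * node + 1, mid + 1, hi),
        )


# Hypothetical sample array, for illustration only.
rmq = MinIndexSegmentTree([5, 2, 7, 2, 9, 1, 4])
idx_a = rmq.query(0, 3)
idx_b = rmq.query(2, 4)
idx_c = rmq.query(4, 6)
```

Building the tree touches each array position a constant number of times, and each query descends along at most two root-to-leaf paths, which matches the O(n) and O(log n) bounds listed above.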

RMQ-LCA Algorithm
● Perform an Euler tour on the tree, and fill three arrays:
○ Euler Tour Array - tracks nodes visited in order during the Euler tour
○ Level Array - tracks each node’s respective level during the Euler tour
○ First Occurrence Array - tracks the index of the first occurrence of nodes in the Euler tour
● Using the first occurrence array, get the indices corresponding to the two given nodes. These indices are the corners of the range in the level array that is fed to the RMQ algorithm to find the minimum value.
○ Different approaches to solve RMQ: naive, square root decomposition, sparse table, segment-tree-based approach, etc.
● Once the algorithm returns the index of the minimum level in the range, we use it to determine the LCA from the Euler tour array.
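The array-filling step can be sketched as below. The function name `euler_tour_arrays` and the `children` dictionary are assumptions made for this illustration. The tree is the deck's example tree, and the first-occurrence information is kept as a per-node dictionary rather than an array.

```python
def euler_tour_arrays(children, root):
    """Return (euler, level, first) for the tree given by `children`."""
    euler = []   # nodes in the order the tour visits them
    level = []   # depth of the node at each tour position
    first = {}   # node -> index of its first appearance in `euler`

    def visit(node, depth):
        first.setdefault(node, len(euler))
        euler.append(node)
        level.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            # Coming back from the child, the tour passes through `node` again.
            euler.append(node)
            level.append(depth)

    visit(root, 0)
    return euler, level, first


# Example tree from the slides: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7).
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
euler, level, first = euler_tour_arrays(children, 1)
```

A tree with n nodes produces a tour of length 2n - 1, so all three structures are filled in O(n) time.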

RMQ-LCA Implementation
● Requires 3 arrays for implementation (indices start at 0):
○ Euler Tour Array: 1 2 4 2 5 2 1 3 6 3 7 3 1
○ Level Array: 0 1 2 1 2 1 0 1 2 1 2 1 0
○ First Occurrence Array: 0 1 2 1 4 1 0 7 8 7 10 7 0 (for each tour position, the index where that node first appears)
● LCA(2,5):
○ Level of node 2 is 1 and level of node 5 is 2
○ First occurrence of node 2 is at index 1 and of node 5 at index 4
○ The range of indices to search in the level array is 1 to 4
○ The Euler tour entries in the range are 2, 4, 2, 5 with levels 1, 2, 1, 2
○ The minimum level in the range, found using the RMQ algorithm, is 1 at node 2 => LCA(2,5) = node 2
[Figure: tree rooted at 1; node 1 has children 2 and 3, node 2 has children 4 and 5, node 3 has children 6 and 7]
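The query step can be sketched with a sparse table, one of the RMQ approaches the deck lists. This is an illustration, not the presenter's code: the function names are hypothetical, the Euler tour and level arrays are copied from the slide, and the first-occurrence data is stored per node instead of per tour position.

```python
def build_sparse_table(level):
    """table[k][i] = index of the minimum of level[i .. i + 2**k - 1]."""
    n = len(level)
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if level[a] <= level[b] else b)
        table.append(row)
        k += 1
    return table


def rmq_lca(u, v, euler, level, first, table):
    # Order the two first occurrences so that lo <= hi.
    lo, hi = sorted((first[u], first[v]))
    # Cover [lo, hi] with two overlapping power-of-two blocks.
    k = (hi - lo + 1).bit_length() - 1
    a, b = table[k][lo], table[k][hi - (1 << k) + 1]
    return euler[a if level[a] <= level[b] else b]


# Arrays from the slide (0-based indices).
euler = [1, 2, 4, 2, 5, 2, 1, 3, 6, 3, 7, 3, 1]
level = [0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 2, 1, 0]
first = {1: 0, 2: 1, 3: 7, 4: 2, 5: 4, 6: 8, 7: 10}

table = build_sparse_table(level)
answers = [rmq_lca(u, v, euler, level, first, table) for u, v in [(2, 5), (6, 7), (5, 6)]]
```

The sparse table costs O(n log n) to build and answers each query in O(1). It is swapped in here for brevity, whereas the deck's stated bounds refer to a segment tree.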

Performance Comparison
● Compare runtime of Tarjan’s LCA and RMQ-LCA for a number of queries
● Tarjan’s LCA algorithm is better for off-line queries
● RMQ-LCA is better for online queries
● 7 Queries: LCA(1,2), LCA(2,3), LCA(2,5), LCA(3,6), LCA(4,5), LCA(6,7), LCA(5,7)

Applications
● In version control systems to implement three-way merge algorithms
● In NLP to find common word roots or shortest semantic dependency paths
● In computer graphics to find the smallest cube containing two given cubes

Open Issues
● Implementation with even faster node pre-processing for off-line LCA problems
● Unavailability of more easily implementable algorithms for online LCA problems
● Not many applications of LCA in Natural Language Processing

References
1. Aït-Kaci, H.; Boyer, R.; Lincoln, P.; Nasr, R. (1989), "Efficient implementation of lattice operations", ACM Transactions on Programming Languages and Systems, 11 (1): 115–146, CiteSeerX 10.1.1.106.4911, doi:10.1145/59287.59293.
2. Aho, A. V.; Hopcroft, J. E.; Ullman, J. D. (1973), "On finding lowest common ancestors in trees", Proc. 5th ACM Symp. Theory of Computing (STOC), pp. 253–265, doi:10.1145/800125.804056.
3. Tarjan, R. E. (1979), "Applications of path compression on balanced trees", Journal of the ACM, 26 (4): 690–715, doi:10.1145/322154.322161.
4. Gabow, H. N.; Tarjan, R. E. (1983), "A linear-time algorithm for a special case of disjoint set union", Proc. 15th ACM Symp. Theory of Computing (STOC), pp. 246–251, doi:10.1145/800061.808753.
5. Harel, D.; Tarjan, R. E. (1984), "Fast algorithms for finding nearest common ancestors", SIAM Journal on Computing, 13 (2): 338–355, doi:10.1137/0213024.
6. Gabow, H. N.; Bentley, J. L.; Tarjan, R. E. (1984), "Scaling and related techniques for geometry problems", Proc. 16th ACM Symp. Theory of Computing (STOC), pp. 135–143.
7. Berkman, O.; Breslauer, D.; Galil, Z.; Schieber, B.; Vishkin, U. (1989), "Highly parallelizable problems", Proc. 21st ACM Symp. Theory of Computing (STOC), pp. 309–319.
8. Schieber, B.; Vishkin, U. (1988), "On finding lowest common ancestors: Simplification and parallelization", SIAM Journal on Computing, 17: 1253–1262.
9. Bender, M. A.; Farach-Colton, M. (2000), "The LCA problem revisited", Proc. 4th LATIN, pp. 88–94.
10. Buchsbaum, A. L.; Kaplan, H.; Rogers, A.; Westbrook, J. R. (1998), "Linear-time pointer-machine algorithms for LCA’s, MST verification, and dominators", Proc. 30th ACM Symp. Theory of Computing (STOC), pp. 279–288.
11. Aho, A. V.; Hopcroft, J. E.; Ullman, J. D. (1976), "On finding lowest common ancestors in trees", SIAM Journal on Computing, 5 (1): 115–132.
12. Harel, D. (1980), "A linear time algorithm for the lowest common ancestors problem", 21st Annual Symposium on Foundations of Computer Science (sfcs 1980), pp. 308–319, IEEE.
13. https://iq.opengenus.org/lca-in-binary-tree-using-euler-tour/

Discussion
