
Pairwise alignment using shortest path algorithms

We will discuss:

• Edit graph

• Dijkstra’s algorithm

• A∗ algorithm (GDUS)


References

• Dijkstra algorithm in general:

– Cormen, Leiserson, Rivest: Introduction to Algorithms, MIT Press, 1990. ISBN 0-262-03141-8

– . . .

• Dijkstra and A∗ algorithm for sequence alignment:

– Lecture by Knut Reinert in SS04, which in turn relies on:

– Reinert, Stoye, Will: An iterative method for faster sum-of-pairs multiple sequence alignment, Bioinformatics, 2000, Vol. 16, No. 9, pages 808-814.

– Stoye: Divide-and-Conquer Multiple Sequence Alignment, TR 97-02, Universität Bielefeld, 1997.


Edit graph

In principle, the edit graph is just another way to look at a dynamic programming matrix:

• Each matrix entry corresponds to a vertex.

• Dependencies in the recursion correspond to (directed) edges.

In this representation, finding an optimal alignment w.r.t. distance corresponds to finding a shortest path in a directed acyclic graph (DAG).

(Why is the edit digraph acyclic? Every edge increases at least one coordinate, so no path can return to a vertex it has already left. Otherwise DP would not work: the recursion would loop forever.)


Edit graph (2)

Here is an edit graph with source s, sink t, and an optimal source-to-sink path s → t in dashed and red:

[Figure: edit graph for the sequences CAN and ANN with coordinates 0 to 3 on both axes, source s, and sink t.]

Don’t mind that the arrows are the other way round this time.


Edit graph (3)

We denote the vertices by their coordinates, e.g., the source is s = (0,0).

A path π from s = (0,0) to t = (m, n) corresponds to an alignment. We consider π as a set of edges. Each edge corresponds to a column of the alignment. The cost of an alignment is the sum of the costs of all its edges:

$$c(\pi) := \sum_{e \in \pi} c(e) .$$

The cost of an edge is given by the DP recursion:

$$F(i, j) := \min \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$

[Figure: the three edges entering vertex (i, j): from (i−1, j−1) with cost s(x_i, y_j), and from (i−1, j) and (i, j−1) with cost d each.]
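To make the correspondence between edges and alignment columns concrete, here is a minimal Python sketch. The names `out_edges` and `path_cost`, the 0/2 substitution score, and the gap cost 3 are assumptions chosen for illustration; they are not part of the slides.

```python
def out_edges(v, x, y, s, d):
    """Yield (successor, cost) pairs for vertex v = (i, j) of the edit graph.

    x, y are the two sequences, s(a, b) is the substitution cost and d the
    gap cost, as in the DP recursion.
    """
    i, j = v
    m, n = len(x), len(y)
    if i < m and j < n:
        # Diagonal edge: align x_{i+1} with y_{j+1} (0-based: x[i], y[j]).
        yield (i + 1, j + 1), s(x[i], y[j])
    if i < m:
        # Edge into (i+1, j): x_{i+1} is aligned with a gap.
        yield (i + 1, j), d
    if j < n:
        # Edge into (i, j+1): y_{j+1} is aligned with a gap.
        yield (i, j + 1), d


def path_cost(path, x, y, s, d):
    """Cost c(pi) of a path given as a list of vertices: sum of its edge costs."""
    total = 0
    for u, v in zip(path, path[1:]):
        total += dict(out_edges(u, x, y, s, d))[v]
    return total


if __name__ == "__main__":
    score = lambda a, b: 0 if a == b else 2   # assumed scoring scheme
    path = [(0, 0), (1, 0), (2, 1), (3, 2), (3, 3)]
    print(path_cost(path, "CAN", "ANN", score, 3))
```

The graph is never stored: the grid structure lets us enumerate the successors of any vertex on demand, which is what the later algorithms rely on.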


Edit graph (4)

Recall that usually there are many orders in which the entries of a DP matrix can be computed. Edit graphs are just the right level of abstraction to state this fact:

Computing the shortest path in a DAG (directed acyclic graph) can be done by traversing the DAG in any topological order.

Recapitulation.

A topological order of a DAG is a labeling t : V → N of its nodes such that t(u) < t(v) holds for all edges (u, v) ∈ E.


Edit graph (5)

A topological order for a digraph G = (V, E) can be found, e.g., by a post-order depth-first traversal starting from the sink in O(|V| + |E|) time.

Topological-Sort(G)
Input: DAG G
Output: topologically ordered list of vertices V
    call DFS(G) to compute finishing times f(v) for each vertex v
    as each vertex is finished, insert it onto the front of a linked list T
    return T
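The pseudocode can be sketched in Python as follows. The adjacency-dictionary input format and the function name are assumptions for illustration. Instead of inserting at the front of a linked list, this version appends each finished vertex and reverses the list once at the end, which yields the same order.

```python
def topological_sort(adj):
    """Return the vertices of a DAG in topological order.

    adj maps each vertex to a list of its successors (assumed input format).
    """
    order = []        # vertices in order of increasing finishing time
    visited = set()

    def dfs(u):
        visited.add(u)
        for v in adj.get(u, ()):
            if v not in visited:
                dfs(v)
        order.append(u)   # u is finished: all its successors are done

    for u in adj:
        if u not in visited:
            dfs(u)

    order.reverse()       # decreasing finishing time = topological order
    return order


if __name__ == "__main__":
    dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(topological_sort(dag))
```

The recursion is fine for small examples; for graphs with long paths an explicit stack would avoid Python's recursion limit.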

[Figure: example DAG on the vertices a to j with integer edge weights, some of them negative.]

Edit graph (6)

We can find shortest paths in DAGs by processing the nodes in topological order.

DAG-shortest-paths(G, w, s)
Input: DAG G, weights w, source s
Output: distance labels d and predecessors π
    d(s) := 0; π(s) := nil
    for each vertex v ∈ V \ {s} do
        d(v) := ∞
        π(v) := nil
    for each vertex u taken in topologically sorted order do
        for each edge (u, v) ∈ E do
            if d(v) > d(u) + w(u, v) then
                d(v) := d(u) + w(u, v)
                π(v) := u
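A minimal Python sketch of this procedure follows. The edge-dictionary input format is an assumption, and the topological order is taken from the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) rather than from a hand-written DFS. Negative edge weights are allowed, since correctness only depends on the topological order.

```python
import math
from graphlib import TopologicalSorter


def dag_shortest_paths(edges, source):
    """Single-source shortest paths in a DAG.

    edges maps (u, v) to the weight w(u, v) (assumed input format).
    Returns (dist, pred) dictionaries over all vertices.
    """
    preds = {source: set()}   # vertex -> set of predecessors, for graphlib
    adj = {}                  # vertex -> list of (successor, weight)
    for (u, v), w in edges.items():
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
        adj.setdefault(u, []).append((v, w))

    dist = {v: math.inf for v in preds}
    pred = {v: None for v in preds}
    dist[source] = 0

    # static_order() lists every vertex after all of its predecessors.
    for u in TopologicalSorter(preds).static_order():
        for v, w in adj.get(u, []):
            if dist[v] > dist[u] + w:
                dist[v] = dist[u] + w
                pred[v] = u
    return dist, pred


if __name__ == "__main__":
    example = {("s", "a"): 2, ("s", "b"): 4, ("a", "b"): -3,
               ("b", "t"): 1, ("a", "t"): 5}
    dist, pred = dag_shortest_paths(example, "s")
    print(dist["t"], pred["t"])
```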

[Figure: the example DAG on the vertices a to j with its edge weights.]

On the next slide we do this for the more realistic example (only the initial graph is shown; the shortest path computation is done at the blackboard).

[Figure: initial edit graph for the more realistic example, with HEAGAWGHEE along the top and the second sequence down the side; edges carry their costs, some of them negative, and the distance labels start at ∞.]

Edit graph (7)

However, we have already seen that constructing the whole graph would be the major obstacle. Let us have a look at an (in theory) more costly algorithm that does not rely on a topological order: Dijkstra’s algorithm for graphs with non-negative edge costs.


Dijkstra’s algorithm

As noted before, constructing the complete graph is much too expensive. But we can adapt Dijkstra’s algorithm and construct the graph as needed, starting from the source s.

Recapitulation.

A priority queue Q is an abstract data type (ADT) that maintains a set of elements S, each with an associated value called a key.

Here, the elements are the nodes of the edit graph and their keys are the distance labels d.


Dijkstra’s algorithm (2)

Operations supported:

• Q.insert(x, k): inserts the element x with key k into Q

• Q.empty(): true iff the queue is empty

• Q.extract_min(): removes and returns the element in Q with the lowest key

• Q.decrease_key(x, k): decreases the key of element x to the value k
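These four operations can be sketched on top of Python's `heapq` module. The class below is an illustrative assumption, not the data structure used in the lecture: `heapq` has no native decrease-key, so the sketch pushes a fresh entry and skips outdated ones on extraction ("lazy deletion").

```python
import heapq
import itertools


class PriorityQueue:
    """Min-priority queue with insert, empty, extract_min and decrease_key."""

    def __init__(self):
        self._heap = []                    # entries: (key, tie_breaker, element)
        self._key = {}                     # current key of each queued element
        self._counter = itertools.count()  # avoids comparing elements on ties

    def insert(self, x, k):
        self._key[x] = k
        heapq.heappush(self._heap, (k, next(self._counter), x))

    def empty(self):
        return not self._key

    def extract_min(self):
        # Pop until we hit an entry whose key is still current.
        while True:
            k, _, x = heapq.heappop(self._heap)
            if self._key.get(x) == k:
                del self._key[x]
                return x

    def decrease_key(self, x, k):
        # Push a new entry; the old one becomes stale and is skipped later.
        if k < self._key[x]:
            self.insert(x, k)


if __name__ == "__main__":
    q = PriorityQueue()
    q.insert("a", 5)
    q.insert("b", 3)
    q.insert("c", 7)
    q.decrease_key("c", 1)
    while not q.empty():
        print(q.extract_min())
```

With lazy deletion each operation costs O(log N) for a heap of N entries, where N can grow up to the number of key updates rather than the number of elements.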


Dijkstra’s algorithm (3)

Dijkstra(G, w, s, d, π)
{
    G = (V, E) : graph;
    Adj : V → P(V) : set of successor nodes, this implements E;
    w : E → N : edge weights;
    d : V → N : distance from s as computed by algorithm;
    π : V → V : predecessor;
    Q⟨V, N⟩ : priority queue;

    d[s] ← 0; ∀v ∈ V \ {s} : d[v] ← ∞;
    ∀v ∈ V : π[v] ← nil;
    Q.insert(s, d[s]);
    while ( ! Q.empty()) do
        u ← Q.extract_min();
        foreach (v ∈ Adj(u)) do
            if (d[v] ≡ ∞) Q.insert(v, ∞); fi
            c = d[u] + w(u, v);
            if (c < d[v])
                d[v] ← c;
                π[v] ← u;
                Q.decrease_key(v, d[v]);
            fi
        od
    od
}


Dijkstra’s algorithm (4)

• As in the usual Dijkstra algorithm, in the priority queue Q we store the values of the best paths found so far. The “final” values are stored in d. The shortest path arborescence∗ is stored in π; it is used for backtracing.

• In each step we delete the node u with the minimal value from Q.

• This node is then expanded, which means that all neighboring nodes v of u that are not already in Q are inserted into Q with the value d(u) + w(u, v). We know Adj(u) because the graph has a simple grid structure.

• In case v is already in Q, we relax the edge (u, v): if d(u) + w(u, v) < d(v), the key of v is decreased to d(u) + w(u, v).

∗An arborescence is a directed forest in which every vertex has out-degree at most one.


Dijkstra’s algorithm (5)

Dijkstra’s algorithm guarantees that the value of the node u that is removed from the priority queue Q equals the value of the shortest path from s to u.

Thus, when t is extracted, we can backtrace the shortest path and output the alignment.
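The following Python sketch puts the pieces together: it runs Dijkstra on the edit graph, generates successors on demand, stops as soon as t is extracted, and backtraces via π. The function name, the use of `heapq` with stale-entry skipping in place of decrease_key, and the scoring scheme in the usage example are assumptions for illustration. Edge costs must be non-negative.

```python
import heapq
import math


def dijkstra_alignment(x, y, s, d):
    """Shortest s-t path in the edit graph of x and y, built on demand.

    Returns (cost, path, number_of_extracted_nodes).
    """
    m, n = len(x), len(y)
    source, sink = (0, 0), (m, n)
    dist = {source: 0}
    pred = {source: None}
    done = set()              # nodes already extracted (final distance)
    heap = [(0, source)]

    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue          # stale entry left over from an earlier update
        done.add(u)
        if u == sink:
            break             # d[t] is final, no need to expand further
        i, j = u
        succ = []
        if i < m and j < n:
            succ.append(((i + 1, j + 1), s(x[i], y[j])))
        if i < m:
            succ.append(((i + 1, j), d))
        if j < n:
            succ.append(((i, j + 1), d))
        for v, w in succ:
            c = du + w
            if c < dist.get(v, math.inf):
                dist[v] = c
                pred[v] = u
                heapq.heappush(heap, (c, v))   # plays the role of decrease_key

    # Backtrace from the sink along the predecessor pointers.
    path = [sink]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    path.reverse()
    return dist[sink], path, len(done)


if __name__ == "__main__":
    score = lambda a, b: 0 if a == b else 2   # assumed scoring scheme
    cost, path, extracted = dijkstra_alignment("CAN", "ANN", score, 3)
    print(cost, path, extracted)
```

Only vertices that were actually reached appear in `dist`, so the memory used grows with the explored part of the graph, not with the full (m+1)·(n+1) grid.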

The next slide shows the example again (only initial graph).

[Figure: initial edit graph for the HEAGAWGHEE example with non-negative edge costs.]

Dijkstra’s algorithm (6)

Theorem. (Correctness) Let G = (V, E) be a directed graph, s ∈ V, and let w : E → N be a weight function on its edges, w ≥ 0. Then the following holds: After v is extracted from the priority queue, d[v] equals dist_w(s, v), the length of a shortest path from s to v, and one such path is given by s, . . . , π(π(v)), π(v), v.

Proof. Blackboard.


Running time of Dijkstra’s algorithm

Dominated by the at most |E| decrease_key and the at most |V| extract_min operations.

With binary heap: both take O(log |V|) time ⇒ total time: O((|V| + |E|) log |V|)

With Fibonacci heap: decrease_key in amortized∗ O(1) time, extract_min in O(log |V|) time ⇒ total time: O(|E| + |V| log |V|)

But: |E| = O(nm) → In theory, we have not improved the bounds. In practice, however, often only a small portion of the alignment graph needs to be constructed, so the running time can be superior.

∗Amortized complexity of O(1) means that we may not achieve an O(1) bound for each individual call, but a sequence of k calls will take O(k) time in total.


A∗ algorithm, a.k.a. GDUS-Bounding

The so-called Goal-Directed-Unidirectional-Search (GDUS), also known as the A∗ algorithm, tries to direct the computation of the shortest path more into the direction of the sink t.

This is achieved by using a lower bound l(u, t) on the length of a shortest path from u to t. First the cost of an edge (u, v) is redefined as follows:

$$c'(u, v) := c(u, v) + \bigl(l(v, t) - l(u, t)\bigr) .$$

Dijkstra’s algorithm is only correct for non-negative edge weights, thus l needs to satisfy the consistency condition

$$c(u, v) \ge l(u, t) - l(v, t) \quad \forall (u, v) \in E .$$

Then it is easy to show (blackboard) that the redefinition of the edge costs does not change the (set of) optimal path(s) and that the new edge weights are still non-negative.
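The blackboard argument is not reproduced in the slides. One way to see both claims, sketched here for an arbitrary path π = (s = v_0, v_1, ..., v_k = t) and not necessarily the proof given in the lecture, is a telescoping sum:

```latex
\begin{align*}
c'(\pi) &= \sum_{i=1}^{k} \Bigl( c(v_{i-1}, v_i) + l(v_i, t) - l(v_{i-1}, t) \Bigr) \\
        &= c(\pi) + l(t, t) - l(s, t) .
\end{align*}
```

The correction term l(t, t) − l(s, t) does not depend on π, so all s-t paths are shifted by the same constant and the optimal ones stay optimal. Non-negativity of c'(u, v) is exactly the consistency condition rearranged.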


A∗ algorithm, a.k.a. GDUS-Bounding (2)

A simple lower bound is

$$l\bigl((i, j), (m, n)\bigr) := |(m - n) - (i - j)| \, d ,$$

because every path from (i, j) must end on the same diagonal as (m, n): each gap edge moves to a neighboring diagonal at cost d, so at least |(m − n) − (i − j)| gap edges are needed.

The better the lower bound, the more directed the search.

• In the extreme case, if l is tight, then we extract only the nodes on an optimal path from the priority queue.

• In the other extreme, if l = 0, we get Dijkstra’s algorithm back.
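As a sketch of how the bound enters the search, the Python function below runs Dijkstra on the redefined costs c′ using the diagonal lower bound. The function name, the `use_bound` switch (which sets l = 0 and so falls back to plain Dijkstra), and the scoring scheme in the usage example are assumptions for illustration. The sketch assumes non-negative substitution costs, so that the bound is consistent.

```python
import heapq
import math


def gdus_alignment(x, y, s, d, use_bound=True):
    """Goal-directed search on the edit graph of x and y.

    Returns (cost, number_of_extracted_nodes).
    """
    m, n = len(x), len(y)
    source, sink = (0, 0), (m, n)

    def l(v):
        # Diagonal lower bound on the remaining cost from v to the sink.
        if not use_bound:
            return 0
        i, j = v
        return abs((m - n) - (i - j)) * d

    dist = {source: 0}        # distances with respect to the costs c'
    done = set()
    heap = [(0, source)]

    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue          # stale entry
        done.add(u)
        if u == sink:
            break
        i, j = u
        succ = []
        if i < m and j < n:
            succ.append(((i + 1, j + 1), s(x[i], y[j])))
        if i < m:
            succ.append(((i + 1, j), d))
        if j < n:
            succ.append(((i, j + 1), d))
        for v, c in succ:
            c_red = c + l(v) - l(u)     # c'(u, v), non-negative by consistency
            if du + c_red < dist.get(v, math.inf):
                dist[v] = du + c_red
                heapq.heappush(heap, (dist[v], v))

    # Undo the constant shift l(t, t) - l(s, t) to get the original cost.
    return dist[sink] + l(source) - l(sink), len(done)


if __name__ == "__main__":
    score = lambda a, b: 0 if a == b else 2   # assumed scoring scheme
    x, y = "IMISSMISSISSIPPI", "MYMISSISAHIPPIE"
    print(gdus_alignment(x, y, score, 3, use_bound=True))
    print(gdus_alignment(x, y, score, 3, use_bound=False))
```

Both calls return the same cost; the number of extracted nodes depends on how ties in the priority queue are broken, so it need not match the counts reported on the example slides.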

Next slides: example (initial graph and solution).

[Figure: the HEAGAWGHEE example for A∗, initial graph.]

[Figure: the HEAGAWGHEE example for A∗, solution with final distance labels.]

Example: DP Matrix for global alignment

match=0, mismatch=2, space=3

         M   Y   M   I   S   S   I   S   A   H   I   P   P   I   E
     0   3   6   9  12  15  18  21  24  27  30  33  36  39  42  45
I    3   2   5   8   9  12  15  18  21  24  27  30  33  36  39  42
M    6   3   4   5   8  11  14  17  20  23  26  29  32  35  38  41
I    9   6   5   6   5   8  11  14  17  20  23  26  29  32  35  38
S   12   9   8   7   8   5   8  11  14  17  20  23  26  29  32  35
S   15  12  11  10   9   8   5   8  11  14  17  20  23  26  29  32
M   18  15  14  11  12  11   8   7  10  13  16  19  22  25  28  31
I   21  18  17  14  11  14  11   8   9  12  15  16  19  22  25  28
S   24  21  20  17  14  11  14  11   8  11  14  17  18  21  24  27
S   27  24  23  20  17  14  11  14  11  10  13  16  19  20  23  26
I   30  27  26  23  20  17  14  11  14  13  12  13  16  19  20  23
S   33  30  29  26  23  20  17  14  11  14  15  14  15  18  21  22
S   36  33  32  29  26  23  20  17  14  13  16  17  16  17  20  23
I   39  36  35  32  29  26  23  20  17  16  15  16  19  18  17  20
P   42  39  38  35  32  29  26  23  20  19  18  17  16  19  20  19
P   45  42  41  38  35  32  29  26  23  22  21  20  17  16  19  22
I   48  45  44  41  38  35  32  29  26  25  24  21  20  19  16  19
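For comparison with the graph-based algorithms, here is a minimal Python sketch of the full DP that fills a matrix like the one above with match=0, mismatch=2, space=3. The function name is an assumption; the recursion is the one from the edit graph slides.

```python
def needleman_wunsch_matrix(x, y, s, d):
    """Fill the (len(x)+1) x (len(y)+1) DP matrix F for global alignment.

    x labels the rows, y the columns; s(a, b) is the substitution cost
    and d the cost of a space.
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * d                 # x[:i] aligned against spaces only
    for j in range(1, n + 1):
        F[0][j] = j * d                 # y[:j] aligned against spaces only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = min(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # (mis)match
                F[i - 1][j] + d,                          # space in y
                F[i][j - 1] + d,                          # space in x
            )
    return F


if __name__ == "__main__":
    score = lambda a, b: 0 if a == b else 2
    F = needleman_wunsch_matrix("IMISSMISSISSIPPI", "MYMISSISAHIPPIE", score, 3)
    for row in F:
        print(" ".join(f"{value:3d}" for value in row))
```

Every one of the 17 · 16 = 272 entries is computed, regardless of how close it lies to an optimal path.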


Example: Needleman-Wunsch (2)

M Y M I S S I S A H I P P I E

← ← ← ← ← ← ← ← ← ← ← ← ← ← ←

I ↑ ↖ ↖← ↖← ↖ ← ← ↖← ← ← ← ↖← ← ← ↖← ←

M ↑ ↖ ↖ ↖ ← ↖← ↖← ↖← ↖← ↖← ↖← ↖← ↖← ↖← ↖← ↖←

I ↑ ↑ ↖ ↖ ↖ ← ← ↖← ← ← ← ↖← ← ← ↖← ←

S ↑ ↑ ↑↖ ↖ ↑↖ ↖ ↖← ← ↖← ← ← ← ← ← ← ←

S ↑ ↑ ↑↖ ↑↖ ↖ ↑↖ ↖ ← ↖← ← ← ← ← ← ← ←

M ↑ ↑↖ ↑↖ ↖ ↑↖ ↑↖ ↑ ↖ ↖← ↖← ↖← ↖← ↖← ↖← ↖← ↖←

I ↑ ↑ ↑↖ ↑ ↖ ↑↖← ↑ ↖ ↖ ↖← ↖← ↖ ← ← ↖← ←

S ↑ ↑ ↑↖ ↑ ↑ ↖ ↑↖← ↑ ↖ ↖← ↖← ↖← ↖ ↖← ↖← ↖←

S ↑ ↑ ↑↖ ↑ ↑ ↑↖ ↖ ↑← ↑↖ ↖ ↖← ↖← ↖← ↖ ↖← ↖←

I ↑ ↑ ↑↖ ↑ ↑↖ ↑ ↑ ↖ ↑← ↑↖ ↖ ↖ ← ← ↖ ←

S ↑ ↑ ↑↖ ↑ ↑ ↑↖ ↑↖ ↑ ↖ ← ↑↖ ↖ ↖ ↖← ↖← ↖

S ↑ ↑ ↑↖ ↑ ↑ ↑↖ ↑↖ ↑ ↑↖ ↖ ↖← ↑↖ ↖ ↖ ↖← ↖←

I ↑ ↑ ↑↖ ↑ ↑↖ ↑ ↑ ↑↖ ↑ ↑↖ ↖ ↖ ↑↖← ↖ ↖ ←

P ↑ ↑ ↑↖ ↑ ↑ ↑ ↑ ↑ ↑ ↑↖ ↑↖ ↖ ↖ ↖← ↑↖ ↖

P ↑ ↑ ↑↖ ↑ ↑ ↑ ↑ ↑ ↑ ↑↖ ↑↖ ↑↖ ↖ ↖ ← ↑↖←

I ↑ ↑ ↑↖ ↑ ↑↖ ↑ ↑ ↑↖ ↑ ↑↖ ↑↖ ↖ ↑ ↑↖ ↖ ←

16 · 17 = 272 nodes were used.


Example: Dijkstra (3)

M Y M I S S I S A H I P P I E

← ← ← ← ← ← ←

I ↑ ↖ ↖← ↖← ↖ ← ← ↖← ←

M ↑ ↖ ↖ ↖ ← ↖← ↖← ↖← ↖←

I ↑ ↑ ↖ ↖ ↖ ← ← ↖← ← ←

S ↑ ↑ ↑↖ ↖ ↑↖ ↖ ↖← ← ↖← ← ←

S ↑ ↑ ↑↖ ↑↖ ↖ ↑↖ ↖ ← ↖← ← ← ←

M ↑ ↑↖ ↑↖ ↖ ↑↖ ↑↖ ↑ ↖ ↖← ↖← ↖← ↖← ←

I ↑ ↑ ↑↖ ↑ ↖ ↑↖← ↑ ↖ ↖ ↖← ↖← ↖ ← ←

S ↑ ↑↖ ↑ ↑ ↖ ↑↖← ↑ ↖ ↖← ↖← ↖← ↖ ↖←

S ↑ ↑ ↑↖ ↖ ↑← ↑↖ ↖ ↖← ↖← ↖← ↖

I ↑ ↑ ↑ ↖ ↑← ↑↖ ↖ ↖ ← ←

S ↑ ↑↖ ↑ ↖ ← ↑↖ ↖ ↖ ↖← ←

S ↑ ↑ ↑↖ ↖ ↖← ↑↖ ↖ ↖ ↖←

I ↑ ↑ ↑↖ ↖ ↖ ↑↖← ↖ ↖ ←

P ↑ ↑↖ ↑↖ ↖ ↖ ← ↑↖ ↖

P ↑ ↑↖ ↑↖ ↖ ↖ ←

I ↑ ↑↖ ↖ ←

165 nodes were inserted, 132 nodes were extracted.


Example: A∗ (GDUS) (4)

M Y M I S S I S A H I P P I E

← ← ← ←

I ↑ ↖ ↖← ↖← ↖ ←

M ↑ ↑ ↖ ↖ ← ↖←

I ↑ ↑ ↑↖ ↖ ↖ ← ←

S ↑ ↑ ↑↖ ↑↖ ↑↖ ↖ ↖← ←

S ↑ ↑↖ ↑↖ ↑↖ ↑↖ ↖ ← ←

M ↑ ↖ ↑↖ ↑ ↑ ↖ ↖← ←

I ↑ ↖ ↑ ↑ ↖ ↖ ↖← ←

S ↑ ↖ ↑↖ ↑ ↖ ↖← ↖←

S ↑ ↖ ↑ ↑↖ ↖ ↖← ←

I ↑ ↖ ↑ ↑↖ ↖ ↖ ←

S ↑ ↖ ↑↖← ↑↖ ↖ ↖ ←

S ↑ ↖ ↑↖← ↑↖ ↖ ↖

I ↑ ↖ ↖ ↑↖ ↖ ←

P ↑ ↖ ↖ ↖

P ↑ ↖ ↖← ←

I ↑ ↖ ←

106 nodes were inserted, 76 nodes were extracted.
