Comp. Genomics Recitation 13 Genome rearrangements Homework solutions


Page 1: Comp. Genomics

Comp. Genomics

Recitation 13

Genome rearrangements: Homework solutions

Page 2: Comp. Genomics

Exercise 1

• Two haploid, single-chromosome genomes G1 and G2 were sequenced. G1 is an ancestor of G2. G1 is represented by the unsigned permutation 1,2,…,n.

• The region gi,…,gj is known as a “tough chromosomal region”. Reversal events never create breakpoints in this region.

Page 3: Comp. Genomics

Exercise 1

• Assume that G2 was generated from G1 by the minimal number of reversal events needed to obtain G2.

• Give an upper bound on the number of reversal events that occurred during the evolution from G1 to G2.

Page 4: Comp. Genomics

Solution 1

• We can apply the same reversals to G2 in reverse order to obtain G1.

• E.g., if a single reversal transformed G1=12345 into G2=14325, applying a reversal on the same indices to G2 gives back G1.

• So if we exhibit a series of k reversals that transforms G2 back into G1, then k is an upper bound.
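A reversal is its own inverse, so undoing a series of reversals in reverse order recovers the ancestor. The minimal Python sketch below checks this on the 12345 → 14325 example. The helper names `reverse_segment` and `undo`, and the 1-based inclusive positions, are illustrative assumptions.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    p = list(perm)
    p[i - 1:j] = p[i - 1:j][::-1]
    return p


def undo(perm, reversals):
    """Apply the (i, j) reversals in reverse order; each reversal undoes itself."""
    for i, j in reversed(reversals):
        perm = reverse_segment(perm, i, j)
    return perm


g1 = [1, 2, 3, 4, 5]
g2 = reverse_segment(g1, 2, 4)
print(g2)                  # [1, 4, 3, 2, 5]
print(undo(g2, [(2, 4)]))  # [1, 2, 3, 4, 5]
```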

Page 5: Comp. Genomics

Solution 1

• Genes 1,…,i-1 lie outside the tough region in G2. In the worst case, we need i-1 reversal operations to move them, one at a time, into their correct positions.

• Then we have:

1,2,…,i-1,TOUGH_REGION,REST_OF_GENES

where TOUGH_REGION is either i,i+1,…,j or j,j-1,…,i

Page 6: Comp. Genomics

Solution 1

• If the tough region is reversed, one more reversal puts it in order. We can then fix the REST_OF_GENES region in n-j-1 reversal operations (for j < n). In total we get $(i-1) + 1 + (n-j-1) = n-(j-i)-1$.

Page 7: Comp. Genomics

Exercise 2

• A breakpoint is an index i in the sequence $\pi_1,\dots,\pi_n$ such that $|\pi_{i+1} - \pi_i| \neq 1$.

• Prove or refute: out of n/2 reversals on the unsigned permutation 1,2,…,n, there is at least one reversal that cancels a breakpoint at some index.

• A reversal operates on a contiguous subsequence.

• Note that a reversal can both cancel a breakpoint and create new ones.
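The breakpoint definition translates directly into code. The minimal Python sketch below assumes the convention that no sentinel genes 0 and n+1 are added, which is the convention that matches the breakpoint counts in Solution 3. The function name is illustrative.

```python
def breakpoints(perm):
    """1-based indices i with |pi_{i+1} - pi_i| != 1 (no sentinel genes added)."""
    # perm[i - 1] is pi_i and perm[i] is pi_{i+1} in 1-based notation.
    return [i for i in range(1, len(perm)) if abs(perm[i] - perm[i - 1]) != 1]


print(breakpoints([1, 7, 6, 5, 4, 3, 2]))  # [1]
print(breakpoints([1, 6, 5, 3, 4, 2, 7]))  # [1, 3, 5, 6]
```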

Page 8: Comp. Genomics

Solution 2

• Can you refute it?

• The claim is false.

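• Consider the permutation (1,2,3) and the sequence of reversals (1,2,3) → (1,3,2) → (3,1,2) → (1,3,2) → …

• (1,2,3) has no breakpoints, while (1,3,2) and (3,1,2) each have exactly one breakpoint, at index 1. The first reversal creates it, and every later reversal only alternates between these two permutations, so no reversal ever cancels a breakpoint.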
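The counterexample can be checked mechanically. The Python sketch below compares the breakpoint sets before and after each reversal in the alternating walk. Note that `canceled` and the chosen walk length are illustrative assumptions, and the breakpoint convention is again the one without sentinels.

```python
def breakpoints(perm):
    """Set of 1-based indices i with |pi_{i+1} - pi_i| != 1 (no sentinels)."""
    return {i for i in range(1, len(perm)) if abs(perm[i] - perm[i - 1]) != 1}


def canceled(before, after):
    """Breakpoint indices present before a reversal and absent after it."""
    return breakpoints(before) - breakpoints(after)


# Each step is a single reversal: positions 2-3 first, then 1-2 back and forth.
walk = [(1, 2, 3), (1, 3, 2), (3, 1, 2), (1, 3, 2), (3, 1, 2)]
steps = [canceled(a, b) for a, b in zip(walk, walk[1:])]
print(all(not s for s in steps))  # True: no step cancels a breakpoint
```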

Page 9: Comp. Genomics

Exercise 3

• Two reversals occur on the permutation 1,2,…,n. How many breakpoints can occur in the resulting permutation?

Page 10: Comp. Genomics

Solution 3

• One reversal:

1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 (one breakpoint)

1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 (two breakpoints)

Page 11: Comp. Genomics

Solution 3

• Two reversals:

1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 2 3 4 5 6 7 (zero breakpoints)

Page 12: Comp. Genomics

Solution 3

• Two reversals:

1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 → 3 4 5 6 7 1 2 (one breakpoint)

Page 13: Comp. Genomics

Solution 3

• Two reversals:

1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 → 1 3 4 5 6 7 2 (two breakpoints)

Page 14: Comp. Genomics

Solution 3

• Two reversals:

1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 6 2 3 4 5 7 (three breakpoints)

Page 15: Comp. Genomics

Solution 3

• Four breakpoints:

1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 6 5 3 4 2 7 (four breakpoints)
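A reversal of positions i..j only changes the two adjacencies at its ends. Each reversal therefore adds at most two breakpoints, and two reversals add at most four. The brute-force Python sketch below enumerates every pair of reversals on 1,…,7 (n = 7, as in the slides) and collects the breakpoint counts. The helper names are illustrative, and breakpoints are counted without sentinels.

```python
from itertools import combinations


def breakpoints(perm):
    """Number of adjacent pairs whose values differ by something other than 1."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) != 1)


def reversals(perm):
    """Yield every permutation reachable from perm by one reversal of length >= 2."""
    for i, j in combinations(range(len(perm)), 2):
        yield perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


start = tuple(range(1, 8))
counts = {breakpoints(q) for p in reversals(start) for q in reversals(p)}
print(sorted(counts))  # [0, 1, 2, 3, 4]
```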

Page 16: Comp. Genomics

DCJ Algorithm

• Why does it run in linear time?

Page 17: Comp. Genomics

DCJ Algorithm – cont’d

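• $d_{DCJ}(A,B) = N - (C + I/2)$, where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph.

• Each iteration increments either C by one or I by two.

• Our genome representation allows us to find and perform each sorting operation in constant time.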

• The DCJ distance is never larger than N.
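The distance itself can also be computed in linear time with one traversal of the adjacency graph. The Python sketch below is an illustration under stated assumptions, not the course's implementation. It handles only linear chromosomes, assumes both genomes share the same gene set, and uses hypothetical names (`extremity_vertices`, `dcj_distance`).

```python
def extremity_vertices(genome):
    """Adjacencies (2-tuples) and telomeres (1-tuples) of a genome of linear chromosomes.

    Gene g has a tail (g, 't') and a head (g, 'h'); +g is read tail->head, -g head->tail.
    """
    vertices = []
    for chrom in genome:
        ends = [((g, 't'), (g, 'h')) if g > 0 else ((-g, 'h'), (-g, 't')) for g in chrom]
        vertices.append((ends[0][0],))                     # left telomere
        for (_, right), (left, _) in zip(ends, ends[1:]):
            vertices.append((right, left))                 # adjacency of neighbours
        vertices.append((ends[-1][1],))                    # right telomere
    return vertices


def dcj_distance(genome_a, genome_b):
    """d_DCJ(A, B) = N - (C + I/2), computed from the adjacency graph."""
    a_vertices = extremity_vertices(genome_a)
    b_vertices = extremity_vertices(genome_b)
    # Each extremity lies in one A-vertex and one B-vertex; that pair is an edge.
    vertex_in_a = {x: i for i, v in enumerate(a_vertices) for x in v}
    vertex_in_b = {x: j for j, v in enumerate(b_vertices) for x in v}
    n_genes = sum(len(chrom) for chrom in genome_a)

    seen = set()
    cycles = odd_paths = 0
    for root in range(len(a_vertices)):  # every component contains an A-vertex
        if ('A', root) in seen:
            continue
        seen.add(('A', root))
        stack, n_vertices, n_edges = [('A', root)], 0, 0
        while stack:
            side, idx = stack.pop()
            n_vertices += 1
            if side == 'A':
                extremities = a_vertices[idx]
                n_edges += len(extremities)  # count each edge once, from its A end
                neighbours = [('B', vertex_in_b[x]) for x in extremities]
            else:
                neighbours = [('A', vertex_in_a[x]) for x in b_vertices[idx]]
            for nb in neighbours:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if n_edges == n_vertices:    # every vertex has degree 2: a cycle
            cycles += 1
        elif n_edges % 2 == 1:       # a path with an odd number of edges
            odd_paths += 1
    # I has the parity of the number of A-telomeres (even), so // 2 is exact.
    return n_genes - (cycles + odd_paths // 2)


print(dcj_distance([[1, -2, 3]], [[1, 2, 3]]))   # 1 (one reversal)
print(dcj_distance([[1, 2], [3]], [[1, 2, 3]]))  # 1 (one fusion)
```

Every extremity is looked up and pushed a constant number of times, so the traversal is linear in N.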

Page 18: Comp. Genomics

Exam question, Moed A, 5767 (2006–07)

• A genome is a set of chromosomes, each of which is a sequence of signed integers. Together, the chromosomes contain the integers 1,…,n without repetition.

• For example, G={(1,-2,3),(4,5,6,-7)} is a genome with two chromosomes. We assume that a chromosome and its reverse with all signs flipped are equivalent.

• Therefore, (4,5,6,-7) is equivalent to (7,-6,-5,-4).
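The equivalence can be made concrete by picking one canonical representative per chromosome. The Python sketch below takes the lexicographically smaller of a chromosome and its flip. That choice of representative, and the helper names, are illustrative assumptions.

```python
def flip(chrom):
    """Reverse the chromosome and negate every sign."""
    return tuple(-g for g in reversed(chrom))


def canonical(chrom):
    """One representative of {chrom, flip(chrom)}: the lexicographically smaller."""
    return min(tuple(chrom), flip(chrom))


def same_genome(g1, g2):
    """Genomes are sets of chromosomes, each chromosome up to flipping."""
    return frozenset(map(canonical, g1)) == frozenset(map(canonical, g2))


print(flip((4, 5, 6, -7)))  # (7, -6, -5, -4)
```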

Page 19: Comp. Genomics

Exam question, Moed A, 5767 (2006–07)

• A reversal reverses the order and the signs of a contiguous segment within a single chromosome.

Thus, a single reversal on the first chromosome of G can produce the genome {(1,-3,2),(4,5,6,-7)}.

• A translocation exchanges the end segments of two chromosomes (one of the segments may be empty). For example, a translocation on G can produce the genome {(1,-2,5,6,-7),(4,3)}.
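Both operations are simple list manipulations. The Python sketch below reproduces the two examples for G. Positions are 0-based, and the translocation only swaps suffixes. Exchanging a prefix instead can be obtained by first flipping a chromosome, which yields an equivalent chromosome. The function names and parameters are illustrative assumptions.

```python
def reversal(genome, c, i, j):
    """Reverse order and signs of positions i..j (0-based, inclusive) of chromosome c."""
    chrom = list(genome[c])
    chrom[i:j + 1] = [-g for g in reversed(chrom[i:j + 1])]
    result = list(genome)
    result[c] = tuple(chrom)
    return result


def translocation(genome, c1, k1, c2, k2):
    """Swap the suffix of chromosome c1 from k1 with the suffix of c2 from k2 (may be empty)."""
    a, b = genome[c1], genome[c2]
    result = list(genome)
    result[c1] = tuple(a[:k1]) + tuple(b[k2:])
    result[c2] = tuple(b[:k2]) + tuple(a[k1:])
    return result


G = [(1, -2, 3), (4, 5, 6, -7)]
print(reversal(G, 0, 1, 2))          # [(1, -3, 2), (4, 5, 6, -7)]
print(translocation(G, 0, 2, 1, 1))  # [(1, -2, 5, 6, -7), (4, 3)]
```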

Page 20: Comp. Genomics

Exam question, Moed A, 5767 (2006–07)

• The problem is to transform a given genome into another genome using a minimum number of reversals and translocations.

• Give a polynomial-time algorithm that guarantees a constant approximation ratio for this problem.

Page 21: Comp. Genomics

Solution

• The problem is equivalent to sorting by signed reversals.

• We saw in class a polynomial-time 2-approximation.

Page 22: Comp. Genomics

HW 3 question 5

• Uniform lifted alignment – a lifted alignment in which, at each level, all strings are lifted either from the left child or from the right child.

• Prove that the optimal uniform lifted alignment has cost at most twice the cost of the optimal alignment tree.

• Give a polynomial algorithm to find the optimal uniform lifted alignment.

Page 23: Comp. Genomics

HW 3 question 5

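• Uniform lifted alignment, proof:

• Assume we have the optimal tree T*.

• Transform it as follows:

• To assign the strings at level k, consider the two options: every node lifts the string of its left child, or every node lifts the string of its right child.

• Pick the option with the minimal sum.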

Page 24: Comp. Genomics

HW 3 – question 5 – cont’d

• Assign each ‘costly’ edge (T,S) to a path in the optimal tree:

• The path from leaf (T) to node (S*).

[Figure: lifted tree with node S (S* in T*) above nodes labeled T, S and T.]

Together, these paths cover all edges of the tree.

Page 25: Comp. Genomics

HW 3 – question 5 – cont’d

• By the triangle inequality: $D(S,T) \le D(S,S^*) + D(S^*,T)$.

• By the choice of left/right at each level: $\sum_S \left[D(S,S^*) + D(S^*,T)\right] \le \sum_S \left[D(S^*,T) + D(S^*,T)\right] = \sum_S 2D(S^*,T)$. Hence the one-sided tree has cost at most twice the optimal.

Page 26: Comp. Genomics

HW 3 – question 5 – cont’d

• Algorithm:

• Preprocess all pairwise sequence distances.

• Try all different left/right assignments (one choice per level), and pick the minimal one.

• Running time (n sequences of length m):

• Preprocessing: $O(m^2 n^2)$.

• Tree height h: $2^h$ different assignments.

• Computing the cost of one tree: $O(n)$.
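The exhaustive search can be sketched in a few lines. The Python sketch below rests on several assumptions that are not in the slides: a complete binary tree with 2^h leaves, unit-cost edit distance as D, and tree cost defined as the sum of D over all edges. Because every lifted string is a leaf string, each node just stores a leaf index, and the precomputed distance table is reused. The helper names are hypothetical.

```python
from itertools import product


def edit_distance(a, b):
    """Unit-cost edit distance with a two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match / substitute
        prev = cur
    return prev[-1]


def best_uniform_lifted(leaves):
    """Return (cost, choices) of the cheapest uniform lifted tree over the leaves."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("sketch assumes a complete binary tree with 2^h leaves")
    dist = [[edit_distance(s, t) for t in leaves] for s in leaves]  # preprocessing
    h = n.bit_length() - 1
    best_cost, best_choice = None, None
    for choice in product((0, 1), repeat=h):  # 0: lift left child, 1: lift right child
        level, cost = list(range(n)), 0       # each node stores its lifted leaf index
        for side in choice:                   # bottom-up, one choice per level
            parents = []
            for k in range(0, len(level), 2):
                left, right = level[k], level[k + 1]
                lifted = right if side else left
                cost += dist[lifted][left] + dist[lifted][right]
                parents.append(lifted)
            level = parents
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


print(best_uniform_lifted(["aa", "aa", "bb", "bb"]))  # (2, (0, 0))
```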

Page 27: Comp. Genomics

HW 1 question 1

• Question 1: Explain how to compute a local alignment in linear space.

• The linear-space algorithm from the lecture is a global alignment algorithm.

Page 28: Comp. Genomics

solution

[Figure: DP matrix of x against y, with regions labeled “local alignment” and “global alignment”.]

Page 29: Comp. Genomics

solution

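• For every cell [i,j] in the DP matrix, add a field b[i,j] that records where the local alignment ending at [i,j] starts. It is updated as follows:

• If the score of [i,j] is 0, then b[i,j]=(i,j).

• Otherwise:

• If [i,j] was reached diagonally (match or mismatch), b[i,j]=b[i-1,j-1].

• If x_i is aligned to a gap, b[i,j]=b[i-1,j].

• If y_j is aligned to a gap, b[i,j]=b[i,j-1].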

Page 30: Comp. Genomics

solution

• Use the linear-space algorithm from class to compute the score of the optimal local alignment.

• At the same time, update the field b[i,j] for every cell; only the current and previous rows need to be stored.

• Now “cut out” the small matrix between b[i*,j*] and the cell with the optimal score [i*,j*], and run Hirschberg on it.