Structure Alignment in Polynomial Time Rachel Kolodny Stanford University Nati Linial The Hebrew University of Jerusalem

Page 1: Structure Alignment in Polynomial Time

Structure Alignment in Polynomial Time

Rachel Kolodny, Stanford University

Nati Linial, The Hebrew University of Jerusalem

Page 2: Structure Alignment in Polynomial Time

Problem Statement

• Two structures in R^3: A = {a_1, a_2, …, a_n}, B = {b_1, b_2, …, b_m}

• Find subsequences s_a and s_b such that the substructures {a_{s_a(1)}, a_{s_a(2)}, …, a_{s_a(l)}} and {b_{s_b(1)}, b_{s_b(2)}, …, b_{s_b(l)}} are similar

Page 3: Structure Alignment in Polynomial Time

Motivation

• Structure is better conserved than amino acid sequence
– Structure similarity can give hints to common functionality/origin

• Allows automatic classification of protein structures

Page 4: Structure Alignment in Polynomial Time

Correspondence → Position

• Given a correspondence, the rotation and translation that minimize the cRMS distance can be calculated

Kabsch, W. (1978).
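
A minimal sketch of this step, using the SVD formulation commonly associated with Kabsch's method and assuming NumPy is available. The function name `kabsch_crms` and the row-per-atom array layout are illustrative choices, not taken from the slides.

```python
import numpy as np


def kabsch_crms(P, Q):
    """Return (R, t, crms) so that R @ p_i + t best fits q_i in least squares.

    P and Q are (l, 3) arrays; row i of P corresponds to row i of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    centroid_p = P.mean(axis=0)
    centroid_q = Q.mean(axis=0)
    X = P - centroid_p  # centered copies of both point sets
    Y = Q - centroid_q
    H = X.T @ Y  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_q - R @ centroid_p
    diff = P @ R.T + t - Q
    crms = float(np.sqrt((diff ** 2).sum() / len(P)))
    return R, t, crms
```

For two point sets that differ only by a rigid motion, the returned cRMS should be zero up to rounding.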

Page 5: Structure Alignment in Polynomial Time

Position → Correspondence

• Given a rotation and translation, one can calculate the alignment that optimizes a (separable) score
– Using dynamic programming
– Essentially similar to sequence alignment

• Example score:

$$\text{Score} = \sum_{i \,\in\, \text{correspondence}} \frac{20}{1 + d(A_i, B_i)^2 / 5} \;-\; 10 \cdot \#\text{gaps}$$
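
The dynamic program can be sketched as below for structures that are already positioned. It rewards a matched pair with 20 / (1 + d^2 / 5) and charges 10 per gap, as in the example score. One simplifying assumption is made that the slides do not state: every skipped atom counts as one gap. The names `pair_score` and `align` are illustrative.

```python
def pair_score(p, q):
    """Score of matching two atoms: 20 / (1 + d^2 / 5)."""
    d2 = sum((x - y) ** 2 for x, y in zip(p, q))
    return 20.0 / (1.0 + d2 / 5.0)


def align(A, B, gap=10.0):
    """Return (score, pairs) for the best order-preserving correspondence.

    Assumption: each skipped atom is charged one gap penalty.
    """
    n, m = len(A), len(B)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0], back[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        S[0][j], back[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j], back[i][j] = max(
                (S[i - 1][j - 1] + pair_score(A[i - 1], B[j - 1]), "diag"),
                (S[i - 1][j] - gap, "up"),
                (S[i][j - 1] - gap, "left"),
            )
    # Trace back from the corner to recover the matched index pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return S[n][m], pairs
```

The table has (n+1)(m+1) cells and each cell takes constant work, so one alignment costs O(nm) time and space.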

Page 6: Structure Alignment in Polynomial Time

Score cRMS

• We want to give “bonus points” for longer correspondences
– e.g. a correspondence of ONE atom from each structure has 0 cRMS

• Even better scores?
– Vary the gap penalty depending on position in the structure
– Incorporate sequence information

Page 7: Structure Alignment in Polynomial Time

Score cRMS

A specific correspondence

Page 8: Structure Alignment in Polynomial Time

Previous Work

Distance matrices / Heuristics in rotation and translation space

DALI [Holm and Sander 93]

CONGENEAL [Yee & Dill 93]

SSAP [Taylor & Orengo 89]

Nussinov-Wolfson [89,93]

Godzik [93]

STRUCTAL [Subbiah et al 93]

COMPARER [Sali & Blundell 90]

LOCK [Singh & Brutlag 97]

CE [Shindyalov & Bourne 98]

Taylor (??) [93]

Zu-Kang & Sippl 96 (?)

…

*most data taken from Orengo 94

Page 9: Structure Alignment in Polynomial Time

“…It can be proved that, for these reasons, finding an optimal structural alignment between two protein structures is an NP hard problem and thus there are no fast structural alignment algorithms that are guaranteed to be optimal within any given similarity measure…”

Adam Godzik, ‘The structural alignment between two proteins: Is there a unique answer?’ 1996

“There is no exact solution to the protein structure alignment problem, only the best solution for the heuristics used in the calculation.”

Shindyalov & Bourne, ‘Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path’ 1998

Page 10: Structure Alignment in Polynomial Time

Exponentially many

Focus on Scoring Functions

Page 11: Structure Alignment in Polynomial Time

Exponentially many

Focus on Scoring Functions

Page 12: Structure Alignment in Polynomial Time

Exponentially many

All Maxima are interesting

Noisy data !!

Page 13: Structure Alignment in Polynomial Time

Good scoring functions

• Each of the functions is well-behaved
– Satisfies a Lipschitz condition

• Thus, the maximum over a finite set is well-behaved

• In each dimension two points at distance have function values that vary by O(n)

• Need O(n) samples in every dimension
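
The step from "each function is Lipschitz" to "the maximum is well-behaved" can be written out as follows. This is a sketch in notation that does not appear in the slides: $F_c$ stands for the score of a fixed correspondence $c$ as a function of the rigid transformation $T$, $\|\cdot\|$ is some metric on transformations, and $L$ is an assumed common Lipschitz constant.

```latex
\begin{align*}
&\text{Assume } |F_c(T) - F_c(T')| \le L\,\|T - T'\| \text{ for every correspondence } c. \\
&\text{Let } F(T) = \max_c F_c(T) \text{ and pick } c^* \text{ with } F(T) = F_{c^*}(T). \\
&\text{Since } F(T') \ge F_{c^*}(T'): \quad
  F(T) - F(T') \le F_{c^*}(T) - F_{c^*}(T') \le L\,\|T - T'\|. \\
&\text{Swapping } T \text{ and } T' \text{ gives } |F(T) - F(T')| \le L\,\|T - T'\|.
\end{align*}
```

Under these assumptions, a sampling grid that is fine enough for each individual score function is also fine enough for their maximum, even though the maximum ranges over exponentially many correspondences.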

Page 14: Structure Alignment in Polynomial Time

Sampling is Sufficient

Page 15: Structure Alignment in Polynomial Time

Polynomial Algorithm

• Sample in rotation and translation space
– Compute the best score (and alignment) for each sample point

• Return the maximum score

• Need O(n^6 · n^2) time and O(n^2) space
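
The sampling loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: rotations are parameterized by three angles about the z, y and x axes, the grids `angles` and `shifts` are supplied by the caller rather than derived from n, and the score is the example score with one gap penalty per skipped atom. Only the score is computed per sample, so this version does not keep the alignment itself.

```python
import itertools
import math


def rotation_zyx(a, b, c):
    """Rotation matrix Rz(a) @ Ry(b) @ Rx(c), angles in radians."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    return [
        [ca * cb, ca * sb * sc - sa * cc, ca * sb * cc + sa * sc],
        [sa * cb, sa * sb * sc + ca * cc, sa * sb * cc - ca * sc],
        [-sb, cb * sc, cb * cc],
    ]


def transform(R, t, p):
    """Apply rotation R and then translation t to point p."""
    return tuple(sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3))


def dp_score(A, B, gap=10.0):
    """Best alignment score for fixed positions, keeping one table row at a time."""
    prev = [-gap * j for j in range(len(B) + 1)]
    for i, p in enumerate(A, start=1):
        cur = [-gap * i]
        for j, q in enumerate(B, start=1):
            d2 = sum((x - y) ** 2 for x, y in zip(p, q))
            match = prev[j - 1] + 20.0 / (1.0 + d2 / 5.0)
            cur.append(max(match, prev[j] - gap, cur[j - 1] - gap))
        prev = cur
    return prev[-1]


def sampled_alignment(A, B, angles, shifts):
    """Try every sampled rigid motion of B; return (best score, angles, shift)."""
    best = (-math.inf, None, None)
    for abc in itertools.product(angles, repeat=3):
        R = rotation_zyx(*abc)
        for t in itertools.product(shifts, repeat=3):
            moved = [transform(R, t, q) for q in B]
            score = dp_score(A, moved)
            if score > best[0]:
                best = (score, abc, t)
    return best
```

With k samples per dimension the loop visits k^6 transformations and runs one O(nm) dynamic program for each, which is where a bound of the form O(n^6 · n^2) comes from when k grows linearly with n.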

Page 16: Structure Alignment in Polynomial Time

Internal Distance Matrices

• Invariant to the position and rotation of the structures, so they can be compared directly

• Find largest common sub-matrices (LCM) whose distances are roughly the same
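
A small sketch of the invariance property, assuming Python 3.8 or later for `math.dist`. The helper names `distance_matrix` and `max_abs_difference` are illustrative.

```python
import math


def distance_matrix(points):
    """Internal distance matrix: entry (i, j) is the distance between atoms i and j."""
    return [[math.dist(p, q) for q in points] for p in points]


def max_abs_difference(D1, D2):
    """Largest entrywise difference between two equally sized matrices."""
    return max(abs(x - y) for r1, r2 in zip(D1, D2) for x, y in zip(r1, r2))
```

Applying any rotation and translation to a structure leaves its distance matrix unchanged, so `max_abs_difference` between the matrices before and after the motion should be zero up to rounding.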

Page 17: Structure Alignment in Polynomial Time

LCM is NP-complete

• Harder than MAX-CLIQUE

• Matrices encode distances that are positive, symmetric and obey triangle inequality

0 1 1 1 1 1
1 0 1 1 1 1
1 1 0 1 1 1
1 1 1 0 1 1
1 1 1 1 0 1
1 1 1 1 1 0

0 1 2 3 2 3 3 4 5 2
1 0 1 2 1 1 2 3 4 1
2 1 0 3 2 2 3 4 5 2
3 2 3 0 1 2 3 4 5 2
2 1 2 1 0 1 2 3 4 1
3 1 2 2 1 0 1 2 3 1
3 2 3 3 2 1 0 1 2 2
4 3 4 4 3 2 1 0 1 3
5 4 5 5 4 3 2 1 0 4
2 1 2 2 1 1 2 3 4 0

Page 18: Structure Alignment in Polynomial Time

Example

1dme: 28 amino acids

1jjd: 51 amino acids

Best STRUCTAL score: 149

Best score found by exhaustive search: 197

Page 19: Structure Alignment in Polynomial Time

Heuristic

• Consider only translations that position an atom from protein A on an atom of protein B

• O(m*n) instead of O((n+m)^3)
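
The candidate set for this heuristic can be enumerated directly. In this sketch the translation is applied to B, which is an arbitrary choice, and the function name `candidate_translations` is illustrative.

```python
def candidate_translations(A, B):
    """All translations t such that b + t lands exactly on a, for atoms a in A and b in B."""
    return [
        tuple(a_k - b_k for a_k, b_k in zip(a, b))
        for a in A
        for b in B
    ]
```

The list has len(A) * len(B) entries, one per atom pair, and these would replace the full translation grid in the sampling step.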