Automated High-Resolution Protein Structure Determination using Residual Dipolar Couplings

Anna Yershova
Department of Computer Science, Duke University
February 5, 2010, NC State University



Page 2

Introduction: Motivation

Protein Structure Determination is Important

High-resolution structures are needed for:
• Determining protein functions
• Protein redesign

Diagram: Amino acid sequences → Structures → Functions, Protein redesign

Page 3

Introduction: Motivation

What is Protein Structure: Primary Structure

The sequence of amino acids (residues) forms the backbone. Side chains are attached to the backbone.

Figure: residues 1 to 4, with labels for amino acid, side chain, and dihedral angle.
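A dihedral angle is the torsion about the middle bond of four consecutive atoms (the backbone angles φ and ψ are the torsions about the N-Cα and Cα-C bonds). A minimal plain-Python sketch, assuming Cartesian coordinates given as 3-element lists; the function name `dihedral` and the example coordinates are hypothetical illustrations, not taken from the talk.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (IUPAC sign convention)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)   # normal to the plane (p0, p1, p2)
    n2 = _cross(b1, b2)   # normal to the plane (p1, p2, p3)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: bond p1-p2 along z, p3 rotated by 60 degrees about it.
phi = math.radians(60.0)
print(round(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1],
                     [math.cos(phi), math.sin(phi), 1]), 6))  # 60.0
```

Using `atan2` rather than `acos` keeps the sign, so the result covers the full range (-180, 180].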

Page 4

Introduction: Motivation

What is Protein Structure: Secondary Structure Elements

Local folding is maintained by short-range interactions.

Page 5

Introduction: Motivation

What is Protein Structure: 3D Fold

Global 3D folding is maintained by long-range interactions.

Figure labels: side chain, beta-strands, alpha-helix, loop.

Page 6

Introduction: Motivation

High-Throughput Structure Determination Is Important

Figure: the gap between sequences and structures (source: http://www.metabolomics.ca/News/lectures/CPI2008-short.pdf).

Page 7

Introduction: Motivation

Current Approaches for Structure Determination

• X-ray crystallography. Difficulty: growing good-quality crystals.
• Nuclear Magnetic Resonance (NMR) spectroscopy. Difficulty: lengthy (expensive) processing and analysis of experimental data.

Both require expressing and purifying proteins.

Page 8

Introduction: Motivation

Bruce Donald’s Lab

Bruce Donald, Cheng-Yu Chen, John MacMaster, Michael Zeng, Chittu Tripathy, Lincong Wang, Pei Zhou

Page 9

Introduction: Motivation

Types of NMR Spectroscopy Data

• Chemical shift (CS): unique resonance frequency; serves as an ID.
• Nuclear Overhauser effect (NOE): local distance restraint between two protons.
• Residual dipolar coupling (RDC): global orientational restraint for bond vectors.

Figure: a residue R annotated with example chemical shift values (133.1, 8.9, 4.2, 172.1), the Hα proton, an NOE, and the static magnetic field B0.

Page 10

Introduction: Motivation

Resonance Assignment Problem

Assigning chemical shifts to each atom (Bailey-Kellogg et al., 2000, 2004). Figure source: http://www.pnas.org/content/102/52/18890/suppl/DC1

Page 11

Introduction: Motivation

NOE Assignment Problem

Obtain local distance restraints between protons (Bailey-Kellogg et al., 2000, 2004). A famous bottleneck.

Page 12

Introduction: Motivation

Structure Determination from NOEs

Figure: a partial distance matrix over atoms a1, a2, a3, ..., an, in which some entries are known distances (e.g., 4, 3) and others are unknown (?).

NOESY spectrum + resonance assignments → NOE assignment (assignment ambiguity) → distance geometry (NP-hard) [Saxe ’79; Hendrickson ’92, ’95]
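The NP-hardness concerns embedding from sparse, ambiguous distances. For contrast, when every pairwise distance is known exactly, coordinates can be recovered in polynomial time by classical multidimensional scaling, the metric-matrix idea behind distance geometry (Crippen and Havel). A minimal NumPy sketch, assuming a complete, noise-free distance matrix; `embed_from_distances` and the random test points are hypothetical.

```python
import numpy as np

def embed_from_distances(D, dim=3):
    """Coordinates (up to rotation, reflection, translation) from a complete,
    exact Euclidean distance matrix via classical multidimensional scaling."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram (metric) matrix of centered points
    evals, evecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:dim]          # keep the `dim` largest
    scale = np.sqrt(np.clip(evals[idx], 0.0, None))
    return evecs[:, idx] * scale                 # n x dim coordinates

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                      # hypothetical atom positions
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = embed_from_distances(D)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_rec))                     # True
```

With missing or mis-assigned entries (the NOE setting), the Gram matrix is no longer available, which is where the hardness enters.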

Page 13

Introduction: Motivation

Traditional Structure Determination Protocol

Protein structure determination is hard: a famous bottleneck.

Flowchart: resonance assignments, NOESY spectra, RDCs → NOE assignments → SA/MD (XPLOR-NIH) → initial fold → structure refinement → NOE assignments, 3D structures

Page 14

Introduction: Motivation

Traditional Structure Determination Protocol

Protein structure determination is hard (a famous bottleneck):
• error propagation
• local minima
• manual intervention for the initial fold and for evaluating NOE assignments

Can we have a polynomial-time algorithm using orientational restraints? Yes: Wang and Donald, 2004; Wang et al., 2006.


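Page 16

Background: RDCs

RDC Equation for a Single Bond

$$ D = D_{\max} \left\langle \frac{3\cos^{2}\theta - 1}{2} \right\rangle = D_{\max}\, v^{T} S\, v $$

where θ is the angle between the internuclear vector v of atoms a and b and the static magnetic field B0, D_max is the dipolar interaction constant (proportional to γa γb / r³_{a,b}), and S is the Saupe matrix:
• S is traceless and symmetric.
• S contains 5 degrees of freedom.
• In its principal frame, S = diag(Sxx, Syy, Szz).

Figure labels: alignment medium, B0, bond vector v between atoms a and b.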
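The equation can be evaluated directly. A minimal NumPy sketch, assuming the bond vector is expressed in the same (molecular) frame as S; the names `saupe_matrix` and `rdc` and all numerical values are hypothetical illustrations.

```python
import numpy as np

def saupe_matrix(syy, szz, sxy=0.0, sxz=0.0, syz=0.0):
    """Symmetric, traceless 3x3 Saupe matrix from its 5 independent entries."""
    sxx = -syy - szz  # tracelessness removes the 6th degree of freedom
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, szz]])

def rdc(d_max, S, v):
    """D = D_max * v^T S v for the bond direction v (normalized here)."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return d_max * (v @ S @ v)

# Hypothetical values: D_max = 1 (D in units of D_max), S in its principal frame.
S = saupe_matrix(syy=-2e-4, szz=5e-4)
d_x, d_y, d_z = (rdc(1.0, S, e) for e in np.eye(3))
print(np.isclose(d_x + d_y + d_z, 0.0))  # True: S is traceless
```

A bond along the z axis of the principal frame gives D = D_max · Szz, and the RDCs of three orthogonal bonds sum to zero because S is traceless.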

Page 17

Introduction: Motivation

Traditional Structure Determination vs. RDC-PANDA

Traditional protocol (protein structure determination is hard): resonance assignments, NOESY spectra, RDCs → NOE assignments → SA/MD (XPLOR-NIH) → initial fold → structure refinement → NOE assignments, 3D structures. Problems:
• error propagation
• local minima
• manual intervention for the initial fold and for evaluating NOE assignments

RDC-PANDA protocol: RDCs and a constant number of NOEs → global fold (RDC-ANALYTIC) → side-chain placement (PACKER) → NOE assignments → XPLOR-NIH → NOE assignments, 3D structures (Zeng et al., Jour. Biomolecular NMR, 2009).

Page 18

Introduction: Motivation

Importance of Backbone Structure Determination

• Global orientational restraints from RDCs
• Compute the initial fold using exact solutions to the RDC equations
• Resolve NOE assignment ambiguity
• Sparse data (high-throughput, large proteins, membrane proteins)
• Automated side-chain resonance assignment
• Avoid the NP-hard problem of structure determination from NOEs

Page 19

Introduction: Motivation

Current Limitations of RDC-PANDA

Because it requires only 2 RDCs per residue:
• Only secondary structure elements (SSEs) can be reliably determined; NOEs are needed to determine the structure of loops.
• Missing data is difficult to handle.

Page 20

Introduction: Motivation

My Current Project

• Improve current protein structure determination techniques from our lab
• Design new algorithms for protein backbone structure determination using orientational restraints from RDCs

Page 21

Introduction: Motivation

Literature Overview

• Distance geometry based structure determination: Braun, 1987; Crippen and Havel, 1988; Moré and Wu, 1999
• Heuristic based structure determination: Brünger, 1992; Nilges et al., 1997; Güntert, 2003; Rieping et al., 2005
• RDC-based structure determination: Tolman et al., 1995; Tjandra and Bax, 1997; Hus et al., 2001; Tian et al., 2001; Prestegard et al., 2004; Wang and Donald (CSB 2004); Wang and Donald (Jour. Biomolecular NMR, 2004); Wang, Mettu and Donald (JCB 2005); Donald and Martin (Progress in NMR Spectroscopy, 2009); Ruan et al., 2008; Zeng et al. (Jour. Biomolecular NMR, 2009)
• Heuristic based automated NOE assignment: Mumenthaler et al., 1997; Nilges et al., 1997, 2003; Herrmann et al., 2002; Schwieters et al., 2003; Kuszewski et al., 2004; Huang et al., 2006
• Automated NOE assignment starting with an initial fold computed from RDCs: Wang and Donald (CSB 2005); Zeng et al. (CSB 2008); Zeng et al. (Jour. Biomolecular NMR, 2009)
• Automated side-chain resonance assignment: Li and Sanctuary, 1996, 1997; Marin et al., 2004; Masse et al., 2006; Zeng et al. (in submission, 2009)

Page 22

Background: RDCs

RDC Equation for a Single Bond

$$ D = D_{\max}\, v^{T} S\, v, \qquad S = \mathrm{diag}(S_{xx}, S_{yy}, S_{zz}) \text{ in its principal frame} $$

• Linear in S: a fixed v defines a hyperplane.
• Quadratic in v: a fixed S defines a hyperboloid.
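Expanding v^T S v with Sxx = -Syy - Szz makes the linearity explicit: for a fixed unit vector v, the RDC is a dot product with the 5 free entries of S, so D/D_max = a(v)·s is a hyperplane in the 5-dimensional space of s. A NumPy sketch; the ordering of s and the function names are our own (hypothetical) choices, not notation from the talk.

```python
import numpy as np

def rdc_row(v):
    """Row a(v) with v^T S v = a(v) . s, where s = (Syy, Szz, Sxy, Sxz, Syz)
    and Sxx = -Syy - Szz (S traceless)."""
    x, y, z = np.asarray(v, dtype=float) / np.linalg.norm(v)
    return np.array([y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z])

def full_saupe(s):
    """Full 3x3 traceless, symmetric matrix from the 5-vector s (same ordering)."""
    syy, szz, sxy, sxz, syz = s
    return np.array([[-syy - szz, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, szz]])

rng = np.random.default_rng(0)
v = rng.normal(size=3)                    # hypothetical bond direction
s = rng.normal(size=5)                    # hypothetical Saupe entries
v_hat = v / np.linalg.norm(v)
# Same value two ways: linear in s (fixed v), quadratic in v (fixed S).
print(np.isclose(rdc_row(v) @ s, v_hat @ full_saupe(s) @ v_hat))  # True
```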

Page 23

Background: RDCs

RDC Equation for a Single Bond

One RDC equation involves 7 variables (5 for S and 2 for the unit bond vector v) and defines a collection of hyperplanes in S, one for each v.

Page 24

Background: RDCs

RDC Equations for a Protein Portion

Figure: a backbone fragment spanning residues 1 to 4.

Page 25

Background: RDCs

RDC Equations for a Protein Portion

Too few equations, too many variables!

Figure labels: residues 1 to 4; bond vectors v1, u1, v2.

[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, [Epub ahead of print] PMID:19711185, 2009.

Page 26

Background: RDCs

Forward Kinematics Reduces the Number of Variables

Fix the coordinate system.

Figure labels: bond vectors v1, u1, v2.
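Forward kinematics expresses every atom, and hence every bond vector, as a function of the fixed initial frame plus the backbone geometry. A minimal NumPy sketch using the standard NeRF (natural extension reference frame) construction, which is our choice and not necessarily the method used in the talk; it assumes fixed bond lengths and bond angles so that only the dihedral angles remain as variables, and all geometry values are hypothetical.

```python
import numpy as np

def place_next(a, b, c, bond_len, bond_angle_deg, torsion_deg):
    """Place atom d after atoms a, b, c so that |d - c| = bond_len,
    angle(b, c, d) = bond_angle_deg and dihedral(a, b, c, d) = torsion_deg (NeRF)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    theta, tau = np.radians(bond_angle_deg), np.radians(torsion_deg)
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)                  # normal to the plane (a, b, c)
    m = np.cross(n, bc)                     # completes the local frame (bc, m, n)
    return c + bond_len * (-np.cos(theta) * bc
                           + np.sin(theta) * np.cos(tau) * m
                           + np.sin(theta) * np.sin(tau) * n)

# Fixed coordinate system: the first three atoms are placed by hand (hypothetical).
chain = [np.array([1.0, 0.0, 0.0]), np.zeros(3), np.array([0.0, 0.0, 1.0])]
for torsion in (60.0, -120.0, 180.0):       # hypothetical dihedral angles
    chain.append(place_next(*chain[-3:], bond_len=1.5,
                            bond_angle_deg=110.0, torsion_deg=torsion))
bond_vectors = [chain[i + 1] - chain[i] for i in range(len(chain) - 1)]
```

Every bond vector after the fixed frame is determined by the torsions alone, so the RDC equations can be rewritten with the dihedrals as the only structural unknowns.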

Page 27

Background: RDCs

RDC Equations for a Protein Portion

Figure labels: bond vectors v1, u1, v2.

Page 28

Background: RDCs

RDC Equations for a Protein Portion

Recursive representation is possible!

Page 29

Background: RDCs

One Equation Per Dihedral Angle is Not Enough!

Each equation is linear in S and quartic in either tan(φ) or tan(ψ).

To be able to solve this system, there must be additional information. Possible scenarios:
1. Additional RDC measurement(s) for each dihedral angle.
2. Additional alignment media.
3. Additional NOE data.
4. Modeling (Ramachandran regions, steric clashes, energy function).
5. Sampling (for alignment tensors).
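Even with S known, a single equation that is quartic in tan(φ) can have up to four real roots, and each root gives two angles 180° apart, which is why additional information is needed. A NumPy sketch; the function name and the polynomial coefficients are hypothetical stand-ins, not derived from real RDC data.

```python
import numpy as np

def angles_from_tan_quartic(coeffs, tol=1e-9):
    """Candidate angles (degrees, in (-180, 180]) whose tangent is a real root of the
    polynomial with the given coefficients (highest degree first)."""
    angles = []
    for t in np.roots(coeffs):
        if abs(t.imag) > tol:
            continue                                   # complex roots are not angles
        a = float(np.degrees(np.arctan(t.real)))       # principal value in (-90, 90)
        angles.extend([a, a + 180.0 if a <= 0.0 else a - 180.0])  # tan has period 180
    return sorted(angles)

# Hypothetical quartic t^4 - 2t^3 - t^2 + 2t = t(t - 1)(t + 1)(t - 2): four real roots.
print(len(angles_from_tan_quartic([1.0, -2.0, -1.0, 2.0, 0.0])))  # 8
```

Eight candidate angles from one equation illustrate the ambiguity; each scenario in the list above supplies a way to discard most of them.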

Page 30

Background: RDC-PANDA

The RDC-PANDA Structure Determination Package

Current requirements:
• 2 RDCs per residue to obtain SSE structures
• Sparse NOEs to pack the SSEs

Current bottlenecks:
• Missing data (even in long SSEs)
• Long loops
• Sampling for computing alignment tensor(s)
• Sampling for the orientation of the first peptide plane

[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, [Epub ahead of print] PMID:19711185, 2009.

Page 31

Background: RDC-PANDA

When the Saupe Matrix is Known, the Solution Can Be Found Exactly!

Ellipse equations for the CH bond vector (Wang & Donald, 2004; Donald & Martin, 2009).
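A worked illustration of why an ellipse appears, assuming S is known and expressed in its principal frame: there the RDC equation reads D/D_max = Sxx x² + Syy y² + Szz z² with x² + y² + z² = 1, and eliminating z leaves (Szz - Sxx) x² + (Szz - Syy) y² = Szz - D/D_max, an ellipse in (x, y). A NumPy sketch; the function name and principal values are hypothetical, and this is not a reproduction of the exact derivation by Wang & Donald.

```python
import numpy as np

def vectors_on_rdc_ellipse(sxx, syy, szz, r, n=12):
    """Unit vectors v = (x, y, z) with sxx*x^2 + syy*y^2 + szz*z^2 = r, where r = D / D_max
    and (sxx, syy, szz) are principal values of a traceless S with szz the largest."""
    if not (sxx <= syy < szz and syy <= r <= szz):
        raise ValueError("sketch assumes sxx <= syy < szz and syy <= r <= szz")
    a = np.sqrt((szz - r) / (szz - sxx))        # semi-axis along x
    b = np.sqrt((szz - r) / (szz - syy))        # semi-axis along y (b >= a, b <= 1)
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x, y = a * np.cos(t), b * np.sin(t)
    z = np.sqrt(np.clip(1.0 - x**2 - y**2, 0.0, None))  # clip only guards round-off
    # Each point of the ellipse lifts to two bond vectors, with z and -z.
    return np.vstack([np.column_stack([x, y, z]), np.column_stack([x, y, -z])])

V = vectors_on_rdc_ellipse(-0.4, -0.1, 0.5, r=0.2)   # hypothetical principal values
print(V.shape)  # (24, 3)
```

Every returned vector reproduces the same RDC, so the measured value confines the bond to a closed curve on the unit sphere; combining this with the backbone geometry is what makes the exact solutions possible.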

Page 32

Background: RDC-PANDA

Solution Structure Deposited Using RDC-PANDA

Solution structure of FF domain 2 of human transcription elongation factor CA150 (FF2) using RDC-PANDA. PDB ID: 2KIQ. In collaboration with Dr. Zhou’s lab.

Page 33

Current Project

Problem Formulation: NH and CH RDCs in 2 Media

We require measurements for at least 9 consecutive bond vectors (4.5 residues) in 2 media. The goal is to exploit the additional equations and to handle experimental errors.

Page 34

Current Project

Relationship to Minimization

Page 35

Current Project

Relationship to Minimization and SVD

$$ A s = b, \qquad \hat{s} = \arg\min_{s} \lVert A s - b \rVert^{2} $$

Solving an overconstrained system of linear equations A s = b in the least-squares sense is equivalent to finding the projection of the vector b onto the column space of A (the "A hyperplane"). This is also equivalent to minimizing the sum of squared residuals, and the SVD of A gives this solution directly.
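For a fixed set of bond vectors, each RDC contributes one row of A and one entry of b, with s holding the 5 independent Saupe matrix entries. A NumPy sketch of the SVD-based least-squares fit under that assumption; the function names and simulated data are hypothetical. The final check confirms the projection view: the residual b - A s is orthogonal to every column of A.

```python
import numpy as np

def design_matrix(vectors):
    """One row per bond vector: D / D_max = row . s, with s = (Syy, Szz, Sxy, Sxz, Syz)."""
    V = np.asarray(vectors, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    x, y, z = V[:, 0], V[:, 1], V[:, 2]
    return np.column_stack([y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z])

def fit_saupe_svd(A, b):
    """Least-squares s = A^+ b via the SVD (assumes A has full column rank)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((U.T @ b) / sigma)

rng = np.random.default_rng(1)
A = design_matrix(rng.normal(size=(12, 3)))       # 12 hypothetical bond vectors
s_true = np.array([-2.0, 5.0, 1.0, -0.5, 0.8])   # hypothetical, arbitrary units
b = A @ s_true + 0.01 * rng.normal(size=12)      # noisy RDCs (in units of D_max)
s_hat = fit_saupe_svd(A, b)
residual = b - A @ s_hat
print(np.allclose(A.T @ residual, 0.0))           # True: residual is orthogonal to col(A)
```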

Page 36

Current Project

Relationship to Minimization

Page 37

Current Project

Relationship to Minimization and SVD

$$ A(\phi_i, \psi_i)\, s = b $$

When A depends on the unknown dihedral angles (φi, ψi), the system is nonlinear, and solving it is not trivial! There are multiple local minima in the corresponding minimization problem.

Page 38

Current Project

Advantages

If the minimization problem is solved, then:
• Packed SSEs and loops can be computed without additional NOE data.
• The Saupe matrix for each alignment medium can be computed without sampling.
• Missing values are handled robustly.

Page 39

Current Project

The Algorithm: Initialization Using a Helix

1. Compute an initial approximation for each Si using SVD.
2. Initialize (φi, ψi) for a helix.
3. Compute (φi, ψi) using tree search and minimization.
4. Update Si using SVD.
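The steps above alternate between two subproblems: with the angles fixed, the Saupe entries follow from a linear least-squares fit; with the Saupe entries fixed, each residual term depends on one angle. A toy NumPy sketch of that alternation (block-coordinate descent), assuming a hypothetical one-angle parameterization of each bond vector instead of real backbone kinematics, a single alignment medium, NumPy's SVD-based `lstsq` for the fit, and a plain grid scan in place of the tree search. It only illustrates why the RMSD cannot increase from step to step.

```python
import numpy as np

def rows(thetas, alphas):
    """Design rows for hypothetical bond vectors v = (sin t cos a, sin t sin a, cos t),
    with s = (Syy, Szz, Sxy, Sxz, Syz) so that D / D_max = row . s."""
    x = np.sin(thetas) * np.cos(alphas)
    y = np.sin(thetas) * np.sin(alphas)
    z = np.cos(thetas)
    return np.column_stack([y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z])

def rmsd(thetas, alphas, s, b):
    return float(np.sqrt(np.mean((rows(thetas, alphas) @ s - b) ** 2)))

def alternate(thetas, alphas, b, iters=10, n_grid=721):
    """(1) least-squares fit of s with angles fixed; (2) per-angle grid minimization
    with s fixed. Neither step can increase the RMSD."""
    thetas = np.array(thetas, dtype=float)
    grid = np.linspace(-np.pi, np.pi, n_grid)
    history = []
    for _ in range(iters):
        s = np.linalg.lstsq(rows(thetas, alphas), b, rcond=None)[0]
        history.append(rmsd(thetas, alphas, s, b))
        for j in range(len(thetas)):
            cands = np.append(grid, thetas[j])           # keep the current value
            errs = (rows(cands, alphas[j]) @ s - b[j]) ** 2
            thetas[j] = cands[int(np.argmin(errs))]
        history.append(rmsd(thetas, alphas, s, b))
    return thetas, s, history

rng = np.random.default_rng(2)
alphas = rng.uniform(-np.pi, np.pi, 10)                  # hypothetical, known
true_thetas = rng.uniform(0.0, np.pi, 10)                # hypothetical, unknown
b = rows(true_thetas, alphas) @ np.array([-2.0, 5.0, 1.0, -0.5, 0.8])
thetas, s, history = alternate(true_thetas + 0.3, alphas, b)
print(all(h2 <= h1 + 1e-9 for h1, h2 in zip(history, history[1:])))  # True
```

Including the current angle among the candidates is what guarantees the monotone decrease; a search that discarded it could step uphill.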

Page 40

Current Project

The Algorithm: Protein Portion

1. Initialize each Si to the computed approximations.
2. Compute (φi, ψi) using tree search and minimization.
3. Update Si using SVD.

Page 41

Current Project

The Algorithm: Computing Dihedrals

Figure: a search tree over the dihedral angles φ1, ψ1, ..., φn, ψn, with some branches marked x.

1. Minimize each of the RMSD terms as a univariate function.
2. Compute the list of best solutions.
3. Iteratively minimize the RMSD function.
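The first two steps can be illustrated for a single dihedral: scan the univariate RMSD term over the circle, keep the best few local minima, and pass them on as candidate branches. A minimal NumPy sketch; the function name, grid resolution, k, and test function are hypothetical, and both the refinement of each candidate (for example with a bracketing 1-D minimizer) and the tree search over combinations are omitted.

```python
import numpy as np

def best_local_minima(f, k=4, n=360):
    """Scan a 2*pi-periodic univariate function on an n-point grid over [-pi, pi)
    and return up to k grid-level local minima as (angle, value), best first."""
    t = -np.pi + 2.0 * np.pi * np.arange(n) / n
    v = f(t)
    is_min = (v <= np.roll(v, 1)) & (v < np.roll(v, -1))   # periodic neighbours
    idx = np.flatnonzero(is_min)
    idx = idx[np.argsort(v[idx], kind="stable")][:k]
    return [(float(t[i]), float(v[i])) for i in idx]

# Hypothetical univariate RMSD term with two equally good minima.
cands = best_local_minima(lambda t: np.cos(2.0 * t))
print(len(cands))  # 2
```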

Page 42

Current Project

Advantages

• The algorithm converges (in RMSD value), since every step minimizes, and therefore cannot increase, the RMSD function.
• If the data were “perfect”, the solutions to the minimization problem would be the roots of the polynomials in the RMSD terms, and the algorithm would find ALL of them.
• The minima of the RMSD terms give a good collection of initial structures for finding local and global minima.
• Robust handling of missing values.

Page 43

Preliminary Results

Preliminary Results: Ubiquitin Helix

Protein | RMSD (Hz) | Alignment tensors (Syy, Szz)
Ubq 25-31 | CH: 0.32; NH: 0.24 | (23.66, 16.48); (53.25, 7.65)

Conformation of the portion [25-31] of the helix of human ubiquitin computed using NH and CH RDCs in two media (red), superimposed on the same portion of the high-resolution X-ray structure (PDB ID: 1UBQ) (green). The backbone RMSD is 0.58 Å.

Plot: experimental vs. back-computed RDCs (both axes from -60 to 60) for NH RDCs and CH RDCs.
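The backbone RMSD values above are computed after superimposing the computed fragment on the X-ray structure. A minimal NumPy sketch of the standard Kabsch superposition, assuming matched N x 3 arrays of corresponding backbone atoms (which atoms are included is an assumption here); the function name and test coordinates are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal superposition
    (translation plus proper rotation) using the Kabsch algorithm."""
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)              # SVD of the 3 x 3 covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # optimal rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.sum(diff ** 2) / len(P)))

# Hypothetical check: a rotated and translated copy superimposes exactly.
rng = np.random.default_rng(3)
Q = rng.normal(size=(7, 3))
c, s = np.cos(0.5), np.sin(0.5)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(P, Q) < 1e-9)  # True
```

The determinant correction restricts the fit to proper rotations, so a mirror-image fragment is not reported as a perfect match.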

Page 44

Preliminary Results

Preliminary Results: Ubiquitin Strand

Protein | RMSD (Hz) | Alignment tensors (Syy, Szz)
Ubq beta-strand 2-7 | CH: –; NH: – | (53.32, 4.83); (48.03, 14.32)

Conformation of the portion [2-7] of the beta-strand of human ubiquitin computed using NH and CH RDCs in two media, superimposed on the same portion of the high-resolution X-ray structure (PDB ID: 1UBQ). The backbone RMSD is 1.151 Å.

Plot: experimental vs. back-computed RDCs (both axes from -60 to 40) for NH RDCs and CH RDCs.

Page 45

Conclusions

• A complete and exhaustive search over the space of all structures minimizing the RDC fit function seems feasible, owing to an understanding of the structure of the solution.
• Possible and exciting extensions to more or different data.

Funding: NIH

Thank you!

Page 46

Comparison

Data requirements vs. accuracy (ubiquitin). Figure labels: Accuracy, Sparse.