Tertiary Structure Prediction Methods

Figure (flowchart): Any given protein sequence → Structure selection: compare the sequence with proteins that have a solved structure →
• sequence identity > 35%: Homology Modeling
• sequence identity < 35%: Fold Recognition or ab initio Folding
→ Structure refinement → Final Structure

Why Homology Modeling?

X-ray Diffraction
– Only a small fraction of proteins can be made to form crystals.
– A crystal is not the protein's native environment.
– Very time-consuming.

NMR Distance Measurement
– Not all proteins are found in solution.
– The method generally looks at isolated proteins rather than protein complexes.
– Very time-consuming.

Homology Modeling: Principles, Tools and Techniques

• Advances in molecular biology allow rapid identification, isolation and sequencing of genes.

• Problem: obtaining the 3D structure of a protein is a time-consuming task.

• An alternative strategy in structural biology is to develop models of a protein when constraints from X-ray diffraction or NMR are not yet available.

• Homology modeling is a method that can be applied to generate reasonable models of protein structure.

Database approach to homology modelling

• As of June 2000, 12,500 protein structures had been deposited in the Protein Data Bank (PDB), and the SwissProt protein sequence database contained 86,500 sequence entries.

• This is roughly a 1:7 ratio; relatively few structures are known.

• The number of sequences will keep growing much faster than the number of structures because of advances in sequencing.

Sequence similarity methods

• These methods can be very accurate if there is > 50% sequence similarity.

• They are rarely accurate if the sequence similarity is < 30%.

• They use the same techniques as sequence alignment, such as the dynamic programming algorithm, hidden Markov models and clustering algorithms.
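The dynamic programming step can be sketched as a Needleman-Wunsch global alignment. This is a minimal sketch assuming a toy scoring scheme (match +1, mismatch -1, linear gap -2); real tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b by dynamic programming (linear gap cost)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Toy substitution score for residues a[i-1] and b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right cell recovers one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For example, `needleman_wunsch("GATTACA", "GATACA")` places a single gap in the second sequence and returns a score of 4.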

What is Homology Modeling?

• Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)

• If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.

• In general, at least 30% sequence identity is required to generate useful models.
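To apply the 30% rule of thumb, identity has to be computed from a target-template alignment. A minimal sketch, assuming identity is measured over columns where both sequences have a residue (other definitions divide by the shorter sequence length, so numbers differ between tools):

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_target, aligned_template)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Usage: decide whether a template clears the 30% rule of thumb
usable = percent_identity("ACDE", "ACDF") >= 30.0
```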

Structural Prediction by Homology Modeling

Figure (workflow), with the tools shown for each step:
• Structural Databases → Reference Proteins: SeqFold, Profiles-3D, PSI-BLAST, BLAST & FASTA, fold-recognition methods (FUGUE)
• Reference Proteins → Conserved Regions: Cα matrix matching
• Protein Sequence → Predicted Conserved Regions: sequence alignment
• Predicted Conserved Regions → Initial Model: coordinate assignment, loop searching/generation
• Initial Model → Structure Analysis: WHAT IF, PROCHECK, PROSA II, ...
• Structure Analysis → Refined Model: sidechain rotamers and/or MM/MD
• Also shown: MODELLER

How good can homology modeling be?

Sequence identity → model quality and typical uses:
• 60-100%: comparable to a medium-resolution NMR structure; useful for studying substrate specificity
• 30-60%: useful for molecular replacement in crystallography and for supporting site-directed mutagenesis through visualization
• < 30%: serious errors
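The identity ranges above can be encoded as a simple lookup. A minimal sketch; assigning each boundary value (exactly 60% or 30%) to the higher tier is an assumption, and the tiers are rough guides rather than guarantees:

```python
def expected_model_quality(identity):
    """Map target-template sequence identity (%) to the rough tiers listed above."""
    if not 0 <= identity <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    # Assumption: a boundary value belongs to the higher tier
    if identity >= 60:
        return "comparable to medium-resolution NMR; useful for substrate specificity"
    if identity >= 30:
        return "useful for molecular replacement and guiding site-directed mutagenesis"
    return "serious errors likely"
```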

Significance of Protein Structure

What does a structure offer in the way of biological knowledge?

• Location of mutants and conserved residues
• Ligand and functional sites
• Clefts/cavities
• Evolutionary relationships
• Mechanisms

The importance of the sequence alignment

• The quality of the sequence alignment is of crucial importance.

• Misplaced gaps, representing insertions or deletions, cause residues to be misplaced in space.

• Careful inspection and manual adjustment of an automatic alignment may improve the quality of the model.
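Why a misplaced gap misplaces residues: each target residue inherits the position of the template residue it is aligned with. The sketch below derives that residue-to-residue mapping from an alignment; the sequences are illustrative only.

```python
def target_to_template_map(aligned_target, aligned_template):
    """Return {target_index: template_index} (0-based) for columns where both have a residue."""
    mapping = {}
    ti = si = 0  # running residue counters in the ungapped target and template
    for x, y in zip(aligned_target, aligned_template):
        if x != "-" and y != "-":
            mapping[ti] = si
        if x != "-":
            ti += 1
        if y != "-":
            si += 1
    return mapping


template = "ACDEFGHIK"
good = target_to_template_map("ACD-FGHIK", template)  # target D sits on template D
shifted = target_to_template_map("AC-DFGHIK", template)  # gap moved one column left
```

Moving the gap by a single column puts target residue D (index 2) onto the coordinates of template residue E (index 3) instead of template D.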

Programs for Model Protein Construction

• MODELLER 4.0 – guitar.rockefeller.edu/modeller/modeller.html

• SWISS-MODEL Server – www.expasy.ch/swissmod/SWISS-MODEL.html

• SCWRL (SideChain placement With Rotamer Library) – www.fccc.edu/research/labs/dunbrack/scwrl/

Protein Structural Databases

• Templates can be found by using the TARGET sequence as a query in a FASTA or BLAST search of:

– PDB (http://www.rcsb.org/pdb)
– MODELLER (http://guitar.rockefeller.edu/modeller/modeller.html)
– ModBase (http://pipe.rockefeller.edu/modbase/general-info.html)
– 3DCrunch (http://www.expasy.ch/swissmod/SM_3DCrunch.html)
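BLAST and FASTA find candidate templates quickly by first looking for short words shared between the query and database sequences. The sketch below illustrates only that word-matching idea by counting shared 3-letter words; it is not either program's algorithm or scoring, and the database entries are hypothetical.

```python
def kmer_set(seq, k=3):
    """All overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_words(query, database, k=3, top=5):
    """Rank database entries by the number of k-letter words shared with the query."""
    q = kmer_set(query, k)
    scored = [(len(q & kmer_set(seq, k)), name) for name, seq in database.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))  # most shared words first, then by name
    return [(name, shared) for shared, name in scored[:top] if shared > 0]


# Hypothetical mini-database of template sequences
db = {"tmplA": "MKVAEQTPLTAL", "tmplB": "GGGGGGGG", "tmplC": "MKVAEQ"}
hits = rank_by_shared_words("MKVAEQTPL", db)
```

Here `hits` is `[("tmplA", 7), ("tmplC", 4)]`; hits found this way would still need a full alignment and significance estimate.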

Gaining confidence in template searching

• Once a suitable template is found, it is a good idea to do a literature search (PubMed) on the relevant fold to determine what biological role(s) it plays.

• Does this match the biological/biochemical function that you expect?

Other factors to consider in selecting templates

• Template environment

– pH

– Ligands present?

• Resolution of the templates

• Family of proteins

– Phylogenetic tree construction can help find the subfamily closest to the target sequence

• Multiple templates?
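Some of these factors can be combined into a simple ordering of candidate templates. A minimal sketch of one hypothetical policy: optionally require the ligand of interest, then prefer higher identity, then better (numerically lower) resolution. The template IDs and values are made up, and pH or subfamily membership would still need to be judged separately.

```python
from dataclasses import dataclass


@dataclass
class Template:
    pdb_id: str        # hypothetical identifier
    identity: float    # % sequence identity to the target
    resolution: float  # resolution in Å; lower is better
    has_ligand: bool   # whether the ligand of interest is bound


def order_templates(templates, need_ligand=False):
    """Filter on ligand presence if requested, then sort by identity and resolution."""
    pool = [t for t in templates if t.has_ligand or not need_ligand]
    return sorted(pool, key=lambda t: (-t.identity, t.resolution))


candidates = [Template("T1", 45.0, 2.8, False),
              Template("T2", 45.0, 1.9, True),
              Template("T3", 38.0, 1.5, True)]
```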

Target-Template Alignment

• No current comparative modeling method can recover from an incorrect alignment

• Use multiple sequence alignments as initial guide.

• Consider slightly different alternative alignments in regions of uncertainty, and build multiple models.

• Sequence-structure alignment programs

– Try to place gaps in variable regions/loops

• Note: the sequence in a sequence database and the sequence actually present in the PDB entry are not always identical (for example, some residues may be missing from the coordinates).
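Alternative alignments in a region of uncertainty can be generated by sliding the gap through that region. A minimal sketch, assuming the uncertain region has already been cut out and that the target is shorter there, so a single contiguous gap goes into the target:

```python
def gap_placements(target_segment, template_segment):
    """All ways to put one contiguous gap into target_segment so that it
    spans the same number of columns as template_segment."""
    gap_len = len(template_segment) - len(target_segment)
    if gap_len < 0:
        raise ValueError("this sketch only handles a gap in the target")
    if gap_len == 0:
        return [target_segment]
    return [target_segment[:pos] + "-" * gap_len + target_segment[pos:]
            for pos in range(len(target_segment) + 1)]
```

Each variant would then be spliced back into the full alignment and used to build its own model, and the resulting models compared.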

Target-Multiple Template Alignment

• Alignment is prepared by superimposing all template structures

• Add the target sequence to this alignment.

• Compare with a multiple sequence alignment and adjust.
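The superposition step can be sketched with the Kabsch algorithm, which finds the rotation minimizing the RMSD between two sets of equivalent atoms. This sketch assumes the atom equivalences (e.g. aligned Cα atoms) are already known; structural alignment programs also have to determine them.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Optimal rigid-body superposition of mobile onto reference (both (N, 3) arrays).

    Returns the transformed mobile coordinates and the RMSD after fitting.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - p_mean, Q - q_mean            # remove translation
    H = P0.T @ Q0                              # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    fitted = P0 @ R.T + q_mean
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return fitted, rmsd
```

Applied to a rotated and translated copy of a structure, the function recovers the original coordinates with an RMSD of essentially zero.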

Adjusting the alignment

• Use tools such as Joy (www-cryst.bioc.cam.ac.uk/~joy/) to view secondary structure along the alignment, and use this information to guide adjustments.

• Avoid gaps in secondary structure elements

Example: alternative alignments of 1ad3 and 1cw3 (rows labeled _4, _5, _6 and _ce are variants of the same pairwise alignment; the number at the end of each row is the residue position at the end of that row):

1ad3    : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELLK--ERFDHIMYTGSTAVGKIVMAAAAK- : 200
1cw3    : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_4  : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELL--KERFDHIMYTGSTAVGKIV-MAAAAK : 200
1cw3_4  : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_5  : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELLKER--FDHIMYTGSTAVGKIV-MAAAAK : 200
1cw3_5  : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_6  : LKPSEVSGHMADLLATLIPQY-M---DQNLYLVVKGGVPETTELLKER--FDHIMYTGSTAVGKIV-MAAAAK : 200
1cw3_6  : MKVAEQTPLTALYVANLIKEAGF---PPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
1ad3_ce : LKPSEVSGHMADLLATLIPQYM----DQNLYLVVKGGV-PETTELLKE-RFDHIMYTGSTAVGKIVMAAAA-K : 200
1cw3_ce : MKVAEQT---PLTALYVANLIKEAGFPPGVVNIVPGFGPTAGAAIASHEDVDKVAFTGSTEIGRVIQVAAGSS : 254
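The rule "avoid gaps in secondary structure elements" can be checked mechanically. A minimal sketch, assuming the template's secondary structure is given as one DSSP-style letter per template residue (H = helix, E = strand, anything else = coil); the sequences and secondary structure string are illustrative.

```python
def gaps_in_sse(aligned_target, aligned_template, template_ss):
    """Return alignment columns where a gap falls inside a template helix or strand."""
    flagged = []
    si = 0  # index into the ungapped template and template_ss
    for col, (t, s) in enumerate(zip(aligned_target, aligned_template)):
        if s != "-":
            # Deletion in the target opposite a template helix/strand residue
            if t == "-" and template_ss[si] in "HE":
                flagged.append(col)
            si += 1
        else:
            # Insertion in the target: flag it if the template residues on both
            # sides belong to the same kind of secondary structure element
            prev_ss = template_ss[si - 1] if si > 0 else "-"
            next_ss = template_ss[si] if si < len(template_ss) else "-"
            if prev_ss in "HE" and prev_ss == next_ss:
                flagged.append(col)
    return flagged


ss = "CCHHHHHCC"  # hypothetical template secondary structure
```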

Secondary Structure Prediction

The PredictProtein server (http://www.embl-heidelberg.de/predictprotein/). Adding secondary structure predictions can help decide whether helices should be shortened or extended in regions of poor sequence identity.
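Comparing predicted and template helices along the alignment shows where helix ends may need to move. A minimal sketch, assuming both secondary structure strings are already expressed in alignment columns (same length, H for helix):

```python
import re


def helix_segments(ss):
    """Return (start, end) column pairs, end exclusive, for each run of 'H'."""
    return [(m.start(), m.end()) for m in re.finditer(r"H+", ss)]


def helix_boundary_differences(predicted_ss, template_ss):
    """Report overlapping predicted/template helices whose boundaries differ."""
    diffs = []
    for ps, pe in helix_segments(predicted_ss):
        for ts, te in helix_segments(template_ss):
            overlapping = ps < te and ts < pe
            if overlapping and (ps, pe) != (ts, te):
                diffs.append({"predicted": (ps, pe), "template": (ts, te)})
    return diffs
```

For a target predicted as `"CCHHHHHHCC"` against a template `"CCCHHHHCCC"`, the predicted helix extends one column further at each end, which would be a hint to consider extending the helix in the model.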

PHD program

Constructing Multi-domain protein models

• Building a multi-domain protein using templates corresponding to the individual domains

• proteinA  aaaaaaaaaaaaa---------------------

• proteinB  -----------------bbbbbbbbbbbbbbb

• Target    aaaaaaaaaaaaabbbbbbbbbbbbbbb
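When domains come from different templates, it helps to check which template covers each target position. A minimal sketch, assuming each template has been aligned to the target and written as a row over the target's columns with '-' where that template has no aligned residue:

```python
def coverage(rows):
    """For each target column, list the templates that provide a residue there."""
    length = len(next(iter(rows.values())))
    if any(len(r) != length for r in rows.values()):
        raise ValueError("all alignment rows must have the same length")
    return [[name for name, row in rows.items() if row[i] != "-"]
            for i in range(length)]


cov = coverage({"proteinA": "aaaa----", "proteinB": "---bbbbb"})
```

Columns with an empty list have no structural template at all, while columns covered by both templates are where the two domain templates overlap.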

Multiple model approach

Reminder: Consider the effects of different substitution matrices, different gap penalties, and different algorithms. (Vogt et al. J. Mol. Biol. 1995, 249:816-831.)

Construct multiple models, and use structural analysis programs to determine the best model.

Jaroszewski, Pawlowski and Godzik, J. Mol. Model., 1998, 4:294-309.
Venclovas, Ginalski and Fidelis, Proteins, 1999, Suppl. 3:73-80.
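The effect of different gap penalties can be explored by realigning with each setting and keeping the distinct alignments as starting points for separate models. A minimal sketch, assuming a toy match/mismatch score and a linear gap penalty; the penalty values are arbitrary choices.

```python
def align(a, b, gap, match=1, mismatch=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j), S[i - 1][j] + gap, S[i][j - 1] + gap)
    ra, rb, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))


def alignments_over_gap_penalties(a, b, gaps=(-0.5, -1, -2, -4)):
    """Map each distinct alignment to the gap penalties that produced it."""
    seen = {}
    for g in gaps:
        seen.setdefault(align(a, b, g), []).append(g)
    return seen


result = alignments_over_gap_penalties("ACDEFGHIK", "ACDGHIKLM")
```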

Model Building

• Rigid-Body Assembly

– Assembles a model from a small number of rigid bodies obtained from aligned protein structures

– Implemented in COMPOSER

• Segment Matching

• Satisfaction of Spatial Restraints

– MODELLER

– guitar.rockefeller.edu/modeller/modeller.html

Modeller

• The main input is a set of restraints on the spatial structure of the amino acid sequence and ligands to be modeled.

• The output is a 3D structure that satisfies these restraints as well as possible.

• Restraints are obtained automatically from related protein structures (homology modeling), and can also come from NMR structures, secondary structure packing and other experimental data.

What are the restraints?

Distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo-atoms.
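Two of these restraint types can be evaluated with a few lines of geometry. A minimal sketch: a dihedral angle from four points, and a harmonic penalty k(d - d0)² summed over distance restraints. The harmonic form is a simplification chosen here; MODELLER itself expresses restraints as probability density functions.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (e.g. N, CA, C, N for psi)."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def restraint_energy(coords, restraints):
    """Sum of k * (d - d0)**2 over distance restraints given as (i, j, d0, k)."""
    return sum(k * (math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0, k in restraints)
```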

Sidechain Conformation

• Protein sidechains play a key role in molecular recognition and packing of hydrophobic cores of globular proteins

• Protein sidechain conformations tend to exist in a limited number of canonical shapes, usually called rotamers

• Rotamer libraries can be constructed where only 3-50 conformations are taken into account for each side chain

Sidechains on surface of protein

• Exposed sidechains on surface can be highly flexible without a single dominant conformation

• If these solvent-exposed sidechains do not form binding interactions with other molecules or take part in, say, a catalytic reaction, their accuracy may not be crucial; also look at the B-factors.

• Can refine the sidechains with molecular mechanics minimization

– Sampling?

– Scoring?
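Sampling and scoring can be illustrated with a toy rotamer-packing problem: choose one rotamer per residue to minimize self energies (sidechain against the fixed backbone) plus pairwise sidechain energies. A minimal sketch; the residue names, rotamer labels and energies are hypothetical, and exhaustive enumeration grows exponentially, so real programs use much smarter search strategies.

```python
from itertools import product


def pack_sidechains(rotamers, self_energy, pair_energy):
    """Exhaustively choose one rotamer per residue to minimize the total energy.

    rotamers:    {residue: [rotamer labels]}
    self_energy: {(residue, rotamer): energy}
    pair_energy: {((res1, rot1), (res2, rot2)): energy}; missing pairs count as 0
    """
    residues = list(rotamers)
    best, best_e = None, float("inf")
    for combo in product(*(rotamers[r] for r in residues)):  # sampling
        choice = list(zip(residues, combo))
        e = sum(self_energy[c] for c in choice)               # scoring
        for a in range(len(choice)):
            for b in range(a + 1, len(choice)):
                key, rev = (choice[a], choice[b]), (choice[b], choice[a])
                e += pair_energy.get(key, pair_energy.get(rev, 0.0))
        if e < best_e:
            best, best_e = dict(choice), e
    return best, best_e


rot = {"L12": ["g+", "t"], "F40": ["g-", "t"]}
self_e = {("L12", "g+"): 0.0, ("L12", "t"): 0.5, ("F40", "g-"): 0.2, ("F40", "t"): 0.0}
pair_e = {(("L12", "g+"), ("F40", "t")): 5.0}  # a steric clash
best, energy = pack_sidechains(rot, self_e, pair_e)
```

The clash rules out combining the individually best rotamers, so the search settles on L12 g+ with F40 g- at a total of 0.2.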

Errors in Homology Modeling

a) Side chain packing
b) Distortions and shifts
c) No template
d) Misalignments
e) Incorrect template

Marti-Renom et al., Ann. Rev. Biophys. Biomol. Struct., 2000, 29:291-325.

Detection of Errors

• The first check should be a stereochemical check of the modeled structure (PROCHECK, WHATCHECK, DISTAN), which shows deviations from normal bond lengths, dihedral angles, etc.

• Visualization: follow the backbone trace first, then move out to the Cα-Cβ orientation.
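One of the simplest backbone checks looks at consecutive Cα atoms, which are about 3.8 Å apart for a trans peptide bond. A minimal sketch of that single check, not a substitute for PROCHECK or WHATCHECK; the 0.3 Å tolerance is an arbitrary choice, and cis peptides (Cα-Cα about 2.9 Å) will be flagged even when they are genuine.

```python
import math


def check_ca_spacing(ca_coords, expected=3.8, tolerance=0.3):
    """Flag consecutive Cα pairs whose distance deviates from the expected spacing."""
    flagged = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            flagged.append((i, i + 1, round(d, 2)))
    return flagged


# Toy trace with a chain break between the third and fourth Cα
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (15.0, 0.0, 0.0)]
problems = check_ca_spacing(trace)
```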

PROCHECK

http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
