Protein Structure prediction
Alejandro Giorgetti
Sequence, function and structure relationships
• Life is the ability to metabolize nutrients, respond to external stimuli, grow, reproduce and evolve.
• Chemically, proteins are linear polymers of amino acids (aa).
• Proteins assume a 3D shape, which is usually responsible for their function.
• The tight link between structure, function and evolutionary pressure distinguishes proteins from ordinary polymers.
Protein Structure
• The sequence of amino acids is called the primary structure.
• Secondary structure refers to local folding.
• Tertiary structure is the arrangement of secondary structure elements in 3D.
• Quaternary structure describes the arrangement of a protein's subunits.
• The peptide bond is planar, and the dihedral angle it defines is almost always 180°.
Protein Structure
• What is a dihedral angle?
– It is the angle between two planes. In practice, if you have four connected atoms (A, B, C, D) and want to measure the dihedral angle around the central bond B–C, you orient the system so that the two central atoms are superimposed and measure the resulting angle between the first atom (A) and the last atom (D).
[Figure: four connected atoms A, B, C and D illustrating the dihedral angle]
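The definition above can be turned into a small calculation. The following is a minimal sketch, assuming Cartesian coordinates for the four atoms and using only the standard library; the function name `dihedral` and the test coordinates are illustrative, not part of the original slides.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(a, b, c, d):
    """Signed dihedral angle in degrees around the B-C bond of atoms A-B-C-D."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1 = cross(b1, b2)            # normal to the plane through A, B, C
    n2 = cross(b2, b3)            # normal to the plane through B, C, D
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a planar trans arrangement and a planar cis arrangement.
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
```

A planar trans arrangement gives an angle of magnitude 180°, which is the value quoted for the peptide bond; a planar cis arrangement gives 0°.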
Protein Structure
• The simplest arrangement of aa is the alpha helix, a right-handed spiral conformation. The structure repeats itself every 5.4 Å along the helix axis, with 3.6 aa per turn.
• The beta sheet: the R groups of neighboring residues in a strand point in opposite directions. Beta sheets can be parallel or antiparallel.
• Ramachandran plot: the pairs of (φ, ψ) angles that do not cause the atoms of a dipeptide to collide.
• In experimental structures we can observe aa in disallowed regions:
– Rarely observed combinations are energetically disfavoured, but not mathematically impossible.
– The energy cost can be compensated by other interactions within the protein.
• Loops: regions without repetitive structure that connect secondary structure elements.
• Supersecondary elements (sometimes called motifs): arrangements of two or three consecutive secondary structures that are present in many different protein structures, even with completely different sequences. Examples: alpha-alpha unit, beta-beta unit, beta-alpha-beta unit.
• Four-helix bundle; beta-alpha-beta-alpha-beta: Rossmann fold; TIM barrel fold (several beta-alpha-beta units).
• Domain: a portion of the polypeptide chain that folds into a compact, semi-independent unit.
Domains
• Class (C)
– derived from secondary structure content; assigned automatically
• Architecture (A)
– describes the gross orientation of secondary structures, independent of connectivity
• Topology (T)
– clusters structures according to their topological connections and numbers of secondary structures
• Homologous superfamily (H)
Ala: transient interactions.
Thr, Ser: phosphorylation targets: protein kinases attach a phosphate group to the side chain. Thr: beta-branched, more often found in beta sheets.
Gly: unusual Ramachandran plot, often found in turns.
Cys: very reactive; coordinates metals.
The folding problem
• Second law of thermodynamics: ∆G = ∆H – T∆S gives the stability of a conformation.
• Enthalpy: electrostatics, dispersion, van der Waals, hydrogen bonds.
• Entropy: water forms 'ordered cages' around hydrophobic aa. Folding breaks this order.
• The free energy of stabilization of a protein in the folded state is only a few kcal/mol (equivalent to a few hydrogen bonds).
• Anfinsen: all the 3D information is contained in the sequence (urea experiment).
• Levinthal paradox: an aa chain has a practically infinite number of possible conformations.
• Folding pathway specific to each protein: funnel theory.
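The scale behind the Levinthal paradox can be illustrated with a back-of-the-envelope estimate. The sketch below is only an illustration: the number of conformations per residue (3), the chain length (100) and the sampling rate (10^13 conformations per second) are assumed values chosen for the example, not figures from the slides.

```python
SECONDS_PER_YEAR = 3.156e7

def levinthal_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to enumerate every conformation once (all inputs are assumptions)."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / SECONDS_PER_YEAR

# Hypothetical 100-residue chain with 3 conformations per residue.
years = levinthal_years(100)
```

Under these assumptions an exhaustive search would take vastly longer than the age of the universe, which is why a specific folding pathway (the funnel picture) is invoked.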
Hydrophobic effect
The folding problem
Structural alignment
Evolution of protein structure
• If a base-substitution event occurs in a protein-coding region, either:
– The fine balance between the gain and loss of free energy of folding is compromised: no single energy minimum -> the protein does NOT FOLD.
– The energy landscape of the protein changes, but there is still a global energy minimum -> same or similar function. Local perturbations occur without affecting the general shape or topology.
Evolution of sequence vs evolution of structure
[Figure: sequence identity scale (10 %, 30 %, 50 %, 70 %, 90 %) with the labels "Drug design?", "Biochemistry?", "Molecular Biology?" and "X-ray crystallography: MR"; Chothia & Lesk (1986)]
Evolutionary-based methods for Protein structure prediction: Homology modelling
Idea: proteins evolving from a common ancestor maintain similar core 3D structures.
Proteins of known structure are used as 'templates' to model a sequence for which no 3D structural information is available.
Target and template must be evolutionarily related.
First use: 1970, by Tom Blundell.
Comparative Modeling
Known structures (templates) + target sequence -> template(s) selection -> sequence alignment -> structure modeling -> structure evaluation -> final structural models
Target sequence:
>hTEII
MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTVLNLEPLDEDLFRGRHYWVPAKRLFGGQIVGQALVAAAKSVSEDVHVHSLHCYFVRAGDPKLP
Comparative Modeling: known structures (templates)
Protein Data Bank (PDB): http://www.pdb.org
• The database of templates.
• Split the entries into single chains.
• Check the quality of the structures.
Comparative Modeling: template(s) selection
• Sequence similarity / fold recognition.
• Analysis of the structure (resolution, experimental method).
• Are other atoms and/or compounds present? Are they bound?
Comparative Modeling: sequence alignment
• Fundamental for homology modelling.
• Global alignment.
• A small error in the alignment can be fatal for the model.
• Remember: pairwise alignments whisper, multiple alignments speak out loud.
• Do we know anything else? Are there experiments?
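The global alignment mentioned for this step can be sketched with the classic Needleman-Wunsch dynamic programming scheme. This is a toy illustration only: the scoring values (match +1, mismatch -1, linear gap -1) and the two short sequences are assumptions for the example, whereas real target-template alignments use substitution matrices, affine gap penalties and, ideally, multiple alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical toy sequences.
best_score, aligned_target, aligned_template = needleman_wunsch("ACCE", "ACE")
```

Because every residue of both sequences must be placed, a single misplaced gap shifts all downstream residues onto the wrong template positions, which is why a small alignment error can be fatal for the model.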
Comparative Modeling: structure modeling
• Template-based fragment assembly (SwissModel).
• Satisfaction of spatial restraints: minimization of the deviations from the spatial restraints (MODELLER).
Comparative Modeling: structure evaluation
• Errors in the selection of the templates.
• Iterative cycles of alignment, modeling and evaluation.
Hidden Markov Models (HMM)
• A representation of multiple alignments through 'transition' probabilities.
• E.g.: we can use an alignment to compute, at each position, the probability that it is followed by an insertion, a deletion or a 'match'.
• An HMM represents an alignment in probabilistic terms and can be used to estimate whether a sequence belongs to a family.
Seq1: A C C – E
Seq2: E C E – A
Seq3: A C E A A
Seq4: C – E – E
Counts (observed + 1 pseudocount)
Transition     Begin-1   1-2     2-3     3-4     4-5     5-End
Match-match    4 + 1     3 + 1   3 + 1   1 + 1   1 + 1   4 + 1
Match-del      0 + 1     1 + 1   0 + 1   3 + 1   0 + 1   0 + 1
Ins-del        0 + 1     0 + 1   0 + 1   0 + 1   0 + 1   0 + 1
Del-match      0 + 1     0 + 1   1 + 1   0 + 1   3 + 1   0 + 1

Frequencies
Transition     Begin-1   1-2     2-3     3-4     4-5     5-End
Match-match    0.45      0.36    0.36    0.18    0.18    0.45
Match-del      0.09      0.18    0.09    0.36    0.09    0.09
Ins-del        0.09      0.09    0.09    0.09    0.09    0.09
Del-match      0.09      0.09    0.18    0.09    0.36    0.09

Emission probabilities of the match states (as listed on the slide)
A 0.43   C 0.29   E 0.29
A 0.17   C 0.67   E 0.17
A 0.14   C 0.28   E 0.57
A 0.43   C 0.29   E 0.29

[Figure: HMM diagram with Begin, Match, Insert and Delete states, labeled with the transition probabilities listed in the frequency table]

States = 7 (the four above plus match-ins, ins-match and ins-ins). We have to add one count for each state, so the total counts are 11 (4 observed + 7).
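The pseudocount arithmetic can be reproduced with a short script. This is a minimal sketch under stated assumptions: every alignment column is treated as a position, a gap ("-") is read as a delete state and a residue as a match state, Begin and End are treated as match states, and only the seven transition types named on the slide receive a pseudocount. The function and variable names are illustrative.

```python
from collections import Counter

ALIGNMENT = ["ACC-E", "ECE-A", "ACEAA", "C-E-E"]

# The seven transition types counted on the slide.
TRANSITIONS = ("match-match", "match-del", "match-ins", "ins-match",
               "ins-ins", "ins-del", "del-match")

def column_state(symbol):
    """Assumption: a gap is a delete state, any residue is a match state."""
    return "del" if symbol == "-" else "match"

def transition_frequencies(alignment, pseudocount=1):
    """One dict of transition frequencies per step: Begin-1, 1-2, ..., n-End."""
    n_cols = len(alignment[0])
    table = []
    for k in range(n_cols + 1):
        counts = Counter()
        for seq in alignment:
            left = "match" if k == 0 else column_state(seq[k - 1])
            right = "match" if k == n_cols else column_state(seq[k])
            counts[f"{left}-{right}"] += 1
        total = sum(counts[t] for t in TRANSITIONS) + pseudocount * len(TRANSITIONS)
        table.append({t: (counts[t] + pseudocount) / total for t in TRANSITIONS})
    return table

freqs = transition_frequencies(ALIGNMENT)
begin_match_match = round(freqs[0]["match-match"], 2)   # (4 + 1) / 11
```

With four sequences and seven pseudocounts the denominator is 11 at every step, so an unobserved transition gets 1/11 ≈ 0.09 rather than zero.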
• Model of hTEII (O14734) using:
– Sp3: http://theory.med.buffalo.edu/
– Ffas03: http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl?ses=
– HHpred (Toolkit): http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred
– mGenTHREADER: http://bioinf.cs.ucl.ac.uk/psipred/
• Questions:
– Analyzing the alignments: are there important differences?
– What can we say about the templates?
– Structural analysis: can we say a priori which model is the best? Which region is the most reliable?
I. Template based fragment assembly (SwissModel)
[ http://www.expasy.org/spdbv/ ]
I. Template based fragment assembly
a) Build the conserved core (structurally conserved regions, SCRs)
• Corresponds to the most rigid regions.
• High sequence conservation and fewer gaps.
• In general: secondary structure elements.
I. Template based fragment assembly
b) Model the loops (structurally variable regions, SVRs) and the missing backbone regions
• The most flexible regions.
• High probability of finding gaps.
• Correspond to loops and turns.
• Loop databases.
• 'Ab initio' reconstruction of the loops (Monte Carlo, molecular dynamics, genetic algorithms, etc.).
I. Template based fragment assembly
c) Model the side chains
Find the most probable conformation for the side chains using:
• Homologous structures.
• Rotamer libraries.
• Energy minimization algorithms.
I. Template based fragment assembly
d) Energy minimization
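$$V = \sum_{\mathrm{bonds}} k_b (x - x_0)^2 + \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}} V_n \left[ 1 + \cos(n\varphi - \gamma) \right] + \sum_{i<j} \left[ \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} + 4\varepsilon_{ij} \left( \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right) \right]$$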
The modeling process will produce close contacts between atoms and unfavourable bond lengths.
⇒ The goal is to obtain the correct geometries.
Too extensive an energy minimization can move us away from the 'true' structure.
SwissModel uses the GROMOS96 force field.
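The idea of relaxing an unfavourable bond length by energy minimization can be illustrated on a single harmonic bond term. The sketch below uses plain steepest descent on a one-dimensional toy problem; the force constant, equilibrium length, step size and step count are assumed values, and it is not the GROMOS96 procedure used by SwissModel.

```python
def bond_energy(x, k_b=300.0, x0=1.5):
    """Harmonic bond energy k_b (x - x0)^2 (parameters are assumptions)."""
    return k_b * (x - x0) ** 2

def steepest_descent(x, k_b=300.0, x0=1.5, step=1e-3, n_steps=50):
    """Move the bond length downhill along the negative gradient for a few steps."""
    for _ in range(n_steps):
        gradient = 2.0 * k_b * (x - x0)   # dE/dx of the harmonic term
        x -= step * gradient
    return x

# Hypothetical stretched bond of 1.9 Å relaxing towards the 1.5 Å equilibrium length.
relaxed = steepest_descent(1.9)
```

Limiting the number of steps reflects the caveat above: a short minimization removes bad local geometry, while an extensive one can drift away from the template-derived structure.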
Interactions and energies
• When protein conformational energy is discussed, people talk almost interchangeably about forces and energies: the force is the negative derivative of the energy with respect to position.
• The Schrödinger equation describes the behavior of a molecule, but it is impossible to solve for complex systems.
• We need a function that approximately describes the energy of the interactions that occur in a protein, using a simplified representation of both the system and the energetic contribution of each interaction in the protein: covalent and non-bonded.
Covalent interactions
• A covalent bond is formed if the atoms share electrons, but the effect is not localized and the increase in electron density has an effect on the molecule.
• Approximation: treat the bond as a spring between two atoms. The energy is described using Hooke's law.
• The use of this equation is justified by the observation that bonds between chemically similar atoms have similar lengths: we assume that the observed equilibrium value corresponds to the minimum of the potential energy.
• The same approximation is used for the energy variation of bond angles.
• Dihedral angles do not have a single energy minimum. In practice it is found that this potential is not enough to represent the energy of a dihedral angle, and often a non-bonded interaction term between the first and last atoms of the quadruplet is added.
Bonded terms of the potential energy function:
$$V_{\mathrm{bonded}} = \sum_{\mathrm{bonds}} k_b (x - x_0)^2 + \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}} V_n \left[ 1 + \cos(n\varphi - \gamma) \right]$$
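The three bonded terms can be written as small functions to show how they behave. This is a sketch with assumed parameter values (barrier height, multiplicity n = 3, phase γ = 0); the names are illustrative and do not come from any particular force field file.

```python
import math

def bond_energy(x, k_b, x0):
    """Hooke's law term for a bond length x."""
    return k_b * (x - x0) ** 2

def angle_energy(theta, k_theta, theta0):
    """The same harmonic approximation applied to a bond angle (radians)."""
    return k_theta * (theta - theta0) ** 2

def dihedral_energy(phi, v_n, n, gamma):
    """Periodic torsion term V_n [1 + cos(n*phi - gamma)] (angles in radians)."""
    return v_n * (1.0 + math.cos(n * phi - gamma))

# With n = 3 and gamma = 0 (assumed values) the torsion term has three minima.
minima = [dihedral_energy(math.radians(a), 1.0, 3, 0.0) for a in (60, 180, 300)]
barrier = dihedral_energy(0.0, 1.0, 3, 0.0)
```

Unlike the harmonic bond and angle terms, which have one minimum each, the periodic torsion term has n equivalent minima, which is the point made in the last bullet above.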
Electrostatic interactions
• A nucleus and its electrons interact according to Coulomb's law.
• We assign a formal charge to all the atoms. Types of interactions: salt bridges; groups that carry no formal charge can be polarized (electronegative atoms attract electrons while others lose electrons). In water molecules, the electronegative oxygen attracts the electrons and leaves the H atoms with net positive charges, so two waters can form a strong electrostatic interaction: the H-bond. H-bonds are fundamental in protein structure.
• Partial charges are computed by quantum mechanical calculations on model systems.
• The dielectric constant is a macroscopic quantity derived from the average microscopic effect of polarization.
• If we place two charges in a polar medium, the molecules of the medium will tend to line up with the electric field. Their dipoles will oppose the electric field, reducing its strength.
Electrostatic (Coulomb) term of the potential energy function:
$$V_{\mathrm{elec}} = \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$
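The screening effect of a polar medium can be illustrated numerically. The sketch below evaluates Coulomb's law for two point charges with an added relative dielectric constant; the constant 332.06 kcal·Å/(mol·e²) is the commonly quoted value of 1/(4πε0) in these units, and the charges, distance and the value 80 for water are assumptions chosen for the example.

```python
COULOMB_CONSTANT = 332.06  # kcal * Å / (mol * e^2), i.e. 1/(4*pi*eps0) in these units

def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Coulomb energy (kcal/mol) of two point charges (in e) separated by r_ij (in Å)."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r_ij)

# Hypothetical salt bridge: charges +1 and -1 at 3 Å.
in_vacuum = coulomb_energy(+1.0, -1.0, 3.0)
in_water = coulomb_energy(+1.0, -1.0, 3.0, dielectric=80.0)
```

The same pair of charges interacts roughly 80 times more weakly in the water-like medium than in vacuum, which is the reduction in field strength described in the last bullet.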
Van der Waals interactions
• Electromagnetic interactions can also affect uncharged atoms: their electron clouds fluctuate, producing a dipole moment that interacts with the similarly generated dipoles of the surrounding atoms. This produces an attractive interaction.
• The other effect is that the orbitals of the atoms cannot overlap because of the Pauli exclusion principle: two electrons cannot occupy the same quantum state.
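Van der Waals (Lennard-Jones) term of the potential energy function:
$$V_{\mathrm{vdW}} = \sum_{i<j} 4\varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]$$
The $r^{-12}$ term is the repulsion; the $r^{-6}$ term is the dispersion.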
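The balance between repulsion and dispersion can be checked with a few lines of code. This is a sketch of the 12-6 Lennard-Jones form with assumed parameters (ε = 0.2 kcal/mol, σ = 3.4 Å, chosen only for illustration).

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: repulsion (r^-12) plus dispersion (r^-6)."""
    sr6 = (sigma / r) ** 6
    repulsion = 4.0 * epsilon * sr6 ** 2
    dispersion = -4.0 * epsilon * sr6
    return repulsion + dispersion

EPSILON, SIGMA = 0.2, 3.4                 # assumed well depth (kcal/mol) and diameter (Å)
r_min = 2.0 ** (1.0 / 6.0) * SIGMA        # distance of the energy minimum
well_depth = lennard_jones(r_min, EPSILON, SIGMA)
```

The energy is zero at r = σ, reaches its minimum of -ε at r = 2^(1/6) σ, and rises steeply at shorter distances, which is why close contacts produced by model building are so costly.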
E_electrostatic. The electrostatic energy is evaluated by using the Restrained Electrostatic Potential (RESP) partial charges. These charges have the property of accurately reproducing the electrostatic potential multipoles outside the molecule, and they were calculated in the following way: ab initio quantum chemical calculations are performed on small molecules, and the electrostatic potentials {V_j} are calculated on M grid points outside the molecule.
II. Modeling by Satisfaction of Spatial Restraints
• Finds the most probable structure starting from an alignment.
• Uses probability density functions.
• Minimizes the deviations from the restraints.
Comparative protein modeling by satisfaction of spatial restraints. A. Šali and T.L. Blundell. J. Mol. Biol. 234, 779-815 (1993).
Restraints:
• Homology-derived: obtained from the alignment.
• Stereochemical: CHARMM parameter set (MacKerell et al., 1998).
• Van der Waals and Coulomb energies: from the CHARMM force field.
• 'External': external distance restraints.
Model evaluation
To be analyzed:
• Correct fold
• Model coverage (%)
• Cα deviation (rmsd)
• Alignment accuracy (%)
• Side chains
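The Cα deviation in this list is usually reported as a root-mean-square deviation. The sketch below computes it for two equally long lists of Cα coordinates, assuming the model and the reference structure have already been superimposed (no fitting is performed here); the coordinates are hypothetical.

```python
import math

def ca_rmsd(model, reference):
    """RMSD (Å) between paired Cα coordinates of two pre-superimposed structures."""
    if not model or len(model) != len(reference):
        raise ValueError("both structures need the same, non-zero number of Cα atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))

# Hypothetical three-residue example: the model is shifted by 1 Å along x.
reference_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_ca = [(x + 1.0, y, z) for x, y, z in reference_ca]
deviation = ca_rmsd(model_ca, reference_ca)
```

In practice the value depends on which residues are paired, so it should be read together with the model coverage and the alignment accuracy.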
Structure Analysis and Verification Server:
http://nihserver.mbi.ucla.edu/SAVS/
Evaluation of model accuracy
EVA
Evaluation of Automatic protein structure prediction
[ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva ]
Protein Structure Resources
PDB: http://www.pdb.org
Protein Data Bank of experimentally solved structures (RCSB)
CATH: http://www.biochem.ucl.ac.uk/bsm/cath
Hierarchical classification of protein domain structures
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop
Alexey Murzin's Structural Classification of Proteins
DALI: http://www2.ebi.ac.uk/dali
Liisa Holm and Chris Sander's protein structure comparison server
SS-Prediction and Fold Recognition
PHD: http://cubic.bioc.columbia.edu/predictprotein
Burkhard Rost's secondary structure and solvent accessibility prediction server
PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
L.J. McGuffin, K. Bryson & David T. Jones' secondary structure prediction server
3DPSSM: http://www.sbg.bio.ic.ac.uk/~3dpss
Fold recognition server using coupled 1D and 3D sequence profiles
THREADER: http://bioinf.cs.ucl.ac.uk/threader/threader.html
David T. Jones' threading program
Fold recognition: profile methods
Principle: find a compatible fold.
For each aa we can compute the relative frequency with which it is:
• Present in a given secondary structure
• Present on the surface
• In a hydrophobic environment
Then, each aa is replaced by a letter (property).
From a protein structure we can analyze each position in terms of:
• The secondary structure element it belongs to.
• The percentage of the surface of the aa occupying it that is exposed to the solvent.
• Whether it lies in a hydrophobic or a polar environment.
Then, each structure is replaced by a linear sequence of 'properties'.
>Target Sequence XYMSTLYEKLGGTTAVDLAVAAVAGAPAHKRDVLNQ
• Build a model of the target protein based on each template structure.
• Rank the models according to SCORE or ENERGY.
The PDB becomes a database of sequences, which can be searched with the methods you already know.
Fold recognition
Empirical energy calculation. E.g.: the frequency with which each pair of aa is found at a given distance in the PDB. If the number of observations is sufficiently high, this gives the probability of finding the pair at that distance.
Threading
[Figure: a target sequence (M, A, T, E, A, F, T, S, G, Q) threaded onto a template structure]
Boltzmann equation: the probability of observing something depends on its energy:
$$P(x) \propto e^{-E(x)/kT}$$
We can invert it: the energy of an event that has probability P(x) is:
$$\Delta E(\mathrm{Ala{-}Ala}) = -kT \ln \left( \frac{P_{\mathrm{folded}}(\mathrm{Ala{-}Ala})}{P_{\mathrm{unfolded}}(\mathrm{Ala{-}Ala})} \right)$$
Frozen approximation: the interaction energy of each target aa is computed with the aa of the template. Idea: the final positions (for any alignment) will be occupied by very similar aa.
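The inversion of the Boltzmann relation can be sketched as a one-line function. The example below is only an illustration: the observed and reference frequencies are made-up numbers, the temperature of 298 K is an assumption, and real knowledge-based potentials are derived from distance-dependent statistics over many PDB structures.

```python
import math

K_B = 0.0019872   # Boltzmann constant in kcal/(mol*K)

def pair_energy(p_folded, p_unfolded, temperature=298.0):
    """Energy of a residue-pair contact: -kT ln(P_folded / P_unfolded)."""
    return -K_B * temperature * math.log(p_folded / p_unfolded)

# Hypothetical frequencies: the pair is seen twice as often as in the reference state.
favourable = pair_energy(0.02, 0.01)
neutral = pair_energy(0.01, 0.01)
unfavourable = pair_energy(0.005, 0.01)
```

Pairs observed more often than expected in the reference state get a negative (favourable) energy, and pairs observed less often get a positive one.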
New Fold
Are new folds really folds never before seen in nature?
They are structures that share structural motifs at the level of 'fragments' or of supersecondary structures.
Fragment Assembly
The relationship between local sequence and local structure is highly degenerate. Sequence-dependent local interactions can 'bias' the local structure of the segments.
Idea:
The distribution of possible conformations for a local segment of a polypeptide chain can be approximated by the distribution of structures adopted by that sequence, and by evolutionarily close sequences, in proteins of known structure.
The mapping between local sequences and common local structures (helices, helix caps, turns) is less degenerate than for generic structural fragments.
Fragment assembly method: dividing the sequence into fragments
MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTV
FRAGFOLD
• Supersecondary structure elements.
• Tri-, tetra- and pentapeptide fragments.
• Each fragment is evaluated energetically (knowledge-based potential).
• Optimization and assembly (knowledge-based potential).
ROSETTA
• Fragments of 9 aa.
• Selects the structures of the 25 closest sequences.
ROSETTA
• Extended starting configuration.
• Dihedral angles are replaced at random (simulated annealing).
FRAGFOLD
• Random combinations of fragments.
• Simulated annealing using the small fragments.
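The combination of random fragment insertion and simulated annealing can be sketched on a toy model. Everything below is a simplification made for illustration and is not the ROSETTA or FRAGFOLD implementation: each residue carries a single angle, the fragment library and the energy function are hypothetical, and the cooling schedule is an arbitrary geometric one.

```python
import math
import random

def anneal_fragments(n_residues, fragments, energy, n_steps=2000,
                     t_start=2.0, t_end=0.05, seed=0):
    """Toy fragment assembly: insert random fragments, accept by the Metropolis rule."""
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    angles = [180.0] * n_residues              # assumed extended-like start
    current_e = energy(angles)
    best, best_e = list(angles), current_e
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        start = rng.randrange(n_residues - frag_len + 1)
        trial = list(angles)
        trial[start:start + frag_len] = rng.choice(fragments)
        trial_e = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if trial_e <= current_e or rng.random() < math.exp(-(trial_e - current_e) / t):
            angles, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(angles), current_e
    return best, best_e

def toy_energy(angles):
    """Hypothetical score: penalizes deviation from a helix-like angle of -60 degrees."""
    return sum((a + 60.0) ** 2 for a in angles) / 1000.0

library = [(-60.0, -60.0, -60.0), (180.0, 180.0, 180.0), (-120.0, -120.0, -120.0)]
start_energy = toy_energy([180.0] * 9)
best_angles, best_energy = anneal_fragments(9, library, toy_energy)
```

In a real method the energy would be a knowledge-based potential, and the fragments would be taken from known structures whose local sequence is close to that of the target.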