Maya Topf
Interpretation of 3D EM maps, fitting of atomic structures
Image processing for cryo-electron microscopy, 1-11 September 2015
Lecture 14
Aims of this lecture
• To understand when 3D EM density fitting is needed.
• To describe the different types of density fitting methods (rigid, flexible, assembly).
• To be aware of different software tools used for visualization and density fitting.
• To be aware of the errors involved in density fitting and understand how to critically assess the resulting models.
Structural biology at different levels of resolution
EMDB Statistics
3D-EM constrained modelling of macromolecular assemblies
Inputs: 3D-EM map (after segmentation) and component sequence.
Component structure known?
- Yes: rigid fitting of the component structure.
- No: the route depends on the map resolution and on the sequence:
  • < 20 Å: fitting all known folds.
  • ~4.5-10 Å: SSE assignment.
  • < 4.5 Å: de novo chain tracing.
  • From sequence: fold assignment. Template detected? Yes: homology modelling. No: ‘template-free’ modelling. Either route gives a component structure for rigid fitting.
Fit different from map?
- Yes: multiple conformations, ENM / NMA, real-space refinement.
- Academic programs: • Chimera (UCSF) • Vision (Scripps) • VMD (U Illinois Urbana-Champaign) • VolRover (UT Austin) • Gorgon (NCMI & Washington Uni) • Veda (IBS, Grenoble) • Coot (Univ of York) • O (Uppsala Univ)
- Commercial programs: • PyMOL (Schrödinger) • Amira (TGS, San Diego, CA) • Iris Explorer (NAG, Downers Grove, IL)
Segmentation
- Identify boundaries between 3D regions that represent structural components in the context of structural, biochemical and bioinformatic knowledge.
- The identified boundaries can be useful in detecting the positions of known component structures in the map.
- The size of the segmented components depends on the resolution.
Features resolved as resolution improves from 20 Å through 10 Å to 4.5 Å: shape; domains, RNA double-helix; protein secondary structure elements; backbone.
Segmentation tools
- Manual: mask; box around marker/atoms; hand erasing.
Segmentation tools
- Knowledge-based segmentation:
• Antibody labeling; gold clusters; subunit/domain deletion -> difference mapping (Chimera).
• Recognition of structural components - density fitting.
- Automated: based on density alone (with or without the use of symmetry information)
SeggeR: Pintilie et al, J Struct Biol 2011
Segmentation methods
- Automated segmentation based on density alone:
• Density thresholding: protein and RNA (Spahn et al. 2000).
• Watershed methods (Volkmann 2002), SeggeR (Pintilie et al. 2009, 2011).
• Level set (Baker et al 2006)
• Eigenvalue methods (Frangakis & Hegerl 2002)
• Fast marching method (Frangakis & Hegerl 2002, Bajaj 2003).
• Inference methods to find conserved regions (Saha et al. 2010, Xu et al. 2011)
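Density thresholding, the simplest method in the list above, can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of any cited paper: the map is a small nested list indexed [z][y][x], voxels at or above a user-chosen threshold are kept, and 6-connected voxels are grouped into segments. The function name `segment_by_threshold` and the toy map are hypothetical.

```python
from collections import deque


def segment_by_threshold(density, threshold):
    """Label 6-connected regions of voxels with density >= threshold.

    density: nested list indexed [z][y][x].
    Returns (labels, n_segments); label 0 marks background voxels.
    """
    nz, ny, nx = len(density), len(density[0]), len(density[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_segments = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if density[z][y][x] < threshold or labels[z][y][x]:
                    continue
                # New seed voxel: flood-fill everything connected to it.
                n_segments += 1
                labels[z][y][x] = n_segments
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbours:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and not labels[pz][py][px]
                                and density[pz][py][px] >= threshold):
                            labels[pz][py][px] = n_segments
                            queue.append((pz, py, px))
    return labels, n_segments


# Toy single-slice map with two separated blobs of high density.
toy_map = [[[0.9, 0.8, 0.0, 0.0, 0.7],
            [0.9, 0.1, 0.0, 0.6, 0.7]]]
labels, n_segments = segment_by_threshold(toy_map, threshold=0.5)
print(n_segments)
print(labels[0])
```

The result depends strongly on the chosen threshold, which is one reason the watershed and other methods listed above exist.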
Mature bacteriophage P22 at 9.5 Å resolution Baker et al. J Struct Biol 2006
Assembly Component
Fitting of a domain from 1.20.1060.10 (mainly alpha) into 1.10.530.10 (mainly alpha).
SPI-EM: Velazquez-Muriel et al. JMB 2005
< ~15-20 Å:
• Fit domains from a non-redundant protein domain database (e.g. CATH).
• Calculate a Z-score.
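The Z-score step can be illustrated with a short sketch. The slide does not say how the cited programs define their Z-score, so this assumes the standard definition (score minus mean, divided by the population standard deviation over all fitted domains). The domain names and score values are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Return the standard Z-score of each fit score in the list."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All fits score the same, so none stands out.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical best-fit scores for five domains from a fold library.
fit_scores = {
    "domain_A": 0.42,
    "domain_B": 0.45,
    "domain_C": 0.40,
    "domain_D": 0.81,
    "domain_E": 0.43,
}
names = list(fit_scores)
zs = z_scores([fit_scores[name] for name in names])
best_z, best_name = max(zip(zs, names))
print(best_name, round(best_z, 2))
```

A fold whose Z-score is well separated from the rest is a candidate assignment; a flat distribution of Z-scores suggests the map cannot discriminate between folds at that resolution.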
12 Å
Detection of bacteriophage Lambda
FREDS: Khayat et al. JSB 2010
Fold recognition from density
BALBES–MOLREP pipeline: Brown et al. Acta Crystallogr D 2015
7 Å
Fold recognition from density
Baker et al. Structure 2007
< ~4.5-10 Å: Secondary structure element detection (SSEHunter)
Programs: Helixhunter, SSEhunter, Ematch, Pathwalker, Coot
Jiang et al. Nature 2008, 4.5 Å
~3.5-4.5 Å: De novo Cα tracing
Anabaena 7120
Anacystis nidulans
Chondrus crispus
Desulfovibrio vulgaris
Evolution (rules)
Threading Homology Modelling
Evolutionary couplings
GFCHIKAYTRLIMVG
Folding (physics)
Ab initio (de novo) prediction
Zhang, Curr Opin Struct Biol 2008; Marks et al. Nat Biotechnol. 2012
Fold recognition from sequence
Template-free modelling
Density fitting
Features resolved as resolution improves (25 Å, 10 Å, 4.5 Å): conformational changes / shape; domain boundaries; α-helices; β-sheets; side chains.
Applicable methods: rigid-body fitting / assembly fitting; flexible fitting; de novo.
Villa & Lasker, Curr Opin Struct Biol, 2014.
Manual fitting
Fitting an atomic structure within the envelope (an isocontour) of the density using visualisation programs.
Pros:
- No current computers can beat the human brain in certain pattern recognition tasks.
- Immediate feedback and intelligent choices by the user.
- Often good for the initial placement of the component in the map.
Cons:
- High level of subjectivity may lead to error, especially if the map does not have sufficient distinctive features for an unambiguous placement of the component.
- Depends on contour level.
- Conformational rearrangements cannot be modelled (misfits and steric clashes).
Automated fitting
All automated fitting methods require:
1. a way of representing both the structure and the density map (representation).
2. a way of measuring the goodness-of-fit (scoring).
3. a method of finding the best fit (an optimisation algorithm).
Figure: density map and component atomic structure; component representation and placement; optimisation based on goodness-of-fit.
Villa & Lasker, Curr Opin Struct Biol, 2014.
Cross Correlation Coefficient (CCC)
Figure: the X-ray structure (m) is blurred to give ρcalc, which is compared with the experimental map ρobs to give the CCC used in rigid fitting.
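The blur-then-compare idea behind the CCC can be sketched in one dimension. This is a minimal illustration, not the implementation of any fitting program: it assumes the mean-subtracted (Pearson) form of the CCC, a Gaussian blur with width `sigma` given in voxels, and hypothetical atom positions on a 16-voxel line.

```python
import math


def blur_atoms_1d(positions, n_voxels, sigma):
    """Simulate a density: sum a Gaussian of width sigma centred on each atom."""
    return [sum(math.exp(-0.5 * ((i - p) / sigma) ** 2) for p in positions)
            for i in range(n_voxels)]


def ccc(rho_calc, rho_obs):
    """Mean-subtracted cross-correlation coefficient of two voxel lists."""
    if len(rho_calc) != len(rho_obs):
        raise ValueError("maps must have the same number of voxels")
    n = len(rho_calc)
    mean_c = sum(rho_calc) / n
    mean_o = sum(rho_obs) / n
    num = sum((c - mean_c) * (o - mean_o) for c, o in zip(rho_calc, rho_obs))
    den = math.sqrt(sum((c - mean_c) ** 2 for c in rho_calc)
                    * sum((o - mean_o) ** 2 for o in rho_obs))
    return num / den if den > 0 else 0.0


# "Experimental" map generated from atoms at voxels 5 and 9.
rho_obs = blur_atoms_1d([5.0, 9.0], n_voxels=16, sigma=2.0)
# The same structure placed correctly, and shifted by three voxels.
rho_correct = blur_atoms_1d([5.0, 9.0], n_voxels=16, sigma=2.0)
rho_shifted = blur_atoms_1d([8.0, 12.0], n_voxels=16, sigma=2.0)
print(ccc(rho_correct, rho_obs), ccc(rho_shifted, rho_obs))
```

Because the means are subtracted and the result is normalised, this form of the CCC is insensitive to a linear rescaling of either map.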
Representation and scoring
Density-based scoring functions
Wriggers & Chacon, Structure 2001; Vasishtan & Topf, J Struct Biol 2011, Farabella et al. J Appl Cryst. 2015
• Cross-correlation coefficient (CCC)
• Filters: Laplacian-filtered CCC (LAP)
• Local correlation (SCCC)
• Mutual information-based score (MI)
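Figure panels (i)-(iii) include the marginal distributions p(x), p(y) and the joint distribution p(x,y), which enter the mutual information:
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$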
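The mutual information I(X;Y) can be estimated from two maps by binning their voxel values. The sketch below assumes equal-width bins and natural logarithms; the bin count `n_bins` and the toy voxel lists are illustrative choices, not values taken from the cited papers.

```python
import math
from collections import Counter


def mutual_information(x, y, n_bins=4):
    """Estimate I(X;Y) from paired voxel values using equal-width bins."""
    if len(x) != len(y):
        raise ValueError("maps must have the same number of voxels")

    def to_bins(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0 for _ in values]
        # The top edge is clamped into the last bin.
        return [min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
                for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))   # counts for p(x, y)
    px, py = Counter(bx), Counter(by)  # counts for p(x) and p(y)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi


# A map compared with itself: MI equals the entropy of the binned values.
same = [0.0, 1.0, 2.0, 3.0] * 5
print(mutual_information(same, same))
# Two statistically independent maps: MI is zero.
print(mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0], n_bins=2))
```

Unlike the CCC, the estimate depends on the binning, so scores are only comparable when the same number of bins is used throughout.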
Local scoring
Roseman, Acta Crystallogr D 2000; Pandurangan et al., J Struct Biol 2014
• Segment-based cross-correlation coefficient (SCCC): the CCC between the probe density X and the target density Y, evaluated only over the region of the map covered by the segment being scored.
Surface-based scoring functions
Ceulemans & Russell, J Mol Biol 2004; Vasishtan & Topf, J Struct Biol 2011
• Normal vector score (NV)
• Chamfer distance (CD)
TEMPy
Optimisation: rigid fitting
Rotate and translate the component to search through all possible configurations in the density map so as to maximise the fit between the component and the map.
6D search
Exhaustive search
- Local fitting: search exhaustively a given sub-region in the map.
- Acceleration: FFT (translational moves); spherical harmonics (rotational moves).
Pros: Finds the global solution with respect to a given scoring function.
Cons: The search space in real space is too large for most scores (very expensive).
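An exhaustive search can be sketched on a toy 2D grid, where the full 6D problem collapses to two translations and four 90° rotations. The score used here is a plain unnormalised overlap sum, chosen only for brevity; a CCC or any other score from the previous slides could be swapped in. The function names, the probe and the map are hypothetical.

```python
def rotate90(probe):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*probe[::-1])]


def overlap_score(density, probe, top, left):
    """Sum of probe * density over the probe footprint placed at (top, left)."""
    return sum(probe[r][c] * density[top + r][left + c]
               for r in range(len(probe))
               for c in range(len(probe[0])))


def exhaustive_fit(density, probe):
    """Try every quarter-turn rotation and every translation; keep the best.

    Returns (score, quarter_turns, top, left).
    """
    best = None
    current = probe
    for quarter_turns in range(4):
        height, width = len(current), len(current[0])
        for top in range(len(density) - height + 1):
            for left in range(len(density[0]) - width + 1):
                score = overlap_score(density, current, top, left)
                if best is None or score > best[0]:
                    best = (score, quarter_turns, top, left)
        current = rotate90(current)
    return best


# L-shaped probe and a map that contains it rotated once, at row 1, column 2.
probe = [[1, 1, 1],
         [1, 0, 0]]
density = [[0, 0, 0, 0, 0],
           [0, 0, 1, 1, 0],
           [0, 0, 0, 1, 0],
           [0, 0, 0, 1, 0],
           [0, 0, 0, 0, 0]]
print(exhaustive_fit(density, probe))
```

Every placement is scored, which is why the cost grows so quickly in 3D with fine angular sampling and why FFT-based acceleration of the translational part is attractive.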
Stochastic and gradient methods
Pros: Fast; easy to implement different scoring functions.
Cons: The model can be “trapped” in local minima.
Gradient methods: optimisation follows the steepest gradient.
Monte Carlo methods: optimisation follows the gradient, with random ‘jumps’ to avoid local minima.
(Both panels plot score against parameters.)
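The difference between the two strategies can be shown on a toy one-parameter score landscape with a local and a global maximum. This is only a sketch: the landscape `SCORES`, the Metropolis acceptance rule, the temperature, the jump size and the fixed random seed are all illustrative assumptions.

```python
import math
import random

# Toy landscape: local maximum at index 3, global maximum at index 12.
SCORES = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4]


def hill_climb(scores, start):
    """Move to the better neighbour until no neighbour improves the score."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(scores)]
        best = max(neighbours, key=lambda p: scores[p])
        if scores[best] <= scores[pos]:
            return pos
        pos = best


def monte_carlo(scores, start, n_steps=5000, temperature=1.0,
                max_jump=3, seed=0):
    """Metropolis sampling with random jumps; returns the best position seen."""
    rng = random.Random(seed)
    pos = best = start
    for _ in range(n_steps):
        trial = pos + rng.randint(-max_jump, max_jump)
        trial = min(max(trial, 0), len(scores) - 1)
        delta = scores[trial] - scores[pos]
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            pos = trial
        if scores[pos] > scores[best]:
            best = pos
    return best


print("gradient-style result:", hill_climb(SCORES, start=0))
print("Monte Carlo result:", monte_carlo(SCORES, start=0))
```

Started from the same point, the hill climb stops at the first peak it reaches, while the occasional acceptance of worse moves lets the Monte Carlo walk cross the valley.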
Problems with density fitting
i. Limitations of resolution
Problems:
- At low resolution: many local optima with similar numerical values.
- Local resolution, noise, scaling, filtering, masking.
- Blurring of the atomic structure.
Solutions:
- Improve scoring of goodness-of-fit.
- Coarse-graining (change representation).
- Fit/model assessment.
Figure panels: 2 Å, 10 Å, 20 Å; correct fit vs. fit flipped by 180°.
Protein structure prediction
NMR spectroscopy
X-ray crystallography
20 Å
ii. Multi-component fitting
Problem: In sequential fitting, components may migrate toward the centre of the map.
Solution:
- Simultaneous fitting (assembly/multiple fitting).
- Multiple scores (additional constraints).
Figure panels: sequential fitting vs. correct fit.
Assembly fitting
Birmanns & Wriggers, J Struct Biol 2007; Lasker et al., JMB 2009; Zhang et al., Bioinformatics 2010
Programs: Chimera/MultiFit, γ-TEMPy
• Divide the density points into groups of approximately equal size, each point assigned to its closest centroid (vector quantization or k-means clustering). Each group is represented by its centroid.
• Match centroids to components.
• Score the components simultaneously using the CCF and additional scores/restraints (e.g. geometric complementarity, clash scores).
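The first step, reducing the density to a handful of centroid points, can be sketched with plain k-means. This is an illustration only: real vector-quantization implementations typically weight points by density, whereas here every above-threshold voxel counts equally, and the deterministic initialisation is an assumption made to keep the example reproducible.

```python
import math


def kmeans(points, k, n_iter=50):
    """Cluster 3D points into k groups; returns (centroids, groups)."""
    # Deterministic start: pick k points evenly spaced through the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Each centroid moves to the mean of its group (unchanged if empty).
        new_centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group))
            if group else centroids[j]
            for j, group in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups


# Hypothetical voxel coordinates above threshold: two separated blobs.
voxels = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),
          (10, 10, 10), (11, 10, 10), (10, 11, 10), (11, 11, 10)]
centroids, groups = kmeans(voxels, k=2)
print(centroids)
```

Each centroid then serves as an anchor point to which a component can be matched before the simultaneous scoring step.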
Integrative modelling
• Score the components simultaneously using the CCC and spatial restraints from additional experiments.
Programs: IMP, Rosetta, HADDOCK
26S proteasome
Forster et al. Biochem Biophys Res Commun, 2009.
Integrative modelling
The anaphase-promoting complex (APC/C)
Schreiber et al. Nature, 2011.
iii. Conformational variability
Problem: Conformations observed by 3D EM often deviate from the conformations of the atomic models we fit, owing to:
- Dynamics.
- Crystal packing effects.
- Errors in structure prediction.
Solution: change the conformation of the atomic model during the fitting process (flexible fitting).
Fitting multiple conformations
- Identify one of the most accurate models from a decoy set based on the quality of fit.
Topf et al., J Struct Biol 2005
Baker et al. PLoS Comput Biol 2006
Programs: MODELLER, Rosetta
- Structure library of ~4600 fits
- Score by SCCC
Fitting multiple conformations
Lukoyanova et al. PLoS Biol. 2015
Pleurotolysin (PlyA-B) Pore-forming protein
NMA-based refinement
- Elastic Network Model (ENM)
- Normal Mode Analysis (NMA): the structure is treated as a collection of harmonic oscillators; the modes with low frequency and large amplitude motions often correlate with experimentally observed conformational changes.
- Cons: a ligand can stretch the protein in ways that involve higher frequency modes that are not taken into account; subjective selection of modes; all density must be accounted for.
Tama et al., 2004
Programs: NMFF, iMODFIT, NORMA
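For reference, one common form of the elastic network potential is written out below. It is not taken from the slides, and the symbols are assumptions: γ is a uniform spring constant, R_c a distance cutoff, d_ij the current distance between atoms i and j, and d⁰_ij their distance in the starting structure. The normal modes u_k and frequencies ω_k then follow from the eigen-decomposition of the Hessian H of V at the starting structure (mass-weighting omitted for brevity).

```latex
V = \frac{\gamma}{2} \sum_{\substack{i<j \\ d^{0}_{ij} < R_c}} \left( d_{ij} - d^{0}_{ij} \right)^{2},
\qquad
H\,u_k = \omega_k^{2}\,u_k
```

Fitting methods based on NMA displace the structure along a few of the lowest-frequency u_k so as to increase the fit to the map.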
- Geometry-based conformational sampling (Direx)
Real-space refinement
- The fit of the probe structure is optimised simultaneously with the stereochemical properties by the minimisation of a scoring function, such as:
$$E = w_1 E_{CC}(P) + w_2 E_{SC}(P) + w_3 E_{NB}(P)$$
  with cross-correlation (CC), stereochemical (SC) and non-bonded (NB) terms for model P.
- Optimisation is performed on “rigid bodies” by energy minimisation and molecular dynamics.
Chen & Chapman, JSB 2003; Topf et al., Structure 2008; Trabuco et al., Structure 2008
Pros: Flexible (finer fragmentation); different optimisation methods can be applied; easy to add more restraints.
Cons: Only local search; slow; danger of over-fitting; subjective rigid bodies/constraints.
Programs: MDFF, Flex-EM, Coot
Refinement at intermediate resolution
1VCB, 10 Å resolution
Before refinement: Cα RMSD from native 7.5 Å
After refinement: Cα RMSD from native 2.1 Å
Legend: native; best predicted fit
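The Cα RMSD quoted on this slide can be computed as sketched below. The sketch assumes that both models are already in the same frame (that of the map), so no superposition is performed, and that the Cα atoms are given in matching order. The coordinates are made up for illustration.

```python
import math


def ca_rmsd(model, native):
    """Root-mean-square deviation between paired Cα coordinates (no superposition)."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, native)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))


# Hypothetical two-residue example: the model is displaced by 2 Å along z.
native = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(ca_rmsd(model, native))  # 2.0
```

With real data the comparison against the native structure is only possible in benchmarks, which is why the validation approaches later in the lecture rely on the map and on model geometry instead.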
Overfitting
Flexible fitting of an actin subunit at 15 Å resolution
Panels: before refinement; after refinement, clustered; after refinement, non-clustered.
Coarse graining
Pandurangan & Topf, J Struct Biol 2012
http://ribfind.ismb.lon.ac.uk/
1dpe, 5 Å
Panels (each annotated with Cα RMSD from target and CCC): initial; final un-clustered; final clustered; final two-stage refinement.
Hierarchical refinement
Bottom ring could be fitted using rigid fitting alone (PDB: 1oel).
Top ring needed refinement using hierarchical flexible fitting
Hierarchical refinement
TRs1 conformation
Clare et al., Cell 2012
Capturing the machine motions: GroEL-ATP7, Apo
Clare et al., Cell 2012
Fit / Model Validation
EM Validation Task Force: “We recommend coordinated development of model assessment criteria and corresponding software, with special emphasis on criteria reflecting the suitability of models for specific end-user applications.”
Henderson et al. Structure 2012.
Why?
3128 maps in EMDB. ~687 fits in PDB.
Approaches to model validation
– Geometry: deviation from ideal bonds and angles, planes, Ramachandran plots, atom clashes
– Cross-validation of overfitting (DiMaio et al. 2013, Falkner & Schröder 2013)
– Consensus methods (Ahmed & Tama 2013, Pandurangan et al. 2014)
– Multiple scoring of ensemble models (Farabella et al 2015)
– Partial scoring (Farabella et al 2015)
TEMPy: http://tempy.ismb.lon.ac.uk/
Partial (local) scoring
Segment-based cross-correlation coefficient (SCCC)
Single fit assessment: local scoring
NN cryo-EM density for Kinesin-3 motor domain
Heat map showing the quality of the local fit for specific elements of the motor domain in different nucleotide states
Atherton et al. eLife 2014;3:e03680
6.3 Å resolution EMD-2765 PDB-4uxo
Local assessment
Multiple scores and fits
ADP-EM (Garzón et al. Bioinformatics 2007)
Ensemble of fits: Global assessment
20 Å resolution, simulated map (PDB: 1tyq)
RMSD clustering of the top 90 fits resulted in all top-10 scoring fits in one cluster.
Hierarchical Cα-RMSD clustering of the top 20 fits.
Re-ranking based on consensus scoring
Scores: cross correlation, mutual information, normal vectors (colour scale: low to high).
GroEL [PDB: 1oel]
GroEL [EMD: 1080] (11.5 Å)
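One simple way to combine several scores into a consensus ranking is to rank the fits under each score and sum the ranks. The slide does not specify the consensus scheme used, so this rank-sum (Borda-style) rule is an assumption, as are the fit names and score values. All scores are assumed to be oriented so that higher is better.

```python
def consensus_rank(score_table):
    """Order fits by the sum of their per-score ranks (lower sum is better).

    score_table: {fit_name: {score_name: value}}, higher value = better fit.
    """
    fits = sorted(score_table)
    score_names = sorted(next(iter(score_table.values())))
    rank_sum = {fit: 0 for fit in fits}
    for score_name in score_names:
        ordered = sorted(fits, key=lambda f: score_table[f][score_name],
                         reverse=True)
        for rank, fit in enumerate(ordered, start=1):
            rank_sum[fit] += rank
    # Ties in rank sum are broken alphabetically to keep the order stable.
    return sorted(fits, key=lambda f: (rank_sum[f], f))


# Hypothetical scores for three fits of the same component.
table = {
    "fit_1": {"ccc": 0.80, "mi": 0.30, "nv": 0.60},
    "fit_2": {"ccc": 0.78, "mi": 0.45, "nv": 0.70},
    "fit_3": {"ccc": 0.60, "mi": 0.20, "nv": 0.40},
}
print(consensus_rank(table))
```

In this toy table fit_1 has the best CCC, but fit_2 is ranked first overall because it wins under the other two scores.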
Ensemble of fits: Local assessment
Top 20 fits
EMD-2795 PDB-4v3m
Ensemble of fits
Lukoyanova et al. PLoS Biol. 2015
Pleurotolysin (PlyA-B) Pore-forming protein
2015 EMDB Model Challenge
http://challenges.emdatabank.org/?q=2015_model_challenge
• Establish a benchmark set of 3DEM maps in the 3.0-4.5 Å resolution range, where significant growth in the number of maps is anticipated over the next few years and where a number of technical challenges exist to map interpretation and fitting
• Encourage developers of modelling software packages and biological end users to analyze these maps and present modelling results with the best practice
• Evolve criteria for evaluation and validation of 3DEM map-derived models
• Compare and contrast the various modelling and analysis approaches (in a positive spirit!)
Thank you!